Frequently Asked Questions
Common questions — answered
- What models do you use?
Deasy uses a mixture of LLM and ML models as part of the metadata workflow, including its own hosted open-source model. Options to connect to custom models and a user’s own endpoints are available.
- Can we use Deasy via API?
Customers can access all of Deasy’s tagging functionality through an API or via a user interface (UI). Many customers use the UI to define, test, and view tags, then run the extraction programmatically via the API.
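For illustration, a programmatic call from Python might look like the sketch below. The endpoint path, authentication scheme, and payload fields are hypothetical placeholders, since Deasy’s API reference is not reproduced here.

```python
import requests

# Illustrative only: the endpoint path, auth scheme, and payload fields below are
# hypothetical placeholders, not Deasy's documented API.
DEASY_API_URL = "https://api.example-deasy-host.com/v1/tag"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

payload = {
    "file_names": ["q3_board_deck.pdf"],               # documents already connected via the UI
    "tag_names": ["document_type", "fiscal_quarter"],  # tags defined and tested in the UI
}

response = requests.post(
    DEASY_API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # e.g. {"q3_board_deck.pdf": {"document_type": "board deck", ...}}
```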
- What do you mean by ‘metadata’?
In the context of Deasy, ‘metadata’ means tags derived from the content of a document or image. These range from simple entities to descriptive, categorical, or thematic labels. Deasy’s metadata tagging uses LLMs and is therefore fully context-aware.
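As a purely illustrative example (the tag names and values below are invented, not Deasy output), the tags for a single contract might look like this:

```python
# Illustrative tags for a single document; the tag names and values are invented
# for the sake of the example, not output from Deasy.
document_tags = {
    "customer_name": "Acme Corp",                                 # simple entity
    "document_type": "master service agreement",                  # categorical label
    "governing_law": "State of Delaware",                         # simple entity
    "key_themes": ["liability", "data privacy", "termination"],   # thematic labels
}
```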
- How does Deasy validate the accuracy of the metadata?
Deasy provides several features to ensure tagging accuracy:
- Human-in-the-loop testing: A testing studio to refine and fine-tune tags.
- Evidence: We return both the value and the evidence for every tag, including highlights from the original documents that clearly show why the tag took a certain value (see the sketch after this list).
- Automated standardization: Tags are standardized to a constrained set of values to enhance metadata integrity.
- Model selection: Deasy’s choice of models reflects continuous experimentation and testing of classification performance across different model providers.
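A minimal sketch of what a value-plus-evidence result could look like is shown below; the field names are assumptions for illustration rather than Deasy’s exact response schema.

```python
# Hypothetical shape of a single tag result carrying both a value and its evidence.
# Field names are assumptions for illustration; they are not Deasy's exact schema.
tag_result = {
    "tag_name": "contract_renewal_type",
    "value": "auto-renewal",  # standardized to a constrained set of values
    "allowed_values": ["auto-renewal", "manual renewal", "no renewal"],
    "evidence": [
        {
            "file": "msa_acme_2023.pdf",
            "chunk_id": 14,
            "highlight": "This Agreement shall automatically renew for successive one-year terms...",
        }
    ],
}
```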
- What type of data do you support?
Deasy supports many forms of unstructured data (e.g., .txt, .pdf, .docx, .md) as well as pure image formats. Upon document ingestion, Deasy can perform an OCR step to extract images and tables, which can then also be tagged during metadata extraction.
- Can I bring my existing data dictionary into Deasy?
Deasy can ingest an existing data dictionary or taxonomy (e.g., in CSV format) and auto-create the equivalent tags within its platform, which can then be extracted, modified, or tested.
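A hypothetical data dictionary in CSV form, and how it could be read before being turned into tags, is sketched below; the column names are assumptions, not a prescribed Deasy import schema.

```python
import csv
import io

# Hypothetical data-dictionary CSV; the column names (tag_name, description,
# allowed_values) are illustrative, not a prescribed Deasy import schema.
data_dictionary_csv = """tag_name,description,allowed_values
document_type,High-level type of the document,"contract;invoice;policy"
fiscal_year,Fiscal year the document relates to,
counterparty,Name of the other contracting party,
"""

rows = list(csv.DictReader(io.StringIO(data_dictionary_csv)))
for row in rows:
    allowed = row["allowed_values"].split(";") if row["allowed_values"] else None
    print(row["tag_name"], "->", allowed)
```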
- Do you chunk the underlying data before tagging?
Deasy can connect either to a vector database where data has already been chunked, or to a file-storage system holding raw data (e.g., SharePoint). In the latter case, Deasy uses a built-in OCR and chunking step to prepare the data ahead of tagging.
In both cases, Deasy generates chunk-level metadata tags first, and then synthesizes these into file-level metadata.
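One simple way to picture the synthesis step is a majority vote over chunk-level values, as in the sketch below; this aggregation rule is an assumption used for illustration, not a description of Deasy’s actual logic.

```python
from collections import Counter

# Sketch of synthesizing chunk-level tags into a single file-level tag.
# The majority-vote rule is an assumption used to illustrate the idea,
# not Deasy's actual aggregation logic.
def synthesize_file_tag(chunk_values: list[str]) -> str:
    """Pick the most common chunk-level value as the file-level value."""
    counts = Counter(v for v in chunk_values if v)  # ignore empty/missing values
    value, _ = counts.most_common(1)[0]
    return value

chunk_level = ["invoice", "invoice", "contract", "invoice"]  # one value per chunk
print(synthesize_file_tag(chunk_level))  # -> "invoice"
```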
- What data connectors do you support?
- Cloud file storage: S3, Azure Blob, Google Cloud Storage
- Vector databases: Qdrant, PostgreSQL
Custom connectors can be built following a PoC phase of an engagement.
- How does your auto-suggested metadata functionality work?
Deasy’s tagging studio makes it easy to fine-tune any classification task.
Users can generate few-shot examples that capture information an underlying LLM will not yet have seen, and append these examples to the tag definition. Deasy’s testing studio ensures this process takes minimal time, at no additional cost.
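A hedged sketch of what a tag definition with appended few-shot examples might look like is given below; the structure and field names are illustrative assumptions, not Deasy’s exact format.

```python
# Hypothetical tag definition with few-shot examples appended after review in the
# testing studio. The structure is illustrative, not Deasy's exact format.
tag_definition = {
    "name": "incident_severity",
    "description": "Severity of the incident described in the report.",
    "allowed_values": ["low", "medium", "high"],
    "examples": [  # few-shot examples capturing cases the base LLM gets wrong
        {
            "text": "Customer data was exposed for 3 hours before rollback.",
            "value": "high",
        },
        {
            "text": "A dashboard widget rendered with the wrong timezone.",
            "value": "low",
        },
    ],
}
```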
- Where is the data stored?
Deasy connects to a customer’s file storage system and generates metadata. This metadata is stored in Deasy’s PostgreSQL database. These tags can then be:
- Exported as CSV or JSON (see the sketch after this list)
- Synced directly back to a customer’s vector database
- Pushed back into a customer’s central file storage system (limited connector availability at present).
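As a minimal sketch of the export path (with invented tag data and file names), the CSV and JSON options could be produced as follows:

```python
import csv
import json

# Minimal sketch of exporting extracted tags; the tag data and file names are
# illustrative placeholders.
tags_by_file = {
    "msa_acme_2023.pdf": {"document_type": "contract", "counterparty": "Acme Corp"},
}

# JSON export
with open("deasy_tags_export.json", "w") as f:
    json.dump(tags_by_file, f, indent=2)

# Flat CSV export: one row per (file, tag) pair
with open("deasy_tags_export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "tag_name", "value"])
    for file_name, tags in tags_by_file.items():
        for tag_name, value in tags.items():
            writer.writerow([file_name, tag_name, value])
```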
- Does data leave our systems?
Deasy offers a hosted version of its platform on GCP, or can deploy within the customer’s private cloud. Various LLM choices are also available, including an open-source model that can be hosted within the customer’s environment.
This means customers have the option to keep everything fully self-contained within their own environment.
- How does your pricing work?
Deasy’s pricing is based on the volume of metadata created and stored in the platform, using a tiered subscription model.
A given tier of Deasy’s commercial model is defined by:
- A maximum number of tag extractions that can be performed per month
- A total volume of tags that can be stored in the platform