Frequently Asked Questions
Common questions — answered
- What models do you use?
Deasy uses a mixture of LLM and ML models as part of the metadata workflow, including its own hosted open-source model. Options to connect to custom models and a user’s own endpoints are available.
- Can we use Deasy via API?
Customers can access all of Deasy’s tagging functionality through an API or via a user interface (UI). Many customers use the UI to define, test, and view tags, then run the extraction programmatically via the API.
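For illustration, a programmatic call from Python might look like the sketch below. The endpoint path, authentication scheme, and payload fields are hypothetical placeholders, since Deasy’s API reference is not reproduced here.

```python
import requests

# Illustrative only: the endpoint path, auth scheme, and payload fields below are
# hypothetical placeholders, not Deasy's documented API.
DEASY_API_URL = "https://api.example-deasy-host.com/v1/tag"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

payload = {
    "file_names": ["q3_board_deck.pdf"],               # documents already connected via the UI
    "tag_names": ["document_type", "fiscal_quarter"],  # tags defined and tested in the UI
}

response = requests.post(
    DEASY_API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # e.g. {"q3_board_deck.pdf": {"document_type": "board deck", ...}}
```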
- What do you mean by ‘metadata’?
In the context of Deasy, ‘metadata’ means tags derived from the content of a document or image. These range from simple entities to descriptive, categorical, or thematic labels. Deasy’s metadata tagging uses LLMs and is therefore fully context-aware.
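As a purely illustrative example (the tag names and values below are invented, not Deasy output), the tags for a single contract might look like this:

```python
# Illustrative tags for a single document; the tag names and values are invented
# for the sake of the example, not output from Deasy.
document_tags = {
    "customer_name": "Acme Corp",                                 # simple entity
    "document_type": "master service agreement",                  # categorical label
    "governing_law": "State of Delaware",                         # simple entity
    "key_themes": ["liability", "data privacy", "termination"],   # thematic labels
}
```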
- How does Deasy validate the accuracy of the metadata?
Deasy provides several features to ensure tagging accuracy:
- Human-in-the-loop testing: A testing studio to refine and fine-tune tags.
- Evidence: We return both the value and the evidence for every tag, including highlights from the original documents that clearly show why the tag took a certain value (see the sketch after this list).
- Automated standardization: Tags are standardized to a constrained set of values to enhance metadata integrity.
- Model selection: Deasy’s choice of models reflects continuous experimentation and testing of classification performance across different model providers.
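A minimal sketch of what a value-plus-evidence result could look like is shown below; the field names are assumptions for illustration rather than Deasy’s exact response schema.

```python
# Hypothetical shape of a single tag result carrying both a value and its evidence.
# Field names are assumptions for illustration; they are not Deasy's exact schema.
tag_result = {
    "tag_name": "contract_renewal_type",
    "value": "auto-renewal",  # standardized to a constrained set of values
    "allowed_values": ["auto-renewal", "manual renewal", "no renewal"],
    "evidence": [
        {
            "file": "msa_acme_2023.pdf",
            "chunk_id": 14,
            "highlight": "This Agreement shall automatically renew for successive one-year terms...",
        }
    ],
}
```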
- What type of data do you support?
Deasy supports many forms of unstructured data (e.g., .txt, .pdf, .docx, .md) as well as pure image formats. Upon document ingestion, Deasy can perform an OCR step to extract images and tables, which can then also be tagged during metadata extraction.
- Can I bring my existing data dictionary into Deasy?
Deasy can ingest an existing data dictionary or taxonomy (e.g., in CSV format) and auto-create the equivalent tags within its platform, which can then be extracted, modified, or tested.
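A hypothetical data dictionary in CSV form, and how it could be read before being turned into tags, is sketched below; the column names are assumptions, not a prescribed Deasy import schema.

```python
import csv
import io

# Hypothetical data-dictionary CSV; the column names (tag_name, description,
# allowed_values) are illustrative, not a prescribed Deasy import schema.
data_dictionary_csv = """tag_name,description,allowed_values
document_type,High-level type of the document,"contract;invoice;policy"
fiscal_year,Fiscal year the document relates to,
counterparty,Name of the other contracting party,
"""

rows = list(csv.DictReader(io.StringIO(data_dictionary_csv)))
for row in rows:
    allowed = row["allowed_values"].split(";") if row["allowed_values"] else None
    print(row["tag_name"], "->", allowed)
```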
- Do you chunk the underlying data before tagging?
Deasy can connect either to a vector database where data has already been chunked, or to a file-storage system holding raw data (e.g., SharePoint). In the latter case, Deasy uses a built-in OCR and chunking step to prepare the data ahead of tagging.
In both cases, Deasy generates chunk-level metadata tags first, and then synthesizes these into file-level metadata.
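One simple way to picture the synthesis step is a majority vote over chunk-level values, as in the sketch below; this aggregation rule is an assumption used for illustration, not a description of Deasy’s actual logic.

```python
from collections import Counter

# Sketch of synthesizing chunk-level tags into a single file-level tag.
# The majority-vote rule is an assumption used to illustrate the idea,
# not Deasy's actual aggregation logic.
def synthesize_file_tag(chunk_values: list[str]) -> str:
    """Pick the most common chunk-level value as the file-level value."""
    counts = Counter(v for v in chunk_values if v)  # ignore empty/missing values
    value, _ = counts.most_common(1)[0]
    return value

chunk_level = ["invoice", "invoice", "contract", "invoice"]  # one value per chunk
print(synthesize_file_tag(chunk_level))  # -> "invoice"
```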
- What data connectors do you support?
- Cloud file storage: S3, Azure Blob, Google Cloud Storage
- Vector databases: Qdrant, PostgreSQL
Custom connectors can be built following a PoC phase of an engagement.
- How does your auto-suggested metadata functionality work?
Deasy’s tagging studio makes it easy to fine-tune any classification task.
Users can generate few-shot examples that capture information an underlying LLM will not yet have seen, and append these examples to the tag definition. Deasy’s testing studio ensures this process takes minimal time, at no additional cost.
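A hedged sketch of what a tag definition with appended few-shot examples might look like is given below; the structure and field names are illustrative assumptions, not Deasy’s exact format.

```python
# Hypothetical tag definition with few-shot examples appended after review in the
# testing studio. The structure is illustrative, not Deasy's exact format.
tag_definition = {
    "name": "incident_severity",
    "description": "Severity of the incident described in the report.",
    "allowed_values": ["low", "medium", "high"],
    "examples": [  # few-shot examples capturing cases the base LLM gets wrong
        {
            "text": "Customer data was exposed for 3 hours before rollback.",
            "value": "high",
        },
        {
            "text": "A dashboard widget rendered with the wrong timezone.",
            "value": "low",
        },
    ],
}
```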
- Where is the data stored?
Deasy connects to a customer’s file storage system and generates metadata. This metadata is stored in Deasy’s PostgreSQL database. These tags can then be:
- Exported as CSV or JSON (see the sketch after this list)
- Synced directly back to a customer’s vector database
- Pushed back into a customer’s central file storage system (limited connector availability at present).
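As a minimal sketch of the export path (with invented tag data and file names), the CSV and JSON options could be produced as follows:

```python
import csv
import json

# Minimal sketch of exporting extracted tags; the tag data and file names are
# illustrative placeholders.
tags_by_file = {
    "msa_acme_2023.pdf": {"document_type": "contract", "counterparty": "Acme Corp"},
}

# JSON export
with open("deasy_tags_export.json", "w") as f:
    json.dump(tags_by_file, f, indent=2)

# Flat CSV export: one row per (file, tag) pair
with open("deasy_tags_export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file_name", "tag_name", "value"])
    for file_name, tags in tags_by_file.items():
        for tag_name, value in tags.items():
            writer.writerow([file_name, tag_name, value])
```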
- Does data leave our systems?
Deasy offers a hosted version of its platform on GCP, or can deploy within the customer’s private cloud. Various LLM choices are also available, including an open-source model that can be hosted within the customer’s environment.
This means customers have the option to keep everything fully self-contained within their own environment.
- How does your pricing work?
Deasy’s pricing is based on the volume of metadata created and stored in the platform, using a tiered subscription model.
A given tier of Deasy’s commercial model is defined by:
- A maximum number of tag extractions that can be performed per month
- A total volume of tags that can be stored in the platform