Vocabulary Grounding lets you teach Airia the language of your business — product codes, internal acronyms, project codenames, role titles, regional jargon — so retrieval interprets queries the way your team does. You add a list of terms with their definitions; at query time, Airia matches the user’s question against your vocabulary and passes any matched term/definition pairs to the language model as additional context. The retrieved documents themselves are unchanged. What changes is the model’s understanding of what the query and chunks actually mean in your tenant.

Why use it

  • Better answers on jargon. Internal acronyms and project codenames have no public meaning for embeddings to latch onto. A definition gives the model the missing context.
  • Disambiguation. When a term has multiple meanings (for example, “Lighthouse” the Google tool vs. “Lighthouse” your CRM), your definition tells Airia which one this tenant means.
  • Consistent answers across teams. Everyone in the tenant gets the same authoritative meaning, regardless of how the question is phrased.
  • No re-ingestion required. Vocabulary changes take effect on the next query — you don’t have to reindex any data.

When to use it

Add an entry whenever:
  • Your domain uses an acronym, code, product or project name, or role title that isn’t broadly known on the public internet.
  • A common English word means something specific in your company (for example, “Atlas” is your data warehouse, not the cloud product).
  • Users on your team commonly misspell or alias a term and you want all variants to resolve to the same canonical concept (see the synonyms workaround below).
  • You’re seeing retrieval misses or wrong-context answers that trace back to a specific term being misunderstood.
Don’t add an entry for terms whose meaning is already clear from the public web or from the documents themselves — over-defining dilutes the signal.

Where it applies

| Search type | Vocabulary Grounding applied? |
|-------------|-------------------------------|
| Semantic    | Yes                           |
| Hybrid      | Yes                           |
| Graph RAG   | Yes                           |
| SQL         | No                            |

Vocabulary Grounding is applied to semantic and hybrid search, and continues to apply to Graph RAG. It is not applied to SQL search — SQL queries run against structured tables and match column values directly, so the vocabulary layer doesn’t enter the path.

How it improves retrieval

  1. The user submits a query.
  2. Airia retrieves relevant chunks from your indexed content and, in the same step, looks up any vocabulary terms that appear in the query. The matched term/definition pairs come back together with the chunks — not as a separate later step.
  3. The retrieved chunks and the matched vocabulary are then passed to the LLM together. The model uses your definitions to resolve domain terms in both the question and the retrieved text.
Your indexed documents are never modified — the vocabulary reaches the model as added context, not by rewriting your data. The result is fewer wrong-context answers, fewer “I don’t know” responses for known internal terms, and more consistent phrasing in the model’s output.
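
The flow above can be sketched in a few lines of Python. This is an illustration only, not Airia's implementation: the source does not say how terms are matched, so the case-insensitive whole-word matching here is an assumption, and `VOCABULARY`, `match_vocabulary`, and `build_context` are hypothetical names.

```python
import re

# Hypothetical in-memory vocabulary. Real entries are managed in Airia.
VOCABULARY = {
    "Lighthouse": "Internal customer-relationship platform built on Salesforce.",
    "Atlas": "The company data warehouse.",
}


def match_vocabulary(query: str, vocabulary: dict[str, str]) -> dict[str, str]:
    """Return the term/definition pairs whose term appears in the query.

    Assumption: case-insensitive, whole-word (or whole-phrase) matching.
    """
    matched = {}
    for term, definition in vocabulary.items():
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, query, flags=re.IGNORECASE):
            matched[term] = definition
    return matched


def build_context(query: str, chunks: list[str], vocabulary: dict[str, str]) -> str:
    """Combine matched vocabulary and retrieved chunks into one context string."""
    matched = match_vocabulary(query, vocabulary)
    lines = []
    if matched:
        lines.append("Vocabulary:")
        lines += [f"- {term}: {definition}" for term, definition in matched.items()]
    lines.append("Retrieved chunks:")
    lines += [f"- {chunk}" for chunk in chunks]  # chunks are passed through unchanged
    lines.append(f"Question: {query}")
    return "\n".join(lines)
```

Calling `build_context("Is Atlas down?", ["Atlas nightly load failed."], VOCABULARY)` yields a context that carries the Atlas definition alongside the untouched chunk, and omits the Lighthouse entry because the query never mentions it.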

Adding terms

Vocabulary Grounding currently supports two fields per entry:
  • Term — the canonical word or phrase. Required. Max 200 characters.
  • Definition — a short, human-readable explanation. Required. Max 2,000 characters.
A dedicated synonyms column is on the roadmap but is not yet available.
Synonyms workaround: include alternate spellings, abbreviations, and aliases directly inside the Definition field. The full definition text is provided to the model, so aliases placed there help the LLM resolve variants to the canonical term.
Example:
  Term: Lighthouse
  Definition: Internal customer-relationship platform built on Salesforce. Also referred to as LH, the customer hub, or “the CRM” in tickets and Slack.
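
If you prepare entries outside the UI, a small local check against the documented limits can catch problems before you submit. The sketch below is a hypothetical helper, not part of any Airia SDK. It assumes "characters" can be approximated with Python's `len()`; how Airia counts characters is not stated in this page.

```python
MAX_TERM_CHARS = 200          # documented limit for Term
MAX_DEFINITION_CHARS = 2000   # documented limit for Definition


def with_aliases(definition: str, aliases: list[str]) -> str:
    """Append aliases to the definition text (the synonyms workaround)."""
    if not aliases:
        return definition
    return f"{definition.rstrip()} Also referred to as {', '.join(aliases)}."


def validate_entry(term: str, definition: str) -> list[str]:
    """Return a list of problems; an empty list means the entry fits the limits."""
    problems = []
    if not term.strip():
        problems.append("term is required")
    elif len(term) > MAX_TERM_CHARS:
        problems.append(f"term exceeds {MAX_TERM_CHARS} characters")
    if not definition.strip():
        problems.append("definition is required")
    elif len(definition) > MAX_DEFINITION_CHARS:
        problems.append(f"definition exceeds {MAX_DEFINITION_CHARS} characters")
    return problems
```

Validate after appending aliases, since the aliases count toward the 2,000-character definition limit.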

Single entry

Open Integrations → Vocabulary Grounding → Add term and fill in the two fields.

Bulk upload

Use the bulk-upload wizard to import many terms at once.
  • Formats: CSV or JSON.
  • Max file size: 10 MB.
  • CSV format: a header row of term,definition, followed by one entry per line.
  • JSON format: an array of objects, for example [{"term": "Lighthouse", "definition": "..."}].
Terms must be unique within the tenant. Duplicates are rejected with a clear error and can be edited individually.
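
A bulk-upload file in either format can be generated with the Python standard library. The sketch below rests on several assumptions that the source does not confirm: that "10 MB" means 10 × 1024 × 1024 bytes, that uniqueness is compared case-insensitively, and that the importer accepts standard CSV quoting for definitions containing commas. All function names are hypothetical.

```python
import csv
import io
import json

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def find_duplicate_terms(entries: list[dict]) -> list[str]:
    """Return terms that repeat an earlier term (assumed case-insensitive)."""
    seen, duplicates = set(), []
    for entry in entries:
        key = entry["term"].strip().casefold()
        if key in seen:
            duplicates.append(entry["term"])
        seen.add(key)
    return duplicates


def to_csv(entries: list[dict]) -> str:
    """Serialize entries as CSV with the documented term,definition header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["term", "definition"],
        extrasaction="ignore",  # drop any keys other than the two supported fields
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def to_json(entries: list[dict]) -> str:
    """Serialize entries as a JSON array of {"term", "definition"} objects."""
    rows = [{"term": e["term"], "definition": e["definition"]} for e in entries]
    return json.dumps(rows, ensure_ascii=False, indent=2)


def fits_upload_limit(payload: str) -> bool:
    """Check the UTF-8 encoded size against the assumed 10 MB limit."""
    return len(payload.encode("utf-8")) <= MAX_UPLOAD_BYTES
```

Running `find_duplicate_terms` before export lets you resolve repeated terms locally rather than waiting for the wizard to reject them.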

Tips for good entries

  • Keep definitions short. One or two sentences usually outperform a paragraph; extra words dilute the signal.
  • Write for an outsider. Define terms the way someone joining the company tomorrow would need to hear them.
  • Put aliases in the definition. Until the synonyms column ships, this is how variants get resolved (see workaround above).
  • Avoid redefining common words unless the domain meaning genuinely differs — over-grounding adds noise to every query.
  • Review periodically. If a project is renamed or retired, update or remove the entry so the model isn’t grounded on stale meaning.

Scope and limits

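| Setting             | Value                      |
|---------------------|----------------------------|
| Scope               | Tenant-wide                |
| Term length         | Up to 200 characters       |
| Definition length   | Up to 2,000 characters     |
| Bulk upload size    | Up to 10 MB                |
| Bulk upload formats | CSV, JSON                  |
| Term uniqueness     | Required within the tenant |
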
Vocabulary entries are shared across all data sources and agents in the tenant — there is no per-data-source vocabulary today.

Permissions

Managing vocabulary entries (create, edit, bulk upload, delete) is gated by role permission. Users without that permission still benefit from grounding at retrieval time; they just can’t modify the term list.