Vocabulary Grounding lets you teach Airia the language of your business (product codes, internal acronyms, project codenames, role titles, regional jargon) so retrieval interprets queries the way your team does. You add a list of terms with their definitions; at query time, Airia matches the user’s question against your vocabulary and passes any matched term/definition pairs to the language model as additional context. The retrieved documents themselves are unchanged. What changes is the model’s understanding of what the query and chunks actually mean in your tenant.
Why use it
- Better answers on jargon. Internal acronyms and project codenames have no public meaning for embeddings to latch onto. A definition gives the model the missing context.
- Disambiguation. When a term has multiple meanings (for example, “Lighthouse” the Google tool vs. “Lighthouse” your CRM), your definition tells Airia which one this tenant means.
- Consistent answers across teams. Everyone in the tenant gets the same authoritative meaning, regardless of how the question is phrased.
- No re-ingestion required. Vocabulary changes take effect on the next query — you don’t have to reindex any data.
When to use it
Add an entry whenever:
- Your domain uses an acronym, code, product or project name, or role title that isn’t broadly known on the public internet.
- A common English word means something specific in your company (for example, “Atlas” is your data warehouse, not the cloud product).
- Users on your team commonly misspell or alias a term and you want all variants to resolve to the same canonical concept (see the synonyms workaround below).
- You’re seeing retrieval misses or wrong-context answers that trace back to a specific term being misunderstood.
Where it applies
| Search type | Vocabulary Grounding applied? |
|---|---|
| Semantic | Yes |
| Hybrid | Yes |
| Graph RAG | Yes |
| SQL | No |
How it improves retrieval
- The user submits a query.
- Airia retrieves relevant chunks from your indexed content and, in the same step, looks up any vocabulary terms that appear in the query. The matched term/definition pairs come back together with the chunks — not as a separate later step.
- The retrieved chunks and the matched vocabulary are then passed to the LLM together. The model uses your definitions to resolve domain terms in both the question and the retrieved text.
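The flow above can be sketched in a few lines of Python. This is a toy illustration, not Airia's implementation: the function names `match_vocabulary` and `build_prompt`, the case-insensitive whole-word matching rule, the sample vocabulary, and the prompt layout are all assumptions made for the example.

```python
import re

# Hypothetical tenant vocabulary: term -> definition.
VOCABULARY = {
    "Lighthouse": "Internal customer-relationship platform built on Salesforce.",
    "Atlas": "The company data warehouse.",
}


def match_vocabulary(query: str, vocabulary: dict[str, str]) -> dict[str, str]:
    """Return the term/definition pairs whose term appears in the query.

    Assumption: matching is case-insensitive on whole words. The actual
    matching rules are not described on this page.
    """
    matched = {}
    for term, definition in vocabulary.items():
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, query, flags=re.IGNORECASE):
            matched[term] = definition
    return matched


def build_prompt(query: str, chunks: list[str], matched: dict[str, str]) -> str:
    """Combine matched vocabulary and retrieved chunks into one prompt string."""
    lines = ["Vocabulary:"]
    lines += [f"- {term}: {definition}" for term, definition in matched.items()]
    lines.append("Context:")
    lines += [f"- {chunk}" for chunk in chunks]
    lines.append(f"Question: {query}")
    return "\n".join(lines)


query = "Who owns the lighthouse renewal pipeline?"
# Stand-in for the chunks a real retrieval step would return.
chunks = ["Renewals in Lighthouse are owned by the Customer Success team."]

matched = match_vocabulary(query, VOCABULARY)
prompt = build_prompt(query, chunks, matched)
```

In this sketch only the `Lighthouse` entry is passed along, because `Atlas` does not appear in the query. The chunks themselves are not modified; the definitions simply travel with them.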
Adding terms
Vocabulary Grounding currently supports two fields per entry:
- Term: the canonical word or phrase. Required. Max 200 characters.
- Definition: a short, human-readable explanation. Required. Max 2,000 characters.
Synonyms workaround: include alternate spellings, abbreviations, and aliases directly inside the Definition field. The full definition text is provided to the model, so aliases placed there help the LLM resolve variants to the canonical term. Example:
- Term: Lighthouse
- Definition: Internal customer-relationship platform built on Salesforce. Also referred to as LH, the customer hub, or “the CRM” in tickets and Slack.
Single entry
Open Integrations → Vocabulary Grounding → Add term and fill in the two fields.
Bulk upload
Use the bulk-upload wizard to import many terms at once.
- Formats: CSV or JSON.
- Max file size: 10 MB.
- CSV format: header row `term,definition`, one entry per line.
- JSON format: an array of objects, for example `[{"term": "Lighthouse", "definition": "..."}]`.
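If your terms already live in a spreadsheet or script, a bulk-upload file in either format can be generated with Python's standard library. The following is a minimal sketch: the sample entries and the helper names `to_csv` and `to_json` are hypothetical. Python's `csv` module quotes any definition that contains a comma; that the wizard accepts standard quoted CSV fields is an assumption.

```python
import csv
import io
import json

# Hypothetical entries; replace with your own terms and definitions.
entries = [
    {
        "term": "Lighthouse",
        "definition": "Internal customer-relationship platform built on Salesforce. Also referred to as LH.",
    },
    {
        "term": "Atlas",
        "definition": "The company data warehouse, not the cloud product.",
    },
]


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize entries as CSV with a term,definition header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["term", "definition"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: list[dict[str, str]]) -> str:
    """Serialize entries as a JSON array of objects with term and definition keys."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


csv_text = to_csv(entries)
json_text = to_json(entries)
```

Write `csv_text` or `json_text` to a file with UTF-8 encoding before uploading it through the wizard.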
Tips for good entries
- Keep definitions short. One or two sentences usually outperforms a paragraph — extra words dilute the signal.
- Write for an outsider. Define terms the way someone joining the company tomorrow would need to hear them.
- Put aliases in the definition. Until the synonyms column ships, this is how variants get resolved (see workaround above).
- Avoid redefining common words unless the domain meaning genuinely differs — over-grounding adds noise to every query.
- Review periodically. If a project is renamed or retired, update or remove the entry so the model isn’t grounded on stale meaning.
Scope and limits
| Setting | Value |
|---|---|
| Scope | Tenant-wide |
| Term length | Up to 200 characters |
| Definition length | Up to 2,000 characters |
| Bulk upload size | Up to 10 MB |
| Bulk upload formats | CSV, JSON |
| Term uniqueness | Required within the tenant |
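The limits in the table can be checked locally before an upload. The sketch below is one possible pre-flight check, not part of Airia: the function names are hypothetical, "10 MB" is read as 10,000,000 bytes (the stricter reading), and uniqueness is compared case-insensitively, which is an assumption because this page does not say how term uniqueness is evaluated.

```python
MAX_TERM_CHARS = 200
MAX_DEFINITION_CHARS = 2000
MAX_UPLOAD_BYTES = 10_000_000  # Assumption: 10 MB read as 10,000,000 bytes.


def validate_entries(entries: list[dict[str, str]]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen = set()
    for index, entry in enumerate(entries):
        term = entry.get("term", "").strip()
        definition = entry.get("definition", "").strip()

        if not term:
            problems.append(f"entry {index}: term is required")
        elif len(term) > MAX_TERM_CHARS:
            problems.append(f"entry {index}: term exceeds {MAX_TERM_CHARS} characters")

        if not definition:
            problems.append(f"entry {index}: definition is required")
        elif len(definition) > MAX_DEFINITION_CHARS:
            problems.append(
                f"entry {index}: definition exceeds {MAX_DEFINITION_CHARS} characters"
            )

        # Assumption: terms that differ only by case count as duplicates.
        key = term.casefold()
        if term and key in seen:
            problems.append(f"entry {index}: duplicate term '{term}'")
        seen.add(key)
    return problems


def fits_upload_limit(payload: str) -> bool:
    """Check the UTF-8 encoded size of a file body against the upload limit."""
    return len(payload.encode("utf-8")) <= MAX_UPLOAD_BYTES


sample = [
    {"term": "Lighthouse", "definition": "Internal customer-relationship platform."},
    {"term": "lighthouse", "definition": "Duplicate of the entry above."},
    {"term": "X" * 201, "definition": ""},
]
problems = validate_entries(sample)
```

For the sample above, `problems` reports the duplicate term, the over-length term, and the missing definition.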
