Airia supports multiple retrieval patterns for bringing knowledge into your agents. Choose based on query complexity, cost, and accuracy requirements. For detailed configuration of search parameters (hybrid weighting, fusion algorithms, reranking), see Hybrid Search and Reranking. For how knowledge graphs enhance retrieval, see Graph-Enhanced Retrieval.
This page covers the same content as Add a Data Source in the Agent Basics section, reframed for the Context Engineering pipeline.
Two Core Retrieval Methods
1. Data Search Step
A dedicated pipeline step that performs a single, embedding-based retrieval pass. The full user input is used as the search query, and the retrieved chunks are passed directly to the LLM or the next step. Best for: Simple queries, linear workflows, batch processing. Faster execution and lower cost.
2. MCP Multi-Hop Retrieval
Sources are attached directly to the LLM, which dynamically decides which sources to query, which retrieval tools to use, and how many times to search. Powered by multi-hop retrieval via the Airia Datasource MCP Server. Best for: Complex queries, conversational agents, accuracy-critical applications.
| Aspect | Data Search Step | MCP Multi-Hop Retrieval |
|---|---|---|
| Retrieval | Single-hop | Multi-hop |
| Search calls | Always one | One or more (LLM-determined) |
| Speed | Faster | Slower |
| Cost | Lower | Higher |
| Best use | Simple, predictable queries | Complex, conversational, reasoning-heavy |
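The single-hop vs. multi-hop distinction can be sketched in a few lines of Python. Everything here is a toy illustration, not Airia's implementation: `search` is a word-overlap stand-in for embedding search, and the follow-up heuristic in `multi_hop` stands in for the LLM deciding what to search next.

```python
# Toy sketch: single-hop (Data Search Step) vs. multi-hop (MCP) retrieval.
# All names and logic here are illustrative stand-ins, not Airia APIs.

def search(corpus, query, k=2):
    """Stand-in for an embedding search: rank chunks by shared words."""
    def score(chunk):
        return len(set(chunk.lower().split()) & set(query.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def single_hop(corpus, user_input):
    """Data Search Step: exactly one retrieval pass, full input as query."""
    return search(corpus, user_input)

def multi_hop(corpus, user_input, max_hops=3):
    """MCP-style retrieval: follow-up searches pivot on retrieved evidence.
    The pivot/stop heuristic is a toy stand-in for the LLM's decisions."""
    context = search(corpus, user_input)      # hop 1 behaves like single-hop
    for _ in range(max_hops - 1):
        if len(context) >= 3:                 # a real agent: the LLM decides to stop
            break
        remaining = [c for c in corpus if c not in context]
        if not remaining:
            break
        context += search(remaining, context[0], k=1)  # hop on the best hit so far
    return context

corpus = [
    "refund policy refunds within 30 days",
    "shipping policy ships in 2 days",
    "30 days window applies to opened items too",
    "warranty covers manufacturing defects",
]
print(single_hop(corpus, "what is the refund policy"))  # 2 chunks, one search call
print(multi_hop(corpus, "what is the refund policy"))   # 3 chunks, two search calls
```

Note how the multi-hop pass surfaces the "30 days window" chunk, which never mentions "refund" and so is invisible to the single-hop query — this is the accuracy-vs-cost trade-off the table above summarizes.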
Configuring MCP Multi-Hop Retrieval
Step 1 — Enable MCP Multi-Hop Retrieval
- Open your agent pipeline in the Airia builder
- Navigate to the AI Step you want to configure
- Toggle on Enable Datasource
Step 2 — Select Your Datasource(s)
- Click the datasource dropdown that appears
- Select one or more datasources — multi-selection is supported
- The description and ID of each selected datasource are automatically passed to the LLM context at search time
💡 Tip: Make sure your datasource descriptions are clear and specific. The AI uses these descriptions to determine which source is most relevant for a given query.
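To see why descriptions matter, here is a toy routing sketch. The word-overlap scoring function is a stand-in for the LLM's judgment; the datasource IDs and descriptions are made up for illustration.

```python
# Toy illustration of LLM source routing: the model only sees each
# datasource's description, so vague descriptions make routing unreliable.

datasources = {
    "ds-001": "HR policies: vacation, benefits, onboarding documents",
    "ds-002": "Engineering runbooks: deployments, incident response",
    "ds-003": "Docs",  # vague description -- hard to route anything here
}

def pick_datasource(query, sources):
    """Stand-in for the LLM: pick the source whose description best fits."""
    def overlap(desc):
        return len(set(desc.lower().replace(",", "").split())
                   & set(query.lower().split()))
    return max(sources, key=lambda ds_id: overlap(sources[ds_id]))

print(pick_datasource("how do I request vacation", datasources))       # ds-001
print(pick_datasource("deployment incident checklist", datasources))   # ds-002
```

A description like "Docs" gives the model nothing to route on; "HR policies: vacation, benefits, onboarding documents" does.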
Step 3 — Review Retrieval Tools
Once a datasource is selected, the Airia Datasource MCP Server is automatically deployed and attached to your AI step.
- By default, all available retrieval tools are enabled
- You can manually disable individual tools based on your use case (e.g., if you only want vector search and not keyword search)
⚠️ Important: If neither the Airia Datasource MCP Server nor any Airia native retrieval tools are configured, the LLM will not have access to your knowledge base and may produce incorrect or hallucinated answers.
What If No Datasource Is Selected?
If a datasource is not selected in the AI step, the LLM will still require a datasource ID to search against. You must provide it in one of these ways:
- In the LLM prompt (system or user prompt)
- In the user input passed to the AI step at runtime
⚠️ Warning: If no datasource ID is supplied through any of these methods and no retrieval tool is configured, the AI has no knowledge source to query. This will likely result in hallucinated or factually incorrect responses.
Configuring a Data Search Step
Semantic Search Settings
- Max Results — Maximum number of text chunks returned based on semantic similarity. Default: 5. Range: 1-10,000.
- Relevance Threshold (1-100) — Filters out chunks below a minimum similarity score. Default: 70. Maps to cosine similarity.
- Neighboring Chunks — Includes surrounding context from matched chunks. Default: 1 (one chunk before and after). Range: 0-10.
- Hybrid Search — Combines semantic and keyword search with adjustable weighting:
- 100% Keyword / 0% Semantic: Only exact word matches
- 50% Keyword / 50% Semantic: Equal importance to meaning and exact words
- 0% Keyword / 100% Semantic: Only meaning-based matching
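The hybrid weighting and relevance threshold can be read as a linear blend followed by a cutoff. This sketch is illustrative only — the scores are made up, and Airia's actual fusion options are configured in the platform (see Hybrid Search and Reranking):

```python
# Sketch of hybrid-score blending plus the relevance threshold.
# Scores and chunks are invented; this is not Airia's internal algorithm.

def hybrid_score(keyword_score, semantic_score, keyword_weight):
    """Blend two 0-100 scores; keyword_weight is the keyword percentage."""
    w = keyword_weight / 100.0
    return w * keyword_score + (1 - w) * semantic_score

# (name, keyword score, semantic score)
chunks = [
    ("chunk A", 90, 70),   # strong exact-word match
    ("chunk B", 40, 95),   # weak exact words, strong meaning match
    ("chunk C", 75, 75),
]

def retrieve(chunks, keyword_weight=50, relevance_threshold=70, max_results=5):
    scored = [(name, hybrid_score(kw, sem, keyword_weight))
              for name, kw, sem in chunks]
    passing = [(n, s) for n, s in scored if s >= relevance_threshold]
    passing.sort(key=lambda pair: pair[1], reverse=True)
    return passing[:max_results]

print(retrieve(chunks, keyword_weight=50))  # B blends to 67.5 and is filtered out
print(retrieve(chunks, keyword_weight=0))   # pure semantic: chunk B now ranks first
```

Shifting the weighting changes which chunks survive the threshold at all, not just their order — worth remembering when tuning Relevance Threshold and Hybrid Search together.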
Text-to-SQL Search
For structured data (.csv and .xlsx files), you can enable Text-to-SQL search:
- Translates natural language queries into SQL
- Supports fuzzy matching (at the cost of increased query complexity)
- Recommended models: Claude 4 Sonnet, GPT-4.1, Claude 3.7 Sonnet
Important: Both Semantic and Text-to-SQL search require indexes, which must be configured when the data source is created.
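In a Text-to-SQL step, the model translates the natural-language question into SQL, which is then executed against the tabular data. Here is a minimal stdlib sketch of the execution half; the `generated_sql` string is hand-written to stand in for the model's output, and the CSV data is invented.

```python
import csv
import io
import sqlite3

# Sketch of Text-to-SQL execution: load a CSV into SQLite, then run the
# SQL an LLM might emit for "total revenue per region". Illustrative only.

csv_data = """region,revenue
North,1200
South,800
North,300
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue INTEGER)")
rows = list(csv.DictReader(io.StringIO(csv_data)))
conn.executemany("INSERT INTO sales VALUES (:region, :revenue)", rows)

# Natural-language query: "total revenue per region"
# -> SQL a model might emit (hypothetical output):
generated_sql = (
    "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
)
print(conn.execute(generated_sql).fetchall())  # [('North', 1500), ('South', 800)]
```

The aggregation step is why Text-to-SQL outperforms chunk retrieval on structured data: summing a column is a query, not a similarity match.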
Available Retrieval Tools (MCP)
When using MCP Multi-Hop Retrieval, the Airia Datasource MCP Server exposes:
| Tool | What it does |
|---|---|
| Datastore Semantic and Keyword Search | Searches a single data source using vector and/or keyword matching |
| Multi Data Store Semantic and Keyword Search | Searches across multiple data sources simultaneously |
| Datastore Filename Search | Finds files by name within a data source |
| File Content Retrieval | Retrieves full content of a specific file |
| Graph Database Cypher Query | Queries a knowledge graph (when Graph RAG is enabled) |
| Multi Data Store SQL Query | Runs natural language-to-SQL queries on structured data |
Limitations
⚠️ Known Limitation: Tool calls — including datasource retrieval tools — do not currently work within nested agent (agent-in-agent) configurations. This is a platform-wide limitation affecting all MCPs, not specific to the Datasource MCP Server.
Best Practices
- Write descriptive datasource names and descriptions for intelligent LLM source selection
- Retain all retrieval tools unless you’re confident they won’t be needed
- Ensure retrieval configuration exists in all knowledge-base AI steps
- Use the Data Search Step for straightforward workflows where cost and speed matter
- Use MCP Multi-Hop Retrieval for complex, multi-source, or conversational use cases
- Test multi-hop retrieval extensively before deployment
