
Airia supports multiple retrieval patterns for bringing knowledge into your agents. Choose based on query complexity, cost, and accuracy requirements. For detailed configuration of search parameters (hybrid weighting, fusion algorithms, reranking), see Hybrid Search and Reranking. For how knowledge graphs enhance retrieval, see Graph-Enhanced Retrieval.
This page covers the same content as Add a Data Source in the Agent Basics section, reframed for the Context Engineering pipeline.

Two Core Retrieval Methods

1. Data Search Step

A dedicated pipeline step that performs a single, embedding-based retrieval pass. The full user input is used as the search query, and the retrieved chunks are passed directly to the LLM or the next step. Best for: Simple queries, linear workflows, batch processing. Faster execution and lower cost.
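The single-pass flow can be sketched in plain Python. This is an illustrative sketch, not Airia's implementation: the toy `score` function (token overlap) stands in for real embedding similarity.

```python
def score(query, chunk):
    # Toy relevance score (token overlap); Airia uses real embedding similarity.
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q | c) if q | c else 0.0

def data_search_step(user_input, chunks, max_results=5):
    # One retrieval pass: the full user input is the search query, and
    # the top chunks are passed directly to the LLM or the next step.
    ranked = sorted(chunks, key=lambda c: score(user_input, c), reverse=True)
    return ranked[:max_results]

chunks = ["billing and invoices", "password reset steps", "api rate limits"]
top = data_search_step("how do I reset my password", chunks, max_results=1)
```

Note that there is no loop: the step runs exactly once, which is why it is faster and cheaper than multi-hop retrieval.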

2. MCP Multi-Hop Retrieval

Sources are attached directly to the LLM, which dynamically decides which sources to query, which retrieval tools to use, and how many times to search. Powered by multi-hop retrieval via the Airia Datasource MCP Server. Best for: Complex queries, conversational agents, accuracy-critical applications.
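The multi-hop pattern can be sketched as a loop in which the model decides whether to search again. In this hedged sketch, `plan_fn` is a scripted stand-in for the LLM's reasoning and `corpus` is toy data; the real decisions are made by the model attached to the Airia Datasource MCP Server.

```python
def multi_hop_retrieve(question, search_fn, plan_fn, max_hops=5):
    # plan_fn stands in for the LLM: given the evidence so far, it returns
    # the next search query, or None once it has enough context to answer.
    evidence = []
    for _ in range(max_hops):
        query = plan_fn(question, evidence)
        if query is None:
            break
        evidence.extend(search_fn(query))  # one or more search calls
    return evidence

# Toy stand-ins for demonstration:
corpus = {"acme ceo": ["Acme's CEO is Jane Doe"],
          "jane doe birthplace": ["Jane Doe was born in Oslo"]}

def search_fn(q):
    return corpus.get(q, [])

def plan_fn(question, evidence):
    # A real LLM would reason over the evidence; this scripted planner hops twice.
    if not evidence:
        return "acme ceo"
    if len(evidence) == 1:
        return "jane doe birthplace"
    return None

facts = multi_hop_retrieve("Where was Acme's CEO born?", search_fn, plan_fn)
```

The second query depends on the first result, which is exactly what a single Data Search Step cannot do.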
| Aspect | Data Search Step | MCP Multi-Hop Retrieval |
| --- | --- | --- |
| Retrieval | Single-hop | Multi-hop |
| Search calls | Always one | One or more (LLM-determined) |
| Speed | Faster | Slower |
| Cost | Lower | Higher |
| Best use | Simple, predictable queries | Complex, conversational, reasoning-heavy |
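The trade-offs above can be condensed into a simple rule of thumb. This chooser is purely illustrative, not an Airia API:

```python
def choose_retrieval(query_is_complex, conversational, accuracy_critical):
    # Rule of thumb from the comparison table: any reasoning-heavy signal
    # favors multi-hop; otherwise take the faster, cheaper single pass.
    if query_is_complex or conversational or accuracy_critical:
        return "MCP Multi-Hop Retrieval"
    return "Data Search Step"
```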

Configuring MCP Multi-Hop Retrieval

Step 1 — Enable MCP Multi-Hop Retrieval

  1. Open your agent pipeline in the Airia builder
  2. Navigate to the AI Step you want to configure
  3. Toggle on Enable Datasource

Step 2 — Select Your Datasource(s)

  1. Click the datasource dropdown that appears
  2. Select one or more datasources — multi-selection is supported
  3. The description and ID of each selected datasource are automatically passed to the LLM context at search time
💡 Tip: Make sure your datasource descriptions are clear and specific. The AI uses these descriptions to determine which source is most relevant for a given query.
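To see why specific descriptions matter, here is a toy stand-in for the model's source-selection step. The word-overlap scorer and the datasource records are invented for illustration; the real selection is done by the LLM reasoning over the descriptions.

```python
def pick_datasource(query, datasources):
    # Toy stand-in for LLM source selection: score each description by
    # word overlap with the query. Vague descriptions score poorly and
    # make the wrong source more likely to be chosen.
    q = set(query.lower().split())
    def overlap(ds):
        return len(q & set(ds["description"].lower().split()))
    return max(datasources, key=overlap)["id"]

# Hypothetical datasources with deliberately specific descriptions:
datasources = [
    {"id": "ds-hr",  "description": "employee handbook, benefits and PTO policies"},
    {"id": "ds-eng", "description": "engineering runbooks and deployment guides"},
]
best = pick_datasource("what is the PTO policy", datasources)
```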

Step 3 — Review Retrieval Tools

Once a datasource is selected, the Airia Datasource MCP Server is automatically deployed and attached to your AI step.
  • By default, all available retrieval tools are enabled
  • You can manually disable individual tools based on your use case (e.g., if you only want vector search and not keyword search)
⚠️ Important: If neither the Airia Datasource MCP Server nor any Airia native retrieval tools are configured, the LLM will not have access to your knowledge base and may produce incorrect or hallucinated answers.

What If No Datasource Is Selected?

If a datasource is not selected in the AI step, the LLM will still require a datasource ID to search against. You must provide it in one of these ways:
  • In the LLM prompt (system or user prompt)
  • In the user input passed to the AI step at runtime
⚠️ Warning: If no datasource ID is supplied through any of these methods and no retrieval tool is configured, the AI has no knowledge source to query. This will likely result in hallucinated or factually incorrect responses.
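Supplying the ID through the system prompt can be as simple as interpolating it into the prompt text. The ID format and wording below are hypothetical; adapt them to your datasource and prompt style.

```python
def build_system_prompt(base_prompt, datasource_id):
    # Embed the datasource ID in the system prompt so the LLM knows
    # which source to pass to its retrieval tools at runtime.
    return (f"{base_prompt}\n"
            f"When answering, search datasource {datasource_id} "
            f"using the available retrieval tools.")

prompt = build_system_prompt("You are a support assistant.",
                             "a1b2c3d4-0000-0000-0000-000000000000")
```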

Configuring a Data Search Step

Semantic Search Settings

  • Max Results — Maximum number of text chunks returned based on semantic similarity. Default: 5. Range: 1-10,000.
  • Relevance Threshold (1-100) — Filters out chunks below a minimum similarity score. Default: 70. Maps to cosine similarity.
  • Neighboring Chunks — Includes surrounding context from matched chunks. Default: 1 (one chunk before and after). Range: 0-10.
  • Hybrid Search — Combines semantic and keyword search with adjustable weighting:
    • 100% Keyword / 0% Semantic: Only exact word matches
    • 50% Keyword / 50% Semantic: Equal importance to meaning and exact words
    • 0% Keyword / 100% Semantic: Only meaning-based matching
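The interplay of these settings can be sketched numerically. This is an illustrative model of the behavior described above, assuming scores on a 0-100 scale; it is not Airia's scoring code.

```python
def hybrid_score(keyword_score, semantic_score, keyword_weight):
    # Weighted fusion mirroring the slider: keyword_weight is the keyword
    # fraction (1.0 = keyword only, 0.0 = semantic only).
    return keyword_weight * keyword_score + (1 - keyword_weight) * semantic_score

def search(chunks, scores, relevance_threshold=70, max_results=5, neighboring=1):
    # Keep chunks at or above the threshold, cap at max_results, then
    # expand each hit with `neighboring` chunks on each side for context.
    hits = [i for i, s in enumerate(scores) if s >= relevance_threshold]
    hits = sorted(hits, key=lambda i: scores[i], reverse=True)[:max_results]
    keep = set()
    for i in hits:
        keep.update(range(max(0, i - neighboring),
                          min(len(chunks), i + neighboring + 1)))
    return [chunks[i] for i in sorted(keep)]

chunks = ["c0", "c1", "c2", "c3", "c4", "c5"]
scores = [10, 95, 20, 5, 5, 80]          # only c1 and c5 clear the threshold
result = search(chunks, scores, relevance_threshold=70, neighboring=1)
```

With the defaults, `c1` and `c5` match, and each brings one neighbor on either side, so `c3` is the only chunk excluded.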
For structured data (.csv and .xlsx files), you can enable Text-to-SQL search:
  • Translates natural language queries into SQL
  • Supports fuzzy search capability (increases query complexity)
  • Recommended models: Claude 4 Sonnet, GPT 4.1, Claude 3.7 Sonnet
Important: For both Semantic and Text-to-SQL search to function, the required indexes must be created when the data source is first configured, at creation time.
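A minimal sketch of the Text-to-SQL flow, using the standard-library `sqlite3` module and sample CSV data: in the real feature, an LLM translates the natural-language query into SQL, so the hand-written statement below is a stand-in for that translation step.

```python
import csv
import io
import sqlite3

# Structured data as it might arrive from a .csv upload (sample data).
csv_text = "region,revenue\nEMEA,120\nAMER,200\nAPAC,90\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(r["region"], int(r["revenue"])) for r in rows])

# An LLM would translate "which region has the highest revenue?" into SQL;
# this statement is a hand-written stand-in for that translation.
sql = "SELECT region FROM sales ORDER BY revenue DESC LIMIT 1"
top_region = conn.execute(sql).fetchone()[0]
```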

Available Retrieval Tools (MCP)

When using MCP Multi-Hop Retrieval, the Airia Datasource MCP Server exposes:

| Tool | What it does |
| --- | --- |
| Datastore Semantic and Keyword Search | Searches a single data source using vector and/or keyword matching |
| Multi Data Store Semantic and Keyword Search | Searches across multiple data sources simultaneously |
| Datastore Filename Search | Finds files by name within a data source |
| File Content Retrieval | Retrieves the full content of a specific file |
| Graph Database Cypher Query | Queries a knowledge graph (when Graph RAG is enabled) |
| Multi Data Store SQL Query | Runs natural language-to-SQL queries on structured data |
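Conceptually, the server routes each tool call by name to a handler, and disabling a tool removes it from that registry. The dispatch below is an illustrative sketch with stub handlers, not the real MCP server implementation.

```python
# Stub handlers standing in for the real retrieval implementations.
def semantic_keyword_search(args):
    return f"searched {args['datasource_id']}"

def filename_search(args):
    return f"files matching {args['pattern']}"

# Only enabled tools appear in the registry; a disabled tool is simply absent.
TOOLS = {
    "Datastore Semantic and Keyword Search": semantic_keyword_search,
    "Datastore Filename Search": filename_search,
}

def call_tool(name, args):
    if name not in TOOLS:
        raise ValueError(f"tool disabled or unknown: {name}")
    return TOOLS[name](args)

result = call_tool("Datastore Filename Search", {"pattern": "*.pdf"})
```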

Limitations

⚠️ Known Limitation: Tool calls — including datasource retrieval tools — do not currently work within nested agent (agent-in-agent) configurations. This is a platform-wide limitation affecting all MCPs, not specific to the Datasource MCP Server.

Best Practices

  • Write descriptive datasource names and descriptions for intelligent LLM source selection
  • Retain all retrieval tools unless you’re confident they won’t be needed
  • Ensure retrieval configuration exists in all knowledge-base AI steps
  • Use the Data Search Step for straightforward workflows where cost and speed matter
  • Use the MCP Multi-Hop Retrieval for complex, multi-source, or conversational use cases
  • Test multi-hop retrieval extensively before deployment