Quick Start: Ingest and Search

This guide walks you through connecting a data source, ingesting documents, and searching them from an agent — in about 10 minutes.

Prerequisites

An Airia account with access to a project
At least one document to upload (PDF, DOCX, or TXT)

Step 1: Create a Data Source

Open your project and navigate to Data Sources
Click Add Data Source
Choose File Upload as the connector type (simplest for getting started)
Name your data source — use a descriptive name like “Product Documentation” or “HR Policies” (this name is visible to the LLM when using agentic retrieval, so make it meaningful)
Upload one or more files

Step 2: Configure Ingestion Settings

Before ingesting, review the key settings:

PDF Parser — For standard documents, Basic works fine. If your documents contain tables, images, or complex layouts, choose Advanced or Universal. See Ingestion Settings for details on each parser.
Scan Document for Images — Enabled by default. Leave it on if your documents contain relevant images or diagrams.
Vector Database — Leave as Airia DB (default) unless you’re bringing your own vector store.
Knowledge Graph Extraction — Leave off for this quick start. See the Graph RAG guide when you’re ready to try it.

Click Save to start ingestion. You can monitor progress in the data source detail view — files will show their processing status.
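
The choices above can be summarized in a short sketch. It is illustrative only: `IngestionSettings` and `candidate_parsers` are hypothetical names, not part of any Airia API, and the values simply mirror the recommendations in this step.

```python
from dataclasses import dataclass


@dataclass
class IngestionSettings:
    """Hypothetical container for the quick-start choices (not an Airia API)."""
    pdf_parser: str = "Basic"                 # suggested in this guide for standard documents
    scan_document_for_images: bool = True     # enabled by default per this guide
    vector_database: str = "Airia DB"         # default per this guide
    knowledge_graph_extraction: bool = False  # left off for this quick start


def candidate_parsers(has_tables: bool, has_images: bool, complex_layout: bool) -> list[str]:
    """Return the parser options this guide suggests for a document's traits."""
    if has_tables or has_images or complex_layout:
        return ["Advanced", "Universal"]
    return ["Basic"]


settings = IngestionSettings()
print(settings.pdf_parser, candidate_parsers(has_tables=True, has_images=False, complex_layout=False))
```

The helper returns both Advanced and Universal for complex documents because this guide does not say which of the two to prefer; see Ingestion Settings for that decision.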

Step 3: Add the Data Source to an Agent

Once ingestion is complete:

Open or create an agent in the Agent Builder
You have two options:

Option A: Data Search Step (simple)

Drag a Data Search Step into your agent flow, before the AI Model Step
Select your data source
Configure search settings:
- Max Results: 5 (default, good starting point)
- Relevance Threshold: 70 (default)
- Neighboring Chunks: 1 (includes surrounding context)
Connect the Data Search Step output to your AI Model Step’s input
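
To build intuition for how the three search settings interact, here is a minimal sketch. It is not Airia's implementation: the `Chunk` type and `select_chunks` function are hypothetical, the 0-100 score scale is an assumption based on the default threshold of 70, and treating "neighboring" as adjacent positions within the same document is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    index: int    # position of the chunk within its document
    score: float  # relevance on an assumed 0-100 scale


def select_chunks(chunks, max_results=5, relevance_threshold=70, neighboring_chunks=1):
    """Illustrative only: filter by threshold, keep the top results, add neighbors."""
    by_position = {(c.doc_id, c.index): c for c in chunks}
    # Keep chunks at or above the threshold, best first, capped at max_results.
    hits = sorted(
        (c for c in chunks if c.score >= relevance_threshold),
        key=lambda c: c.score,
        reverse=True,
    )[:max_results]
    # Expand each hit with up to `neighboring_chunks` chunks on either side.
    selected = {}
    for hit in hits:
        for offset in range(-neighboring_chunks, neighboring_chunks + 1):
            key = (hit.doc_id, hit.index + offset)
            if key in by_position:
                selected[key] = by_position[key]
    # Return in document order so the combined context reads naturally.
    return [selected[key] for key in sorted(selected)]


chunks = [
    Chunk("guide", 0, 40.0),
    Chunk("guide", 1, 85.0),
    Chunk("guide", 2, 55.0),
    Chunk("guide", 3, 20.0),
]
for c in select_chunks(chunks):
    print(c.doc_id, c.index, c.score)
```

In this example only chunk 1 clears the threshold, but chunks 0 and 2 are also returned as its neighbors. Under these assumptions, a lower threshold admits more hits, and a higher Neighboring Chunks value widens the context around each hit.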

Option B: MCP Multi-Hop Retrieval (agentic)

Open your AI Model Step settings
Toggle on Datasources
Select your data source
The Airia Datasource MCP Server is automatically deployed — the LLM will dynamically search your data as needed

💡 Which should I choose? Start with Option B (MCP Multi-Hop Retrieval) for the most natural experience. The LLM decides when and how to search. Use Option A if you need deterministic, single-pass retrieval every time.

Step 4: Test Your Agent

Click Test in the Agent Builder
Ask a question about the content in your uploaded documents
The agent should respond with information grounded in your data, with source citations

If results are missing or not relevant enough:

Lower the Relevance Threshold (e.g., to 50) to return more results
Enable Hybrid Search with alpha set to 0.5 for a blend of semantic and keyword matching
Check that your documents were fully ingested (file status should show as “Processed”)
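
For intuition on what a hybrid alpha does, the sketch below blends a semantic score and a keyword score with a weighted average. This is a common formulation, not a confirmed description of Airia's scoring: `hybrid_score` is a hypothetical function, both scores are assumed to be normalized to the range 0 to 1, and alpha is assumed to weight the semantic side. At alpha 0.5 the direction does not matter, since both sides contribute equally.

```python
def hybrid_score(semantic: float, keyword: float, alpha: float = 0.5) -> float:
    """Blend two scores in [0, 1]; assumes alpha weights the semantic side."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * semantic + (1.0 - alpha) * keyword


# Hypothetical scores for two chunks: (semantic, keyword)
candidates = {
    "exact-term match": (0.40, 0.90),
    "paraphrase match": (0.80, 0.20),
}
ranked = sorted(candidates, key=lambda name: hybrid_score(*candidates[name]), reverse=True)
print(ranked)
```

With equal weighting, the chunk containing the exact query terms outranks the paraphrase, which is why hybrid search can help when queries include product names, codes, or other literal strings.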

Next Steps

Set up Graph RAG to extract structured entities and improve retrieval for complex queries
Configure hybrid search and reranking to fine-tune retrieval quality
Connect external agents via MCP to use your Airia data sources from Claude, GPT, or other LLM clients
