This guide walks you through enabling Knowledge Graph Extraction on a data source, choosing or creating an industry ontology, reviewing the extracted graph, and using graph-enhanced retrieval in an agent.
When to Use Graph RAG
Enable Knowledge Graph Extraction when:
- Your documents contain named entities with relationships (people, organizations, products, regulations, cases)
- Users ask questions that require connecting information across documents
- Your domain has specific terminology that benefits from structured entity recognition
- You need more precise retrieval than vector search alone provides
Step 1: Create a Data Source and Enable Knowledge Graph Extraction
- Navigate to Data Sources in your project
- Click Create Data Source and configure your connector
- In the Ingestion Settings section, toggle “Enable Knowledge Graph Extraction” to ON
- Choose your extraction mode:
  - Generic Entity Extraction: the LLM freely discovers entities (best for exploration)
  - Industry-Specific Entity Extraction: you define which entity types to extract (best for production)
Step 2: Select an Industry Preset
You’ll see a grid of industry cards:

| Preset | Best for |
|---|---|
| General | Mixed-content knowledge bases, general business documents |
| Healthcare | Clinical records, medical research, patient documentation |
| Legal | Contracts, case law, regulatory filings, compliance documents |
| Finance | Financial reports, trading docs, audit records |
| Technology | Technical docs, architecture specs, product documentation |
| Manufacturing | Supply chain, quality records, material specifications |
| Energy | Production records, regulatory compliance, asset documentation |
Creating a Custom Ontology
If none of the presets fit:
- Click Add Custom Industry
- Enter your industry name (e.g., “Pharmaceutical Research”, “Real Estate”, “Insurance Claims”)
- Click Generate Entities. Airia’s AI suggests entity types based on your industry
- Review and customize:
  - Remove entity types that aren’t relevant
  - Add missing entity types by typing a name and pressing Enter
  - Click the edit icon on any entity to add a description (recommended for extraction accuracy)
- Click Select to save
💡 Tip: Aim for 5-10 focused entity types rather than trying to cover everything. You can always add more later and re-ingest.
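Before typing entity types into the UI, it can help to draft the ontology as plain data and sanity-check it against the guidance above. The following is a minimal sketch, not an Airia API: `EntityType` and `check_ontology` are hypothetical names, and the 5-10 bounds simply mirror the tip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    """One entity type in a draft ontology (hypothetical structure)."""
    name: str
    description: str = ""


def check_ontology(entity_types, min_types=5, max_types=10):
    """Return human-readable warnings for a draft ontology."""
    warnings = []
    # The guide suggests a focused set of entity types.
    if not min_types <= len(entity_types) <= max_types:
        warnings.append(
            f"{len(entity_types)} entity types; the guide suggests {min_types}-{max_types}"
        )
    seen = set()
    for entity_type in entity_types:
        if entity_type.name in seen:
            warnings.append(f"duplicate entity type: {entity_type.name}")
        seen.add(entity_type.name)
        # Descriptions are recommended for extraction accuracy.
        if not entity_type.description.strip():
            warnings.append(f"{entity_type.name}: no description")
    return warnings


draft = [EntityType("contract", "Legal agreements"), EntityType("party")]
for warning in check_ontology(draft):
    print(warning)
```

A check like this is only a drafting aid; the ontology itself is still created and saved through the Add Custom Industry flow described above.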
Step 3: Ingest Your Documents
Save your settings and trigger ingestion. The pipeline now runs the extended flow:
- Generic extraction: Files go to `Processed` when complete (same as standard RAG)
- Industry-specific extraction: Files transition through `Parsed` → `Indexing` → `Indexed`. Files in `Parsed` status are waiting for the indexing service; this is normal, not an error.
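The status flow above can be written down as a small lookup, which is handy if you track ingestion progress in your own notes or scripts. This is a sketch under assumptions: the mode labels `"generic"` and `"industry"` and the helper `describe` are hypothetical, and how you obtain a file's status (for example, by reading it from the UI) is outside its scope.

```python
from enum import Enum


class FileStatus(Enum):
    PROCESSED = "Processed"
    PARSED = "Parsed"
    INDEXING = "Indexing"
    INDEXED = "Indexed"


# Final status per extraction mode, as described in this step.
TERMINAL = {
    "generic": FileStatus.PROCESSED,
    "industry": FileStatus.INDEXED,
}


def describe(mode, status):
    """Explain what a file status means for the given extraction mode."""
    if TERMINAL[mode] is status:
        return "complete"
    if mode == "industry" and status is FileStatus.PARSED:
        # Parsed files are queued for the indexing service; not an error.
        return "waiting for the indexing service (normal)"
    return "in progress"


print(describe("industry", FileStatus.PARSED))
```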
Step 4: Review the Extracted Graph
Once ingestion completes:
- Go to your Data Sources list
- Find your graph-enabled data source and click the three-dot menu (…)
- Select View Graph
- Review:
  - Node counts by entity type: verify your configured types appear with reasonable counts
  - Relationship counts by type: check that relationship types make sense for your domain
  - If entity counts are very low, your entity type descriptions may need refinement
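The review above amounts to comparing the entity types you configured against the node counts shown in View Graph. A minimal sketch of that comparison follows; `review_node_counts` is a hypothetical helper, the counts are assumed to be copied by hand from the graph view, and the `min_count=5` threshold is an arbitrary illustration of "very low", not a documented value.

```python
def review_node_counts(configured_types, node_counts, min_count=5):
    """Sort configured entity types into missing, low, and ok buckets."""
    report = {"missing": [], "low": [], "ok": []}
    for name in configured_types:
        count = node_counts.get(name, 0)
        if count == 0:
            report["missing"].append(name)   # type never extracted
        elif count < min_count:
            report["low"].append(name)       # candidate for a better description
        else:
            report["ok"].append(name)
    return report


# Made-up counts for illustration only.
counts = {"contract": 42, "party": 3}
print(review_node_counts(["contract", "party", "clause"], counts))
```

Types that land in `missing` or `low` are the ones to revisit first when refining names and descriptions.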
Troubleshooting Low Extraction Quality
| Symptom | Likely cause | Fix |
|---|---|---|
| Very few entities extracted | Entity type names are too vague | Add descriptions to entity types, use more specific names |
| Wrong entity types being extracted | Preset doesn’t match your content | Switch to a custom ontology tailored to your documents |
| Too many false positive entities | Entity types are too broad | Replace generic types (e.g., item) with specific ones (e.g., medical_device) |
| Entities not connecting across documents | Entity names vary across docs | This is expected — the graph links entities to chunks, enabling cross-document retrieval at query time |
Step 5: Use Graph-Enhanced Retrieval in an Agent
Option A: Data Search Step with Reranking
When Knowledge Graph Extraction is active, the Data Search Step automatically uses the integrated reranker to factor graph data into result ranking:
- Add a Data Search Step to your agent
- Select your graph-enabled data source
- Reranking is automatically enabled
- Results incorporate both vector similarity and entity-relationship signals
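To build intuition for how a ranking can draw on both signals, here is a toy illustration that blends a vector-similarity score with a graph-derived score. It is not the integrated reranker's actual algorithm, which this guide does not describe; `blend_scores`, the `graph_weight` parameter, and the linear formula are all assumptions made for the example.

```python
def blend_scores(candidates, graph_weight=0.3):
    """Rank chunk ids by a weighted mix of vector and graph scores.

    candidates: list of (chunk_id, vector_score, graph_score) tuples,
    with both scores assumed to lie in [0, 1].
    """
    def combined(item):
        _, vector_score, graph_score = item
        return (1 - graph_weight) * vector_score + graph_weight * graph_score

    ranked = sorted(candidates, key=combined, reverse=True)
    return [chunk_id for chunk_id, _, _ in ranked]


# Made-up scores: chunk "b" is slightly less similar but well connected.
candidates = [("a", 0.9, 0.0), ("b", 0.8, 0.9), ("c", 0.5, 0.2)]
print(blend_scores(candidates))                    # ['b', 'a', 'c']
print(blend_scores(candidates, graph_weight=0.0))  # ['a', 'b', 'c']
```

The point of the example is only that a chunk with strong entity-relationship support can outrank one that is marginally closer in embedding space.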
Option B: Agentic Retrieval with Graph Queries
For the most powerful graph retrieval, use MCP Multi-Hop Retrieval:
- Open your AI Model Step settings
- Toggle on Datasources and select your graph-enabled data source
- The LLM gains access to both vector search tools and Graph Database Cypher Query tools
- The LLM can now autonomously combine text search with graph traversal — for example, first finding relevant chunks via semantic search, then querying the graph for related entities and their connected documents
💡 Example prompt to test: “Find all entities related to [specific entity in your data] and summarize their connections.” The agent should use the graph query tool to traverse relationships and return a structured answer.
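The combined pattern described above (semantic search first, then graph expansion) can be sketched with plain dictionaries standing in for the vector index and the graph. Everything here is hypothetical: `expand_with_graph`, the three lookup tables, and the sample ids are illustrative stand-ins, and the real tools are invoked by the LLM rather than by code like this.

```python
def expand_with_graph(seed_chunks, chunk_entities, related, entity_chunks):
    """Return extra chunks reachable from seed chunks via one entity hop.

    seed_chunks:    chunk ids returned by semantic search
    chunk_entities: chunk id -> entities mentioned in that chunk
    related:        entity -> directly related entities
    entity_chunks:  entity -> chunk ids that mention it
    """
    found = []
    seen = set(seed_chunks)
    for chunk in seed_chunks:
        for entity in chunk_entities.get(chunk, []):
            for neighbor in related.get(entity, []):
                for extra in entity_chunks.get(neighbor, []):
                    if extra not in seen:
                        seen.add(extra)
                        found.append(extra)
    return found


# Toy data: chunk c1 mentions Acme, which is related to MSA-2021.
extra = expand_with_graph(
    seed_chunks=["c1"],
    chunk_entities={"c1": ["Acme"]},
    related={"Acme": ["MSA-2021"]},
    entity_chunks={"MSA-2021": ["c1", "c7"]},
)
print(extra)  # ['c7']
```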
Step 6: Iterate
Graph RAG quality improves with refinement:
- Test with real user queries: identify where graph-enhanced retrieval produces better results than vector-only search
- Adjust entity types — add types for entities users frequently ask about, remove types that produce noise
- Re-ingest after changes — entity type changes require re-ingestion to take effect
- Compare — run the same queries with and without Knowledge Graph Extraction to measure the impact
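One way to make the comparison step concrete is to record, for a handful of test queries, which chunks you consider relevant and which chunks each configuration returned, then compute a simple metric. The sketch below uses recall@k; the function names, the input shapes, and the choice of metric are assumptions for illustration, and the retrieved ids are expected to be collected manually from your own test runs.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def compare_runs(queries, vector_only, graph_enabled, k=5):
    """Return (query, baseline recall, graph recall, difference) rows.

    queries:       query -> list of relevant chunk ids
    vector_only:   query -> chunk ids retrieved without graph extraction
    graph_enabled: query -> chunk ids retrieved with graph extraction
    """
    rows = []
    for query, relevant in queries.items():
        base = recall_at_k(vector_only.get(query, []), relevant, k)
        graph = recall_at_k(graph_enabled.get(query, []), relevant, k)
        rows.append((query, base, graph, graph - base))
    return rows


# Made-up example: the graph-enabled run finds both relevant chunks.
rows = compare_runs(
    queries={"Who indemnifies whom?": ["a", "b"]},
    vector_only={"Who indemnifies whom?": ["a", "x"]},
    graph_enabled={"Who indemnifies whom?": ["a", "b"]},
)
print(rows)
```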
Example: Legal Domain Setup
Industry: Custom (“Corporate Legal”)
Entity types:
- `contract`: “Legal agreements including MSAs, NDAs, SOWs, and amendments”
- `party`: “Organizations or individuals that are parties to a contract”
- `clause`: “Specific provisions within contracts such as termination, liability, and IP assignment”
- `regulation`: “Government regulations, statutes, and compliance requirements”
- `jurisdiction`: “Legal jurisdictions including countries, states, and regulatory bodies”
- `effective_date`: “Dates when contracts, clauses, or regulations take effect”

With this ontology, a graph query can traverse party → contract → clause (filtered to indemnification) → regulation, retrieving the precise chunks where these relationships are documented, across potentially dozens of separate contract files.
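The traversal in this example can be mimicked on a tiny in-memory edge list to show what "party → contract → clause → regulation" means step by step. This is a toy model only: the `type:name` node labels, the `traverse` helper, and the sample nodes (`Acme`, `MSA-1`, `REG-A`) are made up for illustration and say nothing about how the graph is actually stored or queried.

```python
# Directed edges between "type:name" nodes (made-up sample data).
EDGES = [
    ("party:Acme", "contract:MSA-1"),
    ("contract:MSA-1", "clause:indemnification"),
    ("contract:MSA-1", "clause:termination"),
    ("clause:indemnification", "regulation:REG-A"),
]


def traverse(start, type_path, edges, keep=lambda node: True):
    """Follow edges from start through the given node types, in order.

    Returns every complete path; `keep` can reject nodes along the way.
    """
    paths = [[start]]
    for node_type in type_path:
        next_paths = []
        for path in paths:
            for src, dst in edges:
                if src == path[-1] and dst.startswith(node_type + ":") and keep(dst):
                    next_paths.append(path + [dst])
        paths = next_paths
    return paths


def only_indemnification(node):
    """Keep every node except clauses other than indemnification."""
    return not node.startswith("clause:") or node == "clause:indemnification"


paths = traverse("party:Acme", ["contract", "clause", "regulation"], EDGES,
                 keep=only_indemnification)
print(paths)
```

With the sample edges, the only surviving path runs through the indemnification clause; the termination clause is filtered out before the final hop.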