Set Up Graph RAG for Your Industry

This guide walks you through enabling Knowledge Graph Extraction on a data source, choosing or creating an industry ontology, reviewing the extracted graph, and using graph-enhanced retrieval in an agent.

When to Use Graph RAG

Enable Knowledge Graph Extraction when:

Your documents contain named entities with relationships (people, organizations, products, regulations, cases)
Users ask questions that require connecting information across documents
Your domain has specific terminology that benefits from structured entity recognition
You need more precise retrieval than vector search alone provides

Step 1: Create a Data Source and Enable Knowledge Graph Extraction

1. Navigate to Data Sources in your project.
2. Click Create Data Source and configure your connector.
3. In the Ingestion Settings section, toggle “Enable Knowledge Graph Extraction” to ON.
4. Choose your extraction mode:
   - Generic Entity Extraction: the LLM freely discovers entities (best for exploration)
   - Industry-Specific Entity Extraction: you define which entity types to extract (best for production)

This guide uses Industry-Specific Entity Extraction to demonstrate the full workflow. See Knowledge Graph Extraction for full details on both modes.

Step 2: Select an Industry Preset

You’ll see a grid of industry cards:

Preset	Best for
General	Mixed-content knowledge bases, general business documents
Healthcare	Clinical records, medical research, patient documentation
Legal	Contracts, case law, regulatory filings, compliance documents
Finance	Financial reports, trading docs, audit records
Technology	Technical docs, architecture specs, product documentation
Manufacturing	Supply chain, quality records, material specifications
Energy	Production records, regulatory compliance, asset documentation

Click a preset to load its entity types. Review the suggested entity types — you can add, remove, or edit them before saving.

Creating a Custom Ontology

If none of the presets fit:

1. Click Add Custom Industry.
2. Enter your industry name (e.g., “Pharmaceutical Research”, “Real Estate”, “Insurance Claims”).
3. Click Generate Entities. Airia’s AI suggests entity types based on your industry.
4. Review and customize:
   - Remove entity types that aren’t relevant
   - Add missing entity types by typing a name and pressing Enter
   - Click the edit icon on any entity to add a description (recommended for extraction accuracy)
5. Click Select to save.

💡 Tip: Aim for 5-10 focused entity types rather than trying to cover everything. You can always add more later and re-ingest.
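
If you draft your entity types outside the UI first, a small check like the one below can catch the issues this section warns about before you enter them. It is only a sketch: `EntityType` and `lint_ontology` are hypothetical helpers, not part of any Airia API, and the lowercase snake_case naming rule is an assumption based on the example names used elsewhere in this guide.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    """Hypothetical local representation of one entity type."""
    name: str
    description: str = ""


def lint_ontology(entity_types, min_types=5, max_types=10):
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    if not min_types <= len(entity_types) <= max_types:
        warnings.append(
            f"{len(entity_types)} entity types defined; "
            f"the tip above suggests {min_types}-{max_types}"
        )
    seen = set()
    for et in entity_types:
        # Assumed convention: lowercase snake_case, e.g. medical_device
        if not re.fullmatch(r"[a-z][a-z0-9_]*", et.name):
            warnings.append(f"'{et.name}': use a lowercase snake_case name")
        if et.name in seen:
            warnings.append(f"'{et.name}': duplicate entity type")
        seen.add(et.name)
        if not et.description.strip():
            warnings.append(f"'{et.name}': add a description")
    return warnings


draft = [EntityType("contract", "Legal agreements"), EntityType("party")]
for warning in lint_ontology(draft):
    print(warning)
```

With the two-item draft above, the check flags the small number of types and the missing description on `party`.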

Step 3: Ingest Your Documents

Save your settings and trigger ingestion. The pipeline now runs the extended flow:

Parse → Chunk → Entity Extraction → Embed → Store
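
To make the order of the stages concrete, here is a toy version of the flow in Python. Every function is a stand-in invented for illustration: real entity extraction is performed by an LLM, and real embeddings come from an embedding model. Here a keyword match and a two-number vector take their places, so only the sequencing reflects the pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str
    text: str
    entities: list = field(default_factory=list)
    embedding: list = field(default_factory=list)


def parse(raw):
    # Stand-in for document parsing: just trim whitespace
    return raw.strip()


def chunk(doc_id, text, size=40):
    # Fixed-size character chunks; real chunkers are smarter
    return [Chunk(doc_id, text[i:i + size]) for i in range(0, len(text), size)]


def extract_entities(c, entity_terms):
    # Toy replacement for the LLM call: case-insensitive keyword match
    lowered = c.text.lower()
    c.entities = [
        (etype, term)
        for etype, terms in entity_terms.items()
        for term in terms
        if term.lower() in lowered
    ]
    return c


def embed(c):
    # Toy embedding so the stage has something to produce
    c.embedding = [len(c.text), sum(map(ord, c.text)) % 97]
    return c


def ingest(doc_id, raw, entity_terms, store):
    # Parse -> Chunk -> Entity Extraction -> Embed -> Store
    for c in chunk(doc_id, parse(raw)):
        store.append(embed(extract_entities(c, entity_terms)))
    return store


terms = {"party": ["Acme Corp", "Globex"], "contract": ["NDA"]}
store = ingest("d1", "Acme Corp signed an NDA with Globex.", terms, [])
print(store[0].entities)
```

Because extraction runs once per chunk and checks every entity type, the work in this sketch grows with both numbers.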

Entity extraction adds processing time proportional to the number of chunks and entity types. For a first test, start with a small, representative subset of your documents. Monitor ingestion progress in the data source detail view:

Generic extraction: Files go to Processed when complete (same as standard RAG)
Industry-specific extraction: Files transition through Parsed → Indexing → Indexed. Files in Parsed status are waiting for the indexing service — this is normal, not an error.
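
If you track ingestion from a script, the status rules above can be encoded as a small helper. This is a sketch under assumptions: the mode keys `"generic"` and `"industry"` are made up, and `get_statuses` is a hypothetical callable you would supply yourself (this guide does not describe an API for fetching statuses). Only the status names come from the list above.

```python
import time

# Terminal status per extraction mode, taken from the list above
TERMINAL = {"generic": "Processed", "industry": "Indexed"}


def is_complete(mode, status):
    return status == TERMINAL[mode]


def describe(mode, status):
    """Translate a file status into a short human-readable note."""
    if is_complete(mode, status):
        return "complete"
    if mode == "industry" and status == "Parsed":
        return "waiting for the indexing service"
    if mode == "industry" and status == "Indexing":
        return "indexing in progress"
    return "unrecognized status"


def wait_until_complete(get_statuses, mode, poll_seconds=0.0, max_polls=10):
    """Poll a caller-supplied function that returns {file_name: status}."""
    for _ in range(max_polls):
        statuses = get_statuses()
        if all(is_complete(mode, s) for s in statuses.values()):
            return True
        time.sleep(poll_seconds)
    return False


# Simulated status snapshots standing in for real polling
snapshots = iter([{"a.pdf": "Parsed"}, {"a.pdf": "Indexing"}, {"a.pdf": "Indexed"}])
print(wait_until_complete(lambda: next(snapshots), "industry"))
```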

Step 4: Review the Extracted Graph

Once ingestion completes:

1. Go to your Data Sources list.
2. Find your graph-enabled data source and click the three-dot menu (…).
3. Select View Graph.
4. Review:
   - Node counts by entity type: verify your configured types appear with reasonable counts
   - Relationship counts by type: check that relationship types make sense for your domain
   - If entity counts are very low, your entity type descriptions may need refinement
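
The review in step 4 can be made repeatable by noting down the node counts and comparing them with your configured types. The sketch below assumes you copy the counts by hand from the graph view into a dictionary; `review_counts` is a hypothetical helper, and the threshold of 5 is an arbitrary placeholder for what counts as "very low" in your data.

```python
def review_counts(configured_types, node_counts, min_count=5):
    """Compare configured entity types against observed node counts."""
    missing = [t for t in configured_types if node_counts.get(t, 0) == 0]
    low = [t for t in configured_types if 0 < node_counts.get(t, 0) < min_count]
    # Types present in the graph that you did not configure
    unexpected = sorted(set(node_counts) - set(configured_types))
    return {"missing": missing, "low": low, "unexpected": unexpected}


report = review_counts(
    ["contract", "party", "clause"],
    {"contract": 42, "party": 3, "item": 7},
)
print(report)
```

Types reported as missing or low are the first candidates for a clearer description.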

Troubleshooting Low Extraction Quality

Symptom	Likely cause	Fix
Very few entities extracted	Entity type names are too vague	Add descriptions to entity types, use more specific names
Wrong entity types being extracted	Preset doesn’t match your content	Switch to a custom ontology tailored to your documents
Too many false positive entities	Entity types are too broad	Replace generic types (e.g., `item`) with specific ones (e.g., `medical_device`)
Entities not connecting across documents	Entity names vary across docs	This is expected — the graph links entities to chunks, enabling cross-document retrieval at query time
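
The last row is easier to picture with a small model of entity-to-chunk links. The sketch below is illustrative only and says nothing about how Airia stores the graph: it keeps a mapping from entity name to the chunks that mention it, so a lookup at query time returns chunks from several documents. The case and whitespace normalization is an assumption added for the example.

```python
from collections import defaultdict


def normalize(name):
    # Assumed normalization: lowercase, collapse whitespace
    return " ".join(name.lower().split())


def build_entity_index(mentions):
    """mentions: iterable of (entity_name, doc_id, chunk_id) tuples."""
    index = defaultdict(set)
    for name, doc_id, chunk_id in mentions:
        index[normalize(name)].add((doc_id, chunk_id))
    return index


def chunks_for(index, name):
    """Return every (doc_id, chunk_id) linked to the entity, across documents."""
    return sorted(index.get(normalize(name), set()))


index = build_entity_index([
    ("Acme Corp", "msa.pdf", 3),
    ("acme corp", "nda.pdf", 1),
    ("Globex", "nda.pdf", 1),
])
print(chunks_for(index, "ACME  Corp"))
```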

Step 5: Use Graph-Enhanced Retrieval in an Agent

Option A: Data Search Step with Reranking

When Knowledge Graph Extraction is active, the Data Search Step automatically uses the integrated reranker to factor graph data into result ranking:

1. Add a Data Search Step to your agent.
2. Select your graph-enabled data source.

Reranking is then enabled automatically, and results incorporate both vector similarity and entity-relationship signals.
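
This guide does not describe how the integrated reranker weighs its signals, so the following is only a mental model and not Airia's implementation. It blends a vector similarity score with a simple entity-overlap score; the function name, the `alpha` weight, and the overlap measure are all assumptions made for illustration.

```python
def rerank(candidates, query_entities, alpha=0.7):
    """Order candidates by a blend of vector score and entity overlap.

    candidates: list of dicts with 'chunk_id', 'vector_score' (0..1),
    and 'entities' (names of entities linked to the chunk).
    """
    wanted = set(query_entities)

    def blended(c):
        # Fraction of query entities that this chunk is linked to
        overlap = len(wanted & set(c["entities"])) / len(wanted) if wanted else 0.0
        return alpha * c["vector_score"] + (1 - alpha) * overlap

    return sorted(candidates, key=blended, reverse=True)


candidates = [
    {"chunk_id": "A", "vector_score": 0.80, "entities": set()},
    {"chunk_id": "B", "vector_score": 0.70, "entities": {"acme corp", "nda"}},
]
ranked = rerank(candidates, ["acme corp", "nda"])
print([c["chunk_id"] for c in ranked])
```

In this toy data, chunk B overtakes chunk A despite a lower vector score because it is linked to both entities named in the query.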

Option B: Agentic Retrieval with Graph Queries

For the most powerful graph retrieval, use MCP Multi-Hop Retrieval:

1. Open your AI Model Step settings.
2. Toggle on Datasources and select your graph-enabled data source.

The LLM gains access to both vector search tools and Graph Database Cypher Query tools. It can now autonomously combine text search with graph traversal: for example, first finding relevant chunks via semantic search, then querying the graph for related entities and their connected documents.

💡 Example prompt to test: “Find all entities related to [specific entity in your data] and summarize their connections.” The agent should use the graph query tool to traverse relationships and return a structured answer.
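
For orientation, this is roughly the shape of Cypher query an LLM might write for that prompt. The graph schema is an assumption: the `Entity` label and `name` property are placeholders, since this guide does not document the labels and properties the graph query tool exposes. The helper only builds the query text and parameters; it does not connect to any database.

```python
def related_entities_query(entity_name, max_hops=2, limit=25):
    """Build a parameterized Cypher query for entities near a given entity."""
    # Variable-length bounds cannot be passed as Cypher parameters,
    # so validate the value before placing it in the query text.
    if not isinstance(max_hops, int) or not 1 <= max_hops <= 4:
        raise ValueError("max_hops must be an integer between 1 and 4")
    query = (
        "MATCH (e:Entity {name: $name})-[*1.." + str(max_hops) + "]-(n:Entity) "
        "WHERE n <> e "
        "RETURN DISTINCT n.name AS name, labels(n) AS labels "
        "LIMIT $limit"
    )
    return query, {"name": entity_name, "limit": limit}


query, params = related_entities_query("Acme Corp")
print(query)
print(params)
```

Passing the entity name as a parameter instead of formatting it into the string keeps user-supplied text out of the query itself.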

Step 6: Iterate

Graph RAG quality improves with refinement:

Test with real user queries — identify where graph-enhanced retrieval produces better results than vector-only search
Adjust entity types — add types for entities users frequently ask about, remove types that produce noise
Re-ingest after changes — entity type changes require re-ingestion to take effect
Compare — run the same queries with and without Knowledge Graph Extraction to measure the impact
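
One way to make the comparison measurable is to keep a small set of test queries with the chunks you consider relevant, then score both configurations with the same metric. The sketch below uses recall@k. How you collect the retrieved chunk IDs for each run is up to you and is not shown; the query and chunk IDs are invented sample data.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant chunk IDs found in the top k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def compare(runs_vector, runs_graph, gold, k=5):
    """Return (query, vector_recall, graph_recall, difference) per query."""
    rows = []
    for query, relevant in gold.items():
        v = recall_at_k(runs_vector.get(query, []), relevant, k)
        g = recall_at_k(runs_graph.get(query, []), relevant, k)
        rows.append((query, v, g, g - v))
    return rows


gold = {"q1": ["c1", "c2"]}               # hand-labelled relevant chunks
vector_only = {"q1": ["c1", "c9"]}        # results without graph extraction
graph_enhanced = {"q1": ["c2", "c1"]}     # results with graph extraction
for row in compare(vector_only, graph_enhanced, gold):
    print(row)
```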

Example: Legal Domain Setup

Industry: Custom (“Corporate Legal”)

Entity types:

contract — “Legal agreements including MSAs, NDAs, SOWs, and amendments”
party — “Organizations or individuals that are parties to a contract”
clause — “Specific provisions within contracts such as termination, liability, and IP assignment”
regulation — “Government regulations, statutes, and compliance requirements”
jurisdiction — “Legal jurisdictions including countries, states, and regulatory bodies”
effective_date — “Dates when contracts, clauses, or regulations take effect”

Sample query: “Which contracts with Party X contain indemnification clauses that may conflict with Regulation Y?” With Graph RAG, the agent traverses: party → contract → clause (filtered to indemnification) → regulation, retrieving the precise chunks where these relationships are documented — across potentially dozens of separate contract files.
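
The traversal in that sample query can be imitated on a tiny in-memory graph. Everything in the sketch is made up for illustration, including the contract names and the `follow_path` helper. In the product the agent runs graph queries against the extracted graph instead, but the path it follows has the same shape.

```python
from collections import defaultdict


def follow_path(edges, start, type_path):
    """Walk from start through nodes whose types follow type_path, in order."""
    adjacency = defaultdict(list)
    for a, b in edges:
        # Relationships are treated as undirected in this sketch
        adjacency[a].append(b)
        adjacency[b].append(a)
    paths = [[start]]
    for wanted in type_path:
        paths = [
            p + [n]
            for p in paths
            for n in adjacency[p[-1]]
            if n[0] == wanted and n not in p
        ]
    return paths


# Nodes are (entity_type, name) tuples; all sample data is invented
PARTY_X = ("party", "Party X")
MSA = ("contract", "MSA-1")
NDA = ("contract", "NDA-7")
INDEMNITY = ("clause", "indemnification (MSA-1)")
TERMINATION = ("clause", "termination (NDA-7)")
REG_Y = ("regulation", "Regulation Y")

EDGES = [
    (PARTY_X, MSA), (PARTY_X, NDA),
    (MSA, INDEMNITY), (NDA, TERMINATION),
    (INDEMNITY, REG_Y),
]

paths = [
    p for p in follow_path(EDGES, PARTY_X, ["contract", "clause", "regulation"])
    if "indemnification" in p[2][1]  # keep only indemnification clauses
]
for p in paths:
    print(" -> ".join(name for _, name in p))
```

Only the path through MSA-1 survives, because the NDA-7 termination clause has no link to a regulation.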
