# Set Up Graph RAG for Your Industry

> End-to-end guide for enabling Knowledge Graph Extraction, choosing an ontology, reviewing the graph, and using graph-enhanced retrieval.

This guide walks you through enabling Knowledge Graph Extraction on a data source, choosing or creating an industry ontology, reviewing the extracted graph, and using graph-enhanced retrieval in an agent.

## When to Use Graph RAG

Enable Knowledge Graph Extraction when:

* Your documents contain named entities with relationships (people, organizations, products, regulations, cases)
* Users ask questions that require connecting information across documents
* Your domain has specific terminology that benefits from structured entity recognition
* You need more precise retrieval than vector search alone provides

## Step 1: Create a Data Source and Enable Knowledge Graph Extraction

1. Navigate to **Data Sources** in your project
2. Click **Create Data Source** and configure your connector
3. In the **Ingestion Settings** section, toggle **"Enable Knowledge Graph Extraction"** to ON
4. Choose your extraction mode:
   * **Generic Entity Extraction** — the LLM freely discovers entities (best for exploration)
   * **Industry-Specific Entity Extraction** — you define which entity types to extract (best for production)

This guide uses **Industry-Specific Entity Extraction** to demonstrate the full workflow. See [Knowledge Graph Extraction](./knowledge-graph-extraction.md) for full details on both modes.

## Step 2: Select an Industry Preset

You'll see a grid of industry cards:

| Preset            | Best for                                                       |
| ----------------- | -------------------------------------------------------------- |
| **General**       | Mixed-content knowledge bases, general business documents      |
| **Healthcare**    | Clinical records, medical research, patient documentation      |
| **Legal**         | Contracts, case law, regulatory filings, compliance documents  |
| **Finance**       | Financial reports, trading docs, audit records                 |
| **Technology**    | Technical docs, architecture specs, product documentation      |
| **Manufacturing** | Supply chain, quality records, material specifications         |
| **Energy**        | Production records, regulatory compliance, asset documentation |

Click a preset to load its entity types. Review the suggested entity types — you can add, remove, or edit them before saving.

### Creating a Custom Ontology

If none of the presets fit:

1. Click **Add Custom Industry**
2. Enter your industry name (e.g., "Pharmaceutical Research", "Real Estate", "Insurance Claims")
3. Click **Generate Entities** — Airia's AI suggests entity types based on your industry
4. Review and customize:
   * Remove entity types that aren't relevant
   * Add missing entity types by typing a name and pressing Enter
   * Click the edit icon on any entity to add a description (recommended for extraction accuracy)
5. Click **Select** to save

> 💡 **Tip:** Aim for 5-10 focused entity types rather than trying to cover everything. You can always add more later and re-ingest.
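
Before typing a custom ontology into the UI, it can help to draft it as plain data and check it against the tip above. The following is a minimal Python sketch for local drafting only: `EntityType`, `lint_ontology`, the thresholds, and the sample entity types are hypothetical and are not part of any Airia API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    """One entity type as you would enter it in the UI (hypothetical local model)."""
    name: str
    description: str = ""


def lint_ontology(entity_types, min_types=5, max_types=10):
    """Return a list of human-readable warnings for a draft ontology."""
    warnings = []
    if not min_types <= len(entity_types) <= max_types:
        warnings.append(
            f"{len(entity_types)} entity types; the guide suggests {min_types}-{max_types}"
        )
    seen = set()
    for et in entity_types:
        key = et.name.strip().lower()
        if key in seen:
            warnings.append(f"duplicate entity type: {et.name}")
        seen.add(key)
        if not et.description.strip():
            warnings.append(f"{et.name}: missing description")
    return warnings


# Illustrative draft for a "Pharmaceutical Research" ontology.
draft = [
    EntityType("compound", "Drug candidates and active ingredients"),
    EntityType("trial", "Clinical trials, identified by phase or registry ID"),
    EntityType("item"),  # no description, so it is flagged
]

for warning in lint_ontology(draft):
    print(warning)
```

Running the check on a draft like this surfaces the count and description gaps before you spend an ingestion run on them.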

## Step 3: Ingest Your Documents

Save your settings and trigger ingestion. The pipeline now runs the extended flow:

```
Parse → Chunk → Entity Extraction → Embed → Store
```

Entity extraction adds processing time proportional to the number of chunks and entity types. For a first test, start with a small, representative subset of your documents.
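
One simple reading of "representative" is "covers every file type in the corpus". As a rough sketch under that assumption, the helper below picks a few files per extension from a local list of file names; `sample_by_extension` and the file names are hypothetical, and you would still upload the chosen files through your connector as usual.

```python
import random
from collections import defaultdict
from pathlib import PurePath


def sample_by_extension(paths, per_type=2, seed=0):
    """Pick up to `per_type` files of each extension, reproducibly."""
    groups = defaultdict(list)
    for path in paths:
        # Group case-insensitively so ".PDF" and ".pdf" count as one type.
        groups[PurePath(path).suffix.lower()].append(path)
    rng = random.Random(seed)
    subset = []
    for suffix in sorted(groups):
        files = sorted(groups[suffix])
        subset.extend(rng.sample(files, min(per_type, len(files))))
    return subset


files = ["a.pdf", "b.pdf", "c.pdf", "notes.docx", "table.xlsx", "d.PDF"]
print(sample_by_extension(files, per_type=2))
```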

Monitor ingestion progress in the data source detail view:

* **Generic extraction**: Files go to `Processed` when complete (same as standard RAG)
* **Industry-specific extraction**: Files transition through `Parsed` → `Indexing` → `Indexed`. Files in `Parsed` status are waiting for the indexing service — this is normal, not an error.
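
If you track ingestion outside the UI (for example, from a list of file statuses you have copied down), the two status flows above can be expressed as a small lookup. This is a sketch only: the status strings come from this guide, while `TERMINAL_STATUS`, `is_complete`, `summarize`, and the mode keys are hypothetical names.

```python
# Terminal status per extraction mode, as described in this guide.
TERMINAL_STATUS = {
    "generic": "Processed",
    "industry_specific": "Indexed",
}

# Intermediate statuses that are normal for industry-specific extraction.
IN_PROGRESS = {"Parsed", "Indexing"}


def is_complete(mode, status):
    """True once a file has reached the final status for its extraction mode."""
    return status == TERMINAL_STATUS[mode]


def summarize(mode, statuses):
    """Count finished, in-progress, and unrecognized files for one data source."""
    summary = {"complete": 0, "in_progress": 0, "other": 0}
    for status in statuses:
        if is_complete(mode, status):
            summary["complete"] += 1
        elif mode == "industry_specific" and status in IN_PROGRESS:
            summary["in_progress"] += 1
        else:
            summary["other"] += 1
    return summary


print(summarize("industry_specific", ["Parsed", "Indexing", "Indexed", "Indexed"]))
```

Files counted under `in_progress` are the ones waiting on the indexing service, so they need no action.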

## Step 4: Review the Extracted Graph

Once ingestion completes:

1. Go to your **Data Sources** list
2. Find your graph-enabled data source and click the **three-dot menu** (...)
3. Select **View Graph**
4. Review:
   * **Node counts** by entity type — verify your configured types appear with reasonable counts
   * **Relationship counts** by type — check that relationship types make sense for your domain
   * If entity counts are very low, your entity type descriptions may need refinement
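
One way to make this review systematic is to copy the node counts from **View Graph** into a dictionary and compare them with the types you configured. The sketch below assumes you enter the counts by hand; `review_node_counts` and the `min_count` threshold are hypothetical, and the numbers are made up for illustration.

```python
def review_node_counts(configured_types, node_counts, min_count=5):
    """Flag configured types that are missing or sparse, and unexpected types."""
    findings = {"missing": [], "sparse": [], "unexpected": []}
    for entity_type in configured_types:
        count = node_counts.get(entity_type, 0)
        if count == 0:
            findings["missing"].append(entity_type)
        elif count < min_count:
            findings["sparse"].append(entity_type)
    configured = set(configured_types)
    findings["unexpected"] = sorted(t for t in node_counts if t not in configured)
    return findings


configured = ["contract", "party", "clause", "regulation"]
# Illustrative counts, typed in from the View Graph screen.
counts = {"contract": 42, "party": 87, "clause": 3, "location": 12}

print(review_node_counts(configured, counts))
```

Types reported as `missing` or `sparse` are the first candidates for a clearer name or description.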

### Troubleshooting Low Extraction Quality

| Symptom                                  | Likely cause                      | Fix                                                                                                    |
| ---------------------------------------- | --------------------------------- | ------------------------------------------------------------------------------------------------------ |
| Very few entities extracted              | Entity type names are too vague   | Add descriptions to entity types, use more specific names                                              |
| Wrong entity types being extracted       | Preset doesn't match your content | Switch to a custom ontology tailored to your documents                                                 |
| Too many false positive entities         | Entity types are too broad        | Replace generic types (e.g., `item`) with specific ones (e.g., `medical_device`)                       |
| Entities not connecting across documents | Entity names vary across docs     | This is expected — the graph links entities to chunks, enabling cross-document retrieval at query time |

## Step 5: Use Graph-Enhanced Retrieval in an Agent

### Option A: Data Search Step with Reranking

When Knowledge Graph Extraction is active, the Data Search Step automatically uses the integrated reranker to factor graph data into result ranking:

1. Add a **Data Search Step** to your agent
2. Select your graph-enabled data source
3. Reranking is automatically enabled
4. Results incorporate both vector similarity and entity-relationship signals
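
How the integrated reranker weighs these signals is not specified here. Purely as an illustration of combining a vector-similarity score with a graph-derived score, the sketch below uses a weighted blend. The function, the `graph_weight` parameter, and all scores are hypothetical and do not describe Airia's actual reranker.

```python
def blend_scores(candidates, graph_weight=0.3):
    """Re-order candidates by a weighted mix of vector and graph scores.

    Each candidate is (chunk_id, vector_score, graph_score), with both
    scores assumed to be normalized to the range 0..1.
    """
    def combined(candidate):
        _, vector_score, graph_score = candidate
        return (1 - graph_weight) * vector_score + graph_weight * graph_score

    ranked = sorted(candidates, key=combined, reverse=True)
    return [chunk_id for chunk_id, _, _ in ranked]


candidates = [
    ("chunk-a", 0.82, 0.10),  # strong text match, few entity links
    ("chunk-b", 0.78, 0.90),  # slightly weaker text match, many entity links
    ("chunk-c", 0.40, 0.20),
]

print(blend_scores(candidates))                    # graph signal included
print(blend_scores(candidates, graph_weight=0.0))  # vector-only ordering
```

With the graph signal included, a chunk that is well connected to relevant entities can outrank one with a marginally better text match.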

### Option B: Agentic Retrieval with Graph Queries

For the most powerful graph retrieval, use **MCP Multi-Hop Retrieval**:

1. Open your **AI Model Step** settings
2. Toggle on **Datasources** and select your graph-enabled data source
3. The LLM gains access to both vector search tools and **Graph Database Cypher Query** tools
4. The LLM can now autonomously combine text search with graph traversal — for example, first finding relevant chunks via semantic search, then querying the graph for related entities and their connected documents

> 💡 **Example prompt to test:** "Find all entities related to \[specific entity in your data] and summarize their connections." The agent should use the graph query tool to traverse relationships and return a structured answer.
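
The combination described in step 4 above can be pictured with a toy in-memory graph: start from chunks returned by semantic search, collect the entities they mention, then follow those entities to other chunks. In the product, the LLM does this through its search and Cypher query tools. The data structures and names below (`chunk_entities`, `expand_via_graph`) are hypothetical stand-ins, not Airia's schema.

```python
from collections import defaultdict

# Toy graph: which entities each chunk mentions (hypothetical data).
chunk_entities = {
    "doc1#3": {"Acme Corp", "MSA-2021"},
    "doc2#1": {"MSA-2021", "Termination Clause"},
    "doc3#7": {"Acme Corp"},
    "doc4#2": {"Unrelated Co"},
}

# Invert the mapping so each entity points back to its chunks.
entity_chunks = defaultdict(set)
for chunk_id, entities in chunk_entities.items():
    for entity in entities:
        entity_chunks[entity].add(chunk_id)


def expand_via_graph(seed_chunks):
    """Return chunks that share at least one entity with the seed chunks."""
    entities = set()
    for chunk_id in seed_chunks:
        entities |= chunk_entities.get(chunk_id, set())
    related = set()
    for entity in entities:
        related |= entity_chunks[entity]
    return sorted(related - set(seed_chunks))


# Pretend semantic search returned a single chunk.
print(expand_via_graph(["doc1#3"]))
```

The expansion reaches chunks in other documents that a vector search on the original query might not have ranked highly.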

## Step 6: Iterate

Graph RAG quality improves with refinement:

1. **Test with real user queries** — identify where graph-enhanced retrieval produces better results than vector-only search
2. **Adjust entity types** — add types for entities users frequently ask about, remove types that produce noise
3. **Re-ingest after changes** — entity type changes require re-ingestion to take effect
4. **Compare** — run the same queries with and without Knowledge Graph Extraction to measure the impact
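
For the comparison in item 4, a simple metric such as recall@k over a handful of labeled queries is often enough to see a difference. The sketch below assumes you have recorded, for each query, the chunk IDs you consider relevant and the ranked results from each configuration. All names and data are hypothetical.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant chunk IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def compare_runs(labeled_queries, vector_only, graph_enhanced, k=5):
    """Average recall@k for each configuration over the same query set."""
    def mean_recall(run):
        scores = [
            recall_at_k(run[query], relevant, k)
            for query, relevant in labeled_queries.items()
        ]
        return sum(scores) / len(scores)

    return {
        "vector_only": mean_recall(vector_only),
        "graph_enhanced": mean_recall(graph_enhanced),
    }


# Illustrative data: one query with two relevant chunks.
labeled = {"q1": ["c1", "c2"]}
vector_only = {"q1": ["c1", "c9", "c8"]}
graph_enhanced = {"q1": ["c1", "c2", "c9"]}

print(compare_runs(labeled, vector_only, graph_enhanced, k=3))
```

Keeping the query set fixed between runs makes the effect of an entity type change easier to attribute.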

## Example: Legal Domain Setup

**Industry:** Custom — "Corporate Legal"

**Entity types:**

* `contract` — "Legal agreements including MSAs, NDAs, SOWs, and amendments"
* `party` — "Organizations or individuals that are parties to a contract"
* `clause` — "Specific provisions within contracts such as termination, liability, and IP assignment"
* `regulation` — "Government regulations, statutes, and compliance requirements"
* `jurisdiction` — "Legal jurisdictions including countries, states, and regulatory bodies"
* `effective_date` — "Dates when contracts, clauses, or regulations take effect"

**Sample query:** "Which contracts with Party X contain indemnification clauses that may conflict with Regulation Y?"

With Graph RAG, the agent traverses: `party` → `contract` → `clause` (filtered to indemnification) → `regulation`, retrieving the precise chunks where these relationships are documented — across potentially dozens of separate contract files.
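
To make that traversal concrete, here is a small in-memory sketch. The edge list, entity names, and the `follow` helper are invented for illustration; the actual graph schema and relationship types depend on what extraction produces from your documents.

```python
# Hypothetical edges: (source_type, source, target_type, target).
edges = [
    ("party", "Party X", "contract", "MSA-001"),
    ("party", "Party X", "contract", "NDA-007"),
    ("contract", "MSA-001", "clause", "Indemnification 8.2"),
    ("contract", "NDA-007", "clause", "Termination 4.1"),
    ("clause", "Indemnification 8.2", "regulation", "Regulation Y"),
]


def follow(nodes, source_type, target_type):
    """Follow edges of one type pair from a set of nodes to their targets."""
    return {
        target
        for s_type, source, t_type, target in edges
        if s_type == source_type and t_type == target_type and source in nodes
    }


# party -> contract -> clause (indemnification only) -> regulation
contracts = follow({"Party X"}, "party", "contract")
clauses = {
    clause
    for clause in follow(contracts, "contract", "clause")
    if "Indemnification" in clause
}
regulations = follow(clauses, "clause", "regulation")

print(sorted(contracts), sorted(clauses), sorted(regulations))
```

Each hop narrows the candidate set, which is why the final retrieval can point at specific chunks rather than whole contracts.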
