This guide walks you through enabling Knowledge Graph Extraction on a data source, choosing or creating an industry ontology, reviewing the extracted graph, and using graph-enhanced retrieval in an agent.

When to Use Graph RAG

Enable Knowledge Graph Extraction when:
  • Your documents contain named entities with relationships (people, organizations, products, regulations, cases)
  • Users ask questions that require connecting information across documents
  • Your domain has specific terminology that benefits from structured entity recognition
  • You need more precise retrieval than vector search alone provides

Step 1: Create a Data Source and Enable Knowledge Graph Extraction

  1. Navigate to Data Sources in your project
  2. Click Create Data Source and configure your connector
  3. In the Ingestion Settings section, toggle “Enable Knowledge Graph Extraction” to ON
  4. Choose your extraction mode:
    • Generic Entity Extraction — the LLM freely discovers entities (best for exploration)
    • Industry-Specific Entity Extraction — you define which entity types to extract (best for production)
For this guide, we’ll use Industry-Specific Entity Extraction to demonstrate the full workflow. See Knowledge Graph Extraction for full details on both modes.

Step 2: Select an Industry Preset

You’ll see a grid of industry cards:
| Preset | Best for |
| --- | --- |
| General | Mixed-content knowledge bases, general business documents |
| Healthcare | Clinical records, medical research, patient documentation |
| Legal | Contracts, case law, regulatory filings, compliance documents |
| Finance | Financial reports, trading docs, audit records |
| Technology | Technical docs, architecture specs, product documentation |
| Manufacturing | Supply chain, quality records, material specifications |
| Energy | Production records, regulatory compliance, asset documentation |
Click a preset to load its entity types. Review the suggested entity types — you can add, remove, or edit them before saving.

Creating a Custom Ontology

If none of the presets fit:
  1. Click Add Custom Industry
  2. Enter your industry name (e.g., “Pharmaceutical Research”, “Real Estate”, “Insurance Claims”)
  3. Click Generate Entities — Airia’s AI suggests entity types based on your industry
  4. Review and customize:
    • Remove entity types that aren’t relevant
    • Add missing entity types by typing a name and pressing Enter
    • Click the edit icon on any entity to add a description (recommended for extraction accuracy)
  5. Click Select to save
💡 Tip: Aim for 5-10 focused entity types rather than trying to cover everything. You can always add more later and re-ingest.
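Before saving, it can help to sanity-check a draft ontology offline. The sketch below is not part of Airia's product or API; `EntityType`, `Ontology`, and `review` are hypothetical names, and the example entity types are invented for illustration. It only encodes the two recommendations above: keep to 5-10 entity types and give each one a description.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    name: str
    description: str = ""


@dataclass
class Ontology:
    industry: str
    entity_types: list[EntityType] = field(default_factory=list)

    def review(self) -> list[str]:
        """Return human-readable warnings; an empty list means no issues found."""
        warnings = []
        if not 5 <= len(self.entity_types) <= 10:
            warnings.append(
                f"{len(self.entity_types)} entity types; the tip suggests 5-10"
            )
        for entity_type in self.entity_types:
            if not entity_type.description.strip():
                warnings.append(f"'{entity_type.name}' has no description")
        return warnings


# Hypothetical draft ontology, deliberately incomplete.
ontology = Ontology(
    industry="Insurance Claims",
    entity_types=[
        EntityType("claim", "A request for payment under an insurance policy"),
        EntityType("policy"),  # no description yet, so review() flags it
    ],
)
for warning in ontology.review():
    print(warning)
```

Running the review on this draft reports both the short type list and the missing description, which are the two things most worth fixing before the first ingestion.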

Step 3: Ingest Your Documents

Save your settings and trigger ingestion. The pipeline now runs the extended flow:
Parse → Chunk → Entity Extraction → Embed → Store
Entity extraction adds processing time proportional to the number of chunks and entity types. For a first test, start with a small, representative subset of your documents. Monitor ingestion progress in the data source detail view:
  • Generic extraction: Files go to Processed when complete (same as standard RAG)
  • Industry-specific extraction: Files transition through Parsed → Indexing → Indexed. Files in Parsed status are waiting for the indexing service; this is normal, not an error.
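To make the stage order concrete, here is a toy version of the Parse → Chunk → Entity Extraction → Embed → Store flow. It is a sketch under heavy assumptions, not how Airia implements ingestion: every function is a hypothetical stand-in, the extractor is a plain keyword lookup rather than a model, and the "embedding" is a placeholder.

```python
def parse(raw: str) -> str:
    # Stand-in for document parsing: collapse runs of whitespace.
    return " ".join(raw.split())


def chunk(text: str, size: int = 8) -> list[str]:
    # Fixed-size word windows; production chunkers are usually smarter.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def extract_entities(chunk_text: str, vocabulary: dict[str, str]) -> list[tuple[str, str]]:
    # Toy extractor: case-insensitive lookup in a hypothetical
    # term -> entity type vocabulary.
    lowered = chunk_text.lower()
    return [(term, etype) for term, etype in vocabulary.items() if term.lower() in lowered]


def embed(chunk_text: str) -> list[float]:
    # Placeholder features (character and word counts), NOT a real embedding.
    return [float(len(chunk_text)), float(len(chunk_text.split()))]


def ingest(raw: str, vocabulary: dict[str, str]) -> list[dict]:
    # Store: here just a list of records, one per chunk.
    store = []
    for piece in chunk(parse(raw)):
        store.append({
            "text": piece,
            "entities": extract_entities(piece, vocabulary),
            "embedding": embed(piece),
        })
    return store


records = ingest(
    "Acme Corp signed an NDA with Globex.  The NDA takes effect in Delaware.",
    {"Acme Corp": "party", "NDA": "contract"},
)
for record in records:
    print(record["text"], "->", record["entities"])
```

Extraction runs once per chunk and checks every configured term, which mirrors why processing time grows with both the number of chunks and the number of entity types.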

Step 4: Review the Extracted Graph

Once ingestion completes:
  1. Go to your Data Sources list
  2. Find your graph-enabled data source and click the three-dot menu (…)
  3. Select View Graph
  4. Review:
    • Node counts by entity type — verify your configured types appear with reasonable counts
    • Relationship counts by type — check that relationship types make sense for your domain
    • If entity counts are very low, your entity type descriptions may need refinement
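The review above can also be done programmatically if you have the node list available outside the UI. The sketch below assumes a hypothetical export in which each node is a dict with a `type` key; the guide does not describe such an export, and the `min_count` threshold is an arbitrary illustration.

```python
from collections import Counter


def review_counts(nodes, configured_types, min_count=3):
    """Label each configured entity type as 'missing', 'low', or 'ok'."""
    counts = Counter(node["type"] for node in nodes)
    report = {}
    for entity_type in configured_types:
        n = counts.get(entity_type, 0)
        if n == 0:
            status = "missing"
        elif n < min_count:
            status = "low"
        else:
            status = "ok"
        report[entity_type] = (n, status)
    return report


# Hypothetical node list, as if copied out of the graph view.
nodes = [
    {"type": "contract"}, {"type": "contract"}, {"type": "contract"},
    {"type": "party"},
]
report = review_counts(nodes, ["contract", "party", "clause"])
for entity_type, (n, status) in report.items():
    print(f"{entity_type}: {n} ({status})")
```

Types reported as missing or low are the first candidates for a clearer name or a fuller description.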

Troubleshooting Low Extraction Quality

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Very few entities extracted | Entity type names are too vague | Add descriptions to entity types, use more specific names |
| Wrong entity types being extracted | Preset doesn’t match your content | Switch to a custom ontology tailored to your documents |
| Too many false positive entities | Entity types are too broad | Replace generic types (e.g., item) with specific ones (e.g., medical_device) |
| Entities not connecting across documents | Entity names vary across docs | This is expected. The graph links entities to chunks, which enables cross-document retrieval at query time |
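The idea that entities link to chunks, and that those links are what make cross-document retrieval possible, can be pictured as an inverted index from entity name to chunk locations. This is only a conceptual sketch; the record shape (`doc`, `chunk_id`, `entities`) and the file names are hypothetical and say nothing about how the graph is actually stored.

```python
from collections import defaultdict


def build_entity_index(chunks):
    """Map each entity name to the set of (document, chunk id) pairs mentioning it."""
    index = defaultdict(set)
    for item in chunks:
        for entity in item["entities"]:
            index[entity].add((item["doc"], item["chunk_id"]))
    return index


# Hypothetical extracted chunks from two separate files.
chunks = [
    {"doc": "msa.pdf", "chunk_id": 4, "entities": ["Acme Corp", "indemnification"]},
    {"doc": "nda.pdf", "chunk_id": 1, "entities": ["Acme Corp"]},
    {"doc": "nda.pdf", "chunk_id": 2, "entities": ["Delaware"]},
]
index = build_entity_index(chunks)
documents = sorted({doc for doc, _ in index["Acme Corp"]})
print(documents)
```

A lookup on one entity name surfaces chunks from both files, even though the files never reference each other.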

Step 5: Use Graph-Enhanced Retrieval in an Agent

Option A: Data Search Step with Reranking

When Knowledge Graph Extraction is active, the Data Search Step automatically uses the integrated reranker to factor graph data into result ranking:
  1. Add a Data Search Step to your agent
  2. Select your graph-enabled data source
  3. Reranking is automatically enabled
  4. Results incorporate both vector similarity and entity-relationship signals
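The guide does not document how the integrated reranker weighs its signals, so the following is purely illustrative: a linear blend of vector similarity with an entity-overlap score. The function name, the `graph_weight` parameter, and the candidate fields are all assumptions made for this sketch.

```python
def rerank(candidates, query_entities, graph_weight=0.3):
    """Order candidate ids by a blend of vector similarity and entity overlap."""
    query_set = set(query_entities)
    scored = []
    for candidate in candidates:
        overlap = len(query_set & set(candidate["entities"]))
        graph_score = overlap / len(query_set) if query_set else 0.0
        score = (1 - graph_weight) * candidate["similarity"] + graph_weight * graph_score
        scored.append((score, candidate["id"]))
    # Highest blended score first.
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    return [candidate_id for _, candidate_id in ranked]


# Hypothetical candidates: "a" is closer in vector space, "b" mentions both query entities.
candidates = [
    {"id": "a", "similarity": 0.80, "entities": []},
    {"id": "b", "similarity": 0.70, "entities": ["acme", "indemnification"]},
]
query_entities = ["acme", "indemnification"]

vector_only = rerank(candidates, query_entities, graph_weight=0.0)
blended = rerank(candidates, query_entities, graph_weight=0.3)
print(vector_only, blended)
```

With the graph signal switched off, chunk "a" ranks first; with it on, chunk "b" overtakes it because it mentions both entities from the query. That reordering is the kind of effect to look for when testing a graph-enabled data source.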

Option B: Agentic Retrieval with Graph Queries

For the most powerful graph retrieval, use MCP Multi-Hop Retrieval:
  1. Open your AI Model Step settings
  2. Toggle on Datasources and select your graph-enabled data source
  3. The LLM gains access to both vector search tools and Graph Database Cypher Query tools
  4. The LLM can now autonomously combine text search with graph traversal — for example, first finding relevant chunks via semantic search, then querying the graph for related entities and their connected documents
💡 Example prompt to test: “Find all entities related to [specific entity in your data] and summarize their connections.” The agent should use the graph query tool to traverse relationships and return a structured answer.
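For orientation, a query of the kind the LLM might issue for that prompt could look like the one built below. The graph schema is an assumption: this sketch supposes every entity node carries a `name` property, which the guide does not confirm, and the helper function is hypothetical. Check the labels and properties shown in View Graph before relying on any hand-written Cypher.

```python
def related_entities_query(entity_name: str, limit: int = 25) -> tuple[str, dict]:
    """Build a parameterized Cypher query for the direct neighbors of one entity."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Assumes a `name` property on entity nodes (hypothetical schema).
    query = (
        "MATCH (e {name: $name})-[r]-(neighbor) "
        "RETURN type(r) AS relationship, labels(neighbor) AS labels, "
        "neighbor.name AS name "
        "LIMIT $limit"
    )
    return query, {"name": entity_name, "limit": limit}


query, params = related_entities_query("Acme Corp", limit=10)
print(query)
print(params)
```

Passing the entity name as a parameter rather than formatting it into the string keeps the query safe to reuse with arbitrary user input.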

Step 6: Iterate

Graph RAG quality improves with refinement:
  1. Test with real user queries — identify where graph-enhanced retrieval produces better results than vector-only search
  2. Adjust entity types — add types for entities users frequently ask about, remove types that produce noise
  3. Re-ingest after changes — entity type changes require re-ingestion to take effect
  4. Compare — run the same queries with and without Knowledge Graph Extraction to measure the impact
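One simple way to run the comparison in step 4 is to record, for a handful of test queries, which chunks each configuration returned and which chunks you consider relevant, then compute recall at k. The data layout and chunk ids below are hypothetical; collecting the retrieved ids is left to whatever tooling you use.

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant chunk ids found in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    top = set(retrieved[:k])
    return len(top & set(relevant)) / len(relevant)


def compare_runs(queries, k=5):
    """Return per-query recall for vector-only and graph-enabled retrieval."""
    rows = []
    for q in queries:
        rows.append({
            "query": q["query"],
            "vector_only": recall_at_k(q["vector_only"], q["relevant"], k),
            "graph": recall_at_k(q["graph"], q["relevant"], k),
        })
    return rows


# Hypothetical results for one test query.
queries = [
    {
        "query": "Which contracts mention indemnification?",
        "relevant": ["c1", "c2"],
        "vector_only": ["c1", "c9", "c8"],
        "graph": ["c1", "c2", "c9"],
    },
]
rows = compare_runs(queries, k=3)
for row in rows:
    print(row)
```

Queries where the two columns differ are the ones worth inspecting by hand, since they show where the graph is helping or adding noise.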

Industry: Custom (“Corporate Legal”)
Entity types:
  • contract — “Legal agreements including MSAs, NDAs, SOWs, and amendments”
  • party — “Organizations or individuals that are parties to a contract”
  • clause — “Specific provisions within contracts such as termination, liability, and IP assignment”
  • regulation — “Government regulations, statutes, and compliance requirements”
  • jurisdiction — “Legal jurisdictions including countries, states, and regulatory bodies”
  • effective_date — “Dates when contracts, clauses, or regulations take effect”
Sample query: “Which contracts with Party X contain indemnification clauses that may conflict with Regulation Y?”
With Graph RAG, the agent traverses party → contract → clause (filtered to indemnification) → regulation, retrieving the precise chunks where these relationships are documented, potentially across dozens of separate contract files.
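The traversal can be sketched over a small in-memory graph. Everything here is hypothetical: the node names, the `kind` and `chunk` attributes, and the adjacency-list layout are invented to illustrate the party → contract → clause → regulation path, not to describe how the graph database stores it.

```python
def find_conflict_candidates(graph, nodes, party, clause_kind, regulation):
    """Follow party -> contract -> clause (of one kind) -> regulation."""
    results = []
    for contract in graph.get(party, []):
        if nodes[contract]["type"] != "contract":
            continue
        for clause in graph.get(contract, []):
            info = nodes[clause]
            if info["type"] != "clause" or info.get("kind") != clause_kind:
                continue
            if regulation in graph.get(clause, []):
                results.append((contract, clause, info["chunk"]))
    return results


# Hypothetical nodes and directed edges.
nodes = {
    "Party X": {"type": "party"},
    "MSA-1": {"type": "contract"},
    "NDA-7": {"type": "contract"},
    "MSA-1 clause 9": {"type": "clause", "kind": "indemnification", "chunk": "msa-1.pdf#12"},
    "NDA-7 clause 3": {"type": "clause", "kind": "termination", "chunk": "nda-7.pdf#4"},
    "Regulation Y": {"type": "regulation"},
}
graph = {
    "Party X": ["MSA-1", "NDA-7"],
    "MSA-1": ["MSA-1 clause 9"],
    "NDA-7": ["NDA-7 clause 3"],
    "MSA-1 clause 9": ["Regulation Y"],
    "NDA-7 clause 3": ["Regulation Y"],
}
matches = find_conflict_candidates(graph, nodes, "Party X", "indemnification", "Regulation Y")
print(matches)
```

Only the indemnification clause survives the filter, and the result carries the chunk reference the agent would need to quote the underlying text.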