Add and Configure a Data Source in an Agent

Data sources give your Agent access to specific knowledge so that its responses are grounded in relevant content. The Agent uses the configured search settings and the indexes built for the data source to retrieve information, which produces more accurate answers to user queries.

Prerequisites

Add a Data Source

To add a data source to your Agent:
  1. While creating your Agent, drag and drop the desired data source from the Data Sources section in the left side panel into your Agent workflow.

Configure Search Settings

After adding a data source, configure its search behavior:
  1. Select Files for the Agent (Optional): By default, the Agent retrieves data from the entire data source. To narrow the search to specific documents, click the Select files for this Agent button and choose the desired files.
  2. Choose Search Type: Select the search type best suited to your use case. The semantic (or hybrid) search settings and the Text-to-SQL search settings are described below.
Adjust your workspace’s search settings to get the most relevant and useful results from semantic or hybrid searches. Each setting fine-tunes how your system finds and ranks content based on your query.

1. Max Results

This setting controls the maximum number of text chunks returned based on semantic similarity to your query.
How to Configure
Choose a number for Max Results (e.g., 5, 10, or 20). The system will retrieve up to this many most semantically relevant chunks.
When to Use
Use this setting to limit the volume of results, either to avoid overloading your Large Language Model (LLM) with information or to focus strictly on the most pertinent content.
Example:
  • Query: “How do I integrate Jira with ServiceNow?”
  • Max Results: 3
The system returns the 3 most semantically related chunks (e.g., “integration setup,” “API configuration,” “permissions”).
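
The following is a minimal sketch of how a Max Results cutoff can work, not the product's actual implementation. It assumes chunks are ranked by cosine similarity between embedding vectors; the function names, the toy three-dimensional vectors, and the sample chunks are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunks, max_results):
    """Return up to max_results chunk texts, most similar first.

    chunks is a list of (text, vector) pairs (hypothetical structure).
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:max_results]]

# Toy embeddings, invented for illustration only.
chunks = [
    ("integration setup", [0.9, 0.1, 0.0]),
    ("API configuration", [0.8, 0.2, 0.1]),
    ("permissions", [0.7, 0.3, 0.0]),
    ("office holiday schedule", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.0, 0.0]

top = top_k_chunks(query_vec, chunks, max_results=3)
print(top)  # ['integration setup', 'API configuration', 'permissions']
```

Note that Max Results is an upper bound: if fewer chunks exist than the limit, all of them are returned.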

2. Relevance Threshold (1–100)

This setting filters out chunks that do not meet a minimum semantic similarity score. The score is internally converted to a 1–100 scale.
How to Configure
Choose a Relevance Threshold (e.g., 70). Only chunks with a relevance score equal to or greater than your chosen threshold will be returned. A setting of 0 means no threshold is applied.
When to Use
  • Higher threshold (e.g., 80–90): For highly precise results, such as searching a technical knowledge base.
  • Lower threshold (e.g., 40–60): For broader context, suitable for brainstorming or research.
Example:
  • Query: “Jira integration errors”
  • Relevance Threshold: 80
Only chunks very closely related to Jira errors will be retrieved, excluding general setup or unrelated tool information.
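
As a rough illustration of threshold filtering, the sketch below assumes the raw similarity lies in the range 0 to 1 and is mapped linearly onto the 100-point scale. The actual internal conversion is not documented here, so treat the `to_relevance_score` mapping, the function names, and the sample scores as assumptions.

```python
def to_relevance_score(similarity):
    """Map a similarity in [0, 1] to a 0-100 score (assumed linear mapping)."""
    clamped = max(0.0, min(1.0, similarity))
    return round(clamped * 100)

def filter_by_threshold(scored_chunks, threshold):
    """Keep chunks whose relevance score is >= threshold.

    scored_chunks is a list of (text, similarity) pairs.
    A threshold of 0 disables filtering.
    """
    if threshold == 0:
        return [text for text, _ in scored_chunks]
    return [
        text
        for text, similarity in scored_chunks
        if to_relevance_score(similarity) >= threshold
    ]

# Invented similarity values for illustration.
scored = [
    ("Jira integration error codes", 0.91),
    ("Troubleshooting Jira sync failures", 0.80),
    ("General Jira setup", 0.62),
    ("Slack notifications", 0.35),
]

precise = filter_by_threshold(scored, 80)
broad = filter_by_threshold(scored, 60)
print(precise)  # ['Jira integration error codes', 'Troubleshooting Jira sync failures']
```

With the threshold lowered to 60, the "General Jira setup" chunk is also returned, which matches the broader-context use case described above.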

3. Neighboring Chunks

When a chunk matches your search, this option allows you to include surrounding chunks (before and after it within the same document) to provide additional context.
How to Configure
Choose how many Neighboring Chunks to include:
  • 0: Return only the matching chunk.
  • 1–5: Include a few nearby chunks for context.
  • Full document: Include the entire document if one of its chunks matches.
When to Use
Use this when context is crucial, especially when a single sentence or paragraph alone doesn’t convey the full meaning.
Example:
  • Query: “ServiceNow workflow automation”
  • Neighboring Chunks: Full Document
If a match is found in one paragraph, the entire document detailing the automation setup will be sent to the LLM, ensuring comprehensive context.
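
The neighbor expansion described above can be sketched as a window around the matching chunk. This is an illustrative sketch only: it assumes a document is stored as an ordered list of chunks, and the function name and the `"full"` sentinel value are hypothetical.

```python
def expand_with_neighbors(doc_chunks, match_index, neighbors):
    """Return the matching chunk plus surrounding chunks from the same document.

    neighbors is either a non-negative integer (chunks on each side)
    or the string "full" to return the entire document.
    """
    if neighbors == "full":
        return list(doc_chunks)
    start = max(0, match_index - neighbors)
    end = min(len(doc_chunks), match_index + neighbors + 1)
    return doc_chunks[start:end]

# A hypothetical document split into five ordered chunks.
doc = [
    "Overview",
    "Prerequisites",
    "Create the workflow",
    "Add approval step",
    "Test and publish",
]

print(expand_with_neighbors(doc, 2, 0))  # ['Create the workflow']
print(expand_with_neighbors(doc, 2, 1))
# ['Prerequisites', 'Create the workflow', 'Add approval step']
```

The window is clipped at document boundaries, so a match in the first chunk with one neighbor returns only two chunks.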

4. Hybrid Search (Keyword + Semantic)

Hybrid Search combines semantic search (understanding meaning) with keyword search (exact word matches). You can assign a weight to each method.
  • Keyword Search: Finds exact words or identifiers (e.g., “JIRA-1234,” “Project Falcon”).
  • Semantic Search: Finds similar meanings (e.g., “how to connect Jira” will match “Jira integration steps”).
How to Configure
Adjust the balance using a slider or numeric values for Keyword Search and Semantic Search weights:
  • 100% Keyword / 0% Semantic: Relies solely on exact word matches.
  • 50% Keyword / 50% Semantic: Gives equal importance to meaning and exact words.
  • 20% Keyword / 80% Semantic: Prioritizes meaning while still allowing for precise terms.
When to Use
  • Keyword-heavy: For searches involving product codes, specific names, or identifiers.
  • Semantic-heavy: For conceptual or general questions.
  • Balanced: For queries that blend both precise terms and broader concepts.
Example:
  • Query: “Banana”
  • Keyword weight 100%: Finds documents containing the exact word “banana.”
  • Semantic weight 100%: Finds documents about “fruit,” “tropical food,” or “smoothies.”
  • Hybrid (50/50): Finds both exact matches and semantically related concepts.
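
One common way to combine the two methods is a weighted sum of a keyword score and a semantic score. The sketch below assumes that approach and assumes both scores are already normalized to the range 0 to 1; the product's actual scoring formula is not stated here, and the function names and sample scores are invented.

```python
def hybrid_score(keyword_score, semantic_score, keyword_weight):
    """Weighted combination of two scores in [0, 1].

    keyword_weight is in [0, 1]; the semantic weight is 1 - keyword_weight.
    """
    if not 0.0 <= keyword_weight <= 1.0:
        raise ValueError("keyword_weight must be between 0 and 1")
    return keyword_weight * keyword_score + (1.0 - keyword_weight) * semantic_score

def rank(docs, keyword_weight):
    """Return document names ordered by hybrid score, best first."""
    return sorted(
        docs,
        key=lambda name: hybrid_score(docs[name][0], docs[name][1], keyword_weight),
        reverse=True,
    )

# name -> (keyword score, semantic score) for the query "Banana"; invented values.
docs = {
    "banana bread recipe": (1.0, 0.6),
    "tropical fruit smoothies": (0.0, 0.8),
    "quarterly sales report": (0.0, 0.1),
}

keyword_only = rank(docs, keyword_weight=1.0)
semantic_only = rank(docs, keyword_weight=0.0)
balanced = rank(docs, keyword_weight=0.5)

print(semantic_only[0])  # tropical fruit smoothies
print(balanced[0])       # banana bread recipe
```

With a 50/50 split, the exact match still ranks first, but the semantically related smoothie document stays ahead of the unrelated report.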

Summary of Search Settings

Setting | What It Controls | Best For | Example
Max Results | How many chunks are returned | Controlling the size of results | “Top 5 relevant answers”
Relevance Threshold | How relevant chunks must be | Filtering out weak matches | “Only results > 80% similarity”
Neighboring Chunks | How much context to include | Providing context-rich answers | “Return Full document when hit found”
Hybrid Search | Balance between meaning and exact match | Combining precise + conceptual queries | “Product codes + topic meaning”

Text-to-SQL search is suitable for .csv and .xlsx files, especially when the data is primarily numerical and lacks deep semantic meaning. This method allows the Agent to generate SQL queries from natural language input to retrieve structured results.
  • Model Selection: Choose the LLM responsible for generating SQL queries within the Agent workflow.
    Recommendation: For stable and accurate results, select a model from the High Quality tier.
    • High Quality (best performance):
      • Claude 4 Sonnet
      • GPT 4.1
      • Claude 3.7 Sonnet
      • GPT 4o
    • Sufficient Quality:
      • GPT 4.1 mini
      • Claude 3.5 Sonnet
      • GPT 4o mini
  • Fuzzy Search: Enable this option so the system can match records even when the user’s query contains misspellings.
    Fuzzy search can increase query generation complexity.
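
To make the Text-to-SQL flow more concrete, the sketch below loads a small CSV into an in-memory SQLite table and runs a query against it. Everything here is an assumption for illustration: the `sales` table, the sample data, the hard-coded SQL string standing in for what the selected LLM would generate, and the use of `difflib` as a stand-in for fuzzy matching. The product's actual storage and matching mechanisms are not described in this document.

```python
import csv
import difflib
import io
import sqlite3

# Hypothetical CSV content standing in for an uploaded .csv file.
CSV_DATA = (
    "product,region,units\n"
    "Widget,North,120\n"
    "Widget,South,80\n"
    "Gadget,North,45\n"
)

def load_csv_into_sqlite(csv_text):
    """Load CSV rows into an in-memory SQLite table named 'sales'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, region TEXT, units INTEGER)")
    rows = [
        (row["product"], row["region"], int(row["units"]))
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn

def fuzzy_resolve(term, known_values):
    """Map a possibly misspelled term to the closest known value, or None."""
    matches = difflib.get_close_matches(term, known_values, n=1, cutoff=0.6)
    return matches[0] if matches else None

conn = load_csv_into_sqlite(CSV_DATA)
products = [row[0] for row in conn.execute("SELECT DISTINCT product FROM sales")]

# User asks: "How many Widgit units were sold?" (note the misspelling).
product = fuzzy_resolve("Widgit", products)

# Stand-in for the SQL an LLM would generate from the natural language question.
generated_sql = "SELECT SUM(units) FROM sales WHERE product = ?"
total = conn.execute(generated_sql, (product,)).fetchone()[0]
print(product, total)  # Widget 200
```

Resolving the misspelled term before building the query is one reason fuzzy search adds complexity: the query generator has to consider candidate values rather than a single exact string.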
Important: Semantic and Text-to-SQL search work only if the indexes were created and the data source was configured when the data source itself was created. Check Ingestion settings for details.

Choosing the Right Search Method

The optimal search method depends on your query type, data structure, and desired outcome:
  • Use SQL Retrieval for:
    • Structured files (.csv, .xlsx).
    • Precise, structured queries.
    • Efficiently answering quantitative questions.
    • Data that is mostly numerical and does not have strong semantic meaning.
  • Use Semantic Retrieval for:
    • Natural language queries.
    • Unstructured or text-heavy documents.
    • Cases requiring semantic understanding.
Combining both methods can offer the most flexible and effective solution, especially when Agents interact with users through natural language.
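
As a simple illustration of the guidance above, the following sketch suggests a search method from a file's extension. It is a hypothetical helper, not a feature of the product: the function name and return values are invented, and a real decision would also weigh the query type and the nature of the data.

```python
from pathlib import Path

# Structured file types the text above names as suitable for SQL retrieval.
STRUCTURED_EXTENSIONS = {".csv", ".xlsx"}

def suggest_search_method(filename):
    """Suggest 'text-to-sql' for structured files, 'semantic' otherwise."""
    extension = Path(filename).suffix.lower()
    if extension in STRUCTURED_EXTENSIONS:
        return "text-to-sql"
    return "semantic"

print(suggest_search_method("sales_2024.xlsx"))  # text-to-sql
print(suggest_search_method("handbook.pdf"))     # semantic
```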