Ingestion settings
Ingestion prepares your data so that the LLM can search and retrieve relevant information at query time. Adjust the settings below to match your use case and document type.
Hybrid search
Hybrid search combines semantic and lexical search. Semantic search retrieves the most relevant information based on meaning, while lexical search returns exact word matches (keyword search). Hybrid search is available only for Pinecone vector databases that support the dot product metric. It is enabled automatically once you select Pinecone and enter keys that are compatible with sparse vectors.
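To illustrate why the dot product metric matters, here is a minimal sketch of how a semantic (dense) score and a lexical (sparse) score can be blended into one ranking score. This is not the product's implementation: the function names, the `alpha` weight, and the sample vectors are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def dense_dot(a: List[float], b: List[float]) -> float:
    """Dot product of two dense vectors of equal length (semantic part)."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Dot product of two sparse vectors stored as {index: weight} (lexical part)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[i] for i, w in a.items() if i in b)


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse, alpha=0.5):
    """Blend both scores. alpha is a hypothetical weight:
    1.0 means purely semantic, 0.0 means purely lexical."""
    semantic = dense_dot(q_dense, d_dense)
    lexical = sparse_dot(q_sparse, d_sparse)
    return alpha * semantic + (1 - alpha) * lexical


# Hypothetical query and documents, for illustration only.
query_dense, query_sparse = [1.0, 0.0], {3: 2.0}
documents = {
    "doc_a": ([1.0, 0.0], {3: 1.0, 7: 4.0}),  # close in meaning, shares a keyword
    "doc_b": ([0.0, 1.0], {9: 1.0}),          # unrelated on both counts
}

ranking = sorted(
    documents,
    key=lambda name: hybrid_score(query_dense, query_sparse, *documents[name]),
    reverse=True,
)
print(ranking)
```

Because both parts are plain dot products, their weighted sum can be computed as a single dot product over the combined dense and sparse vectors, which is one reason a hybrid index is typically tied to this metric.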
Scan document for images
This feature is enabled by default. It generates descriptions for the images in your documents and uses OCR to extract any text that the images contain.
Text-to-SQL
When to use it?
Text-to-SQL is available for CSV and Excel files only. Choose it when you want to chat with your structured data. It is optimized to answer quantitative questions such as “What is the revenue generated by product A for the year to date?” or “How many leads have we generated in the last year?”
How to use it?
Set up your Data Source
Activate the SQL indexing option in the Ingestion settings. For the CSV/Excel files in your Data Source, you can choose Semantic, SQL only, or Both:
- Choose Semantic for tabular data when you want text search based on meaning and context. Example use case: a survey stored in an Excel file that contains customers' complete answers to open questions. Such content is semi-structured. Example question: What are the common complaints customers have about Agent builder?
- Choose SQL when you query structured content. Example question: How many complaints are registered as High priority?
- Choose Both when you want to enable both types of search for your CSV and Excel files. For all other file types in the Data Source, semantic search is applied.
Use Data Source in the Agent
Activate Text-to-SQL retrieval in the Data Source step. By default, this option is disabled and the Data Source relies on Semantic retrieval. When enabled, Text-to-SQL search covers only the CSV/Excel files in the Data Source. For example, if an agent must retrieve all sales records from an .xls file where the sales amount exceeds $5,000 and the date falls within Q1 2025, a SQL query like SELECT * FROM sales WHERE amount > 5000 AND date >= '2025-01-01' AND date < '2025-04-01' (assuming dates are stored in YYYY-MM-DD format) provides an efficient and precise solution by leveraging the file’s structured format.
Hint: To enable both types of search (for example, when the Data Source contains CSV/Excel files alongside other file types, or when you chose Both for CSV/Excel), drag and drop the Data Source onto the canvas twice. Enable semantic retrieval for one copy and SQL for the other, then connect both to the LLM.
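To make the sales example concrete, the sketch below loads a few CSV rows into an in-memory SQLite table and runs a Q1 2025 query against it. The sample rows, the `sales` table layout, and the helper names are hypothetical, and the product's own SQL engine and generated queries may differ. The filter uses an explicit date range so that January, February, and March are all covered, assuming dates are stored as YYYY-MM-DD text.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """id,amount,date
1,7200,2025-01-15
2,4800,2025-02-03
3,9100,2025-03-28
4,6500,2025-04-02
5,12000,2024-12-30
"""


def load_sales(csv_text: str) -> sqlite3.Connection:
    """Load CSV text into an in-memory SQLite table named 'sales'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL, date TEXT)")
    rows = [
        (int(r["id"]), float(r["amount"]), r["date"])
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def q1_2025_large_sales(conn: sqlite3.Connection):
    """Return sales above 5000 dated within Q1 2025, ordered by date."""
    query = """
        SELECT id, amount, date
        FROM sales
        WHERE amount > 5000
          AND date >= '2025-01-01'
          AND date < '2025-04-01'
        ORDER BY date
    """
    return conn.execute(query).fetchall()


conn = load_sales(SAMPLE_CSV)
for row in q1_2025_large_sales(conn):
    print(row)
```

With this sample data, only rows 1 and 3 qualify: row 2 is below the amount threshold, and rows 4 and 5 fall outside the quarter.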
Text-to-SQL settings
Next, select the model to use in the agentic workflow.
You can also enable Fuzzy search, which lets the search match records that contain misspellings.
When the agent runs with the configured Data Source step, it produces results based on the chosen settings. The Text-to-SQL retrieval flow outputs a structured result from a SQL query that is generated dynamically from the user’s input.
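As a rough illustration of what fuzzy matching does, the sketch below uses Python's standard `difflib` module to match a misspelled term against a list of stored values. This documentation does not describe how the product implements Fuzzy search, so the approach, the `fuzzy_lookup` helper, the cutoff value, and the sample values are assumptions for illustration only.

```python
import difflib
from typing import List


def fuzzy_lookup(term: str, values: List[str], cutoff: float = 0.8) -> List[str]:
    """Return stored values that closely resemble 'term', best match first.

    Matching is case-insensitive. 'cutoff' is a similarity threshold
    between 0 and 1; higher values demand closer matches.
    """
    lowered = {v.lower(): v for v in values}  # map lowercase form to original
    matches = difflib.get_close_matches(
        term.lower(), list(lowered), n=3, cutoff=cutoff
    )
    return [lowered[m] for m in matches]


# Hypothetical column values from a CSV file.
products = ["Agent builder", "Data Source", "Knowledge base"]

print(fuzzy_lookup("Agent bulder", products))  # misspelled term
print(fuzzy_lookup("zzz", products))           # nothing similar
```

An exact-match filter such as `WHERE product = 'Agent bulder'` would return nothing for the misspelled term, whereas a similarity threshold still finds the intended record.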
The choice between semantic retrieval and SQL retrieval for agents depends on the query type, data structure, scalability needs, and maintenance considerations. For structured files like .csv and .xls with precise, structured queries, SQL retrieval is preferred for its efficiency, accuracy, and ability to answer quantitative questions. For natural language queries, or for text fields that require semantic understanding, semantic retrieval is the better fit. In practice, combining both methods often provides the most flexible and effective solution, especially for agents that interact with users through natural language.