
Connecting to Data Sources

Data sources in the Airia platform enable you to integrate content from various origins into your Agent. Once a data source is added, the data undergoes ingestion, encoding, and indexing so that it can be retrieved by the LLM based on the user’s query. We support sparse and dense vectors for efficient retrieval, as well as SQL indexing for text-to-SQL capabilities. The processed data is then stored in vector databases, making it ready for retrieval-augmented generation (RAG) and tooling operations. Airia’s Data Source connectors allow you to ingest different file types, which then serve as knowledge for your Agent.
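
To illustrate how sparse and dense signals can be combined at retrieval time, here is a minimal sketch. It is not Airia's implementation: the scoring functions, the `alpha` weight, and the hand-written toy vectors are all assumptions made for illustration. A real system would use a proper sparse scorer and an embedding model.

```python
import math
from collections import Counter


def sparse_score(query: str, doc: str) -> float:
    """Toy sparse signal: weighted count of query terms that appear in the doc."""
    q_terms = Counter(query.lower().split())
    d_terms = Counter(doc.lower().split())
    return float(sum(q_terms[t] * d_terms[t] for t in q_terms))


def cosine(a: list[float], b: list[float]) -> float:
    """Dense signal: cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_rank(query: str, query_vec: list[float],
                docs: list[tuple[str, list[float]]],
                alpha: float = 0.5) -> list[str]:
    """Rank (text, dense_vector) pairs by a weighted mix of both signals.

    alpha is a hypothetical weight: 1.0 means sparse only, 0.0 means dense only.
    """
    sparse = [sparse_score(query, text) for text, _ in docs]
    max_sparse = max(sparse, default=0.0) or 1.0  # avoid division by zero
    scored = [
        (alpha * s / max_sparse + (1 - alpha) * cosine(query_vec, vec), text)
        for s, (text, vec) in zip(sparse, docs)
    ]
    return [text for _, text in sorted(scored, key=lambda p: p[0], reverse=True)]


# Toy example: the vectors are hand-written stand-ins for real embeddings.
docs = [
    ("refund policy for orders", [1.0, 0.0]),
    ("office holiday schedule", [0.0, 1.0]),
]
ranked = hybrid_rank("refund policy", [0.9, 0.1], docs)
```

In this toy setup the first document ranks highest because it matches the query on both the keyword signal and the vector signal.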

Supported File Types

Our platform supports a wide range of file types for various documentation and data needs, ensuring you can process diverse content seamlessly.

Office Documents

  • Word Documents (.docx, .doc)
  • Excel Files (.xlsx, .xls)
  • PowerPoint Files (.pptx, .ppt)

Structured Data

  • JSON (.json)
  • CSV (.csv)
  • Markdown (.md, .mdx)
  • XML (.xml)
  • XML Schema Definition (.xsd)
  • YAML (.yaml)
  • HML (.hml)

Images

  • JPEG/JPG (.jpeg, .jpg)
  • PNG (.png)
  • BMP (.bmp)
  • TIFF (.tif, .tiff)
  • HEIF (.heif, .heic)

Other Document Types

  • Plain Text files (.txt)
  • PDF (.pdf)
💡 Note on Supported Content:
  • JSON files are supported as text only.
  • Excel and CSV files are supported as text for semantic search (with a file size limit of 50MB) and for SQL indexing (with a file size limit of 100MB). If uploaded via the File Upload data source, the limit is 50MB per file.
  • HEIF files are not supported for Microsoft connectors due to how files are processed by those connectors.
  • PDF files are supported in either Text-only mode or Text with Images mode, up to 100MB per file. If uploaded via the File Upload data source, the limit is 50MB per file.
  • Plain text files are supported up to 80MB per file. If uploaded via the File Upload data source, the limit is 50MB per file.
  • All other file types are supported up to 50MB per file.
⚠️ Warning on Script Files: Files containing script content (such as PHP, JS, etc.) are not supported via direct File Upload, regardless of their file type. Such files can, however, be ingested via any other compatible connector (e.g., through a cloud storage connector if they reside there).
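
The size limits in the note on supported content can be summarized as a small lookup. The sketch below is only a restatement of those rules for pre-checking files before upload. The function name, the `connector` values, and the `sql_indexing` flag are hypothetical and are not part of any Airia API.

```python
from pathlib import Path

SPREADSHEET_EXTS = {".xlsx", ".xls", ".csv"}


def max_size_mb(filename: str, connector: str = "other",
                sql_indexing: bool = False) -> int:
    """Return the documented per-file size limit in MB.

    connector: "file_upload" for the File Upload data source, anything else
               for other connectors (hypothetical labels).
    sql_indexing: True if an Excel/CSV file is ingested as SQL rather than
                  as text for semantic search.
    """
    # Files uploaded via the File Upload data source are capped at 50MB.
    if connector == "file_upload":
        return 50

    ext = Path(filename).suffix.lower()
    if ext in SPREADSHEET_EXTS:
        return 100 if sql_indexing else 50
    if ext == ".pdf":
        return 100
    if ext == ".txt":
        return 80
    # All other supported file types.
    return 50


def within_limit(filename: str, size_mb: float, **kwargs) -> bool:
    """Check a file size (in MB) against the documented limit."""
    return size_mb <= max_size_mb(filename, **kwargs)
```

For example, `within_limit("report.pdf", 75)` passes for a cloud storage connector, while the same file fails with `connector="file_upload"`. The sketch does not check whether a file type is supported at all, and it does not cover the HEIF restriction for Microsoft connectors.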

How to Add a New Data Source

  1. Navigate to Data Sources: In your project, go to the Data Sources sub-menu.
  2. Add New Data Source: Click Add data source.
  3. Select Connector: Browse the connector library and select the appropriate data source connector (e.g., “Confluence,” “Google Drive,” “File Upload”).
  4. Configure Connector Details: Provide a name for your data source and fill out any additional required fields specific to the selected connector type.
    💡 Note: Refer to each specific connector’s documentation for detailed configuration requirements.
  5. Configure Ingestion Settings (Optional): Choose the Vector database, Image scanning mode, and SQL indexing for structured data that best suit your data source’s content.
    💡 Note: For more information on these settings, see Ingestion Settings.
  6. Create Data Source: Click Done to finalize the setup and initiate ingestion.

Managing a Data Source

After a data source has been created, you can easily manage it from the list view in the Data Sources section. Selecting an existing data source allows you to:
  • View and edit key details.
  • Review the data ingested from that source.
  • Adjust the selected content for ingestion.
  • Upload additional files directly to the platform (for “File Upload” data sources only).