Connecting to Data Sources
Data sources in the Airia platform enable you to integrate content from various origins into your Agent. Once a data source is added, the data undergoes ingestion, encoding, and indexing so that it can be retrieved by the LLM based on the user’s query. We support sparse and dense vectors for efficient retrieval, as well as SQL indexing for text-to-SQL capabilities. The processed data is then stored in vector databases, making it ready for retrieval-augmented generation (RAG) and tooling operations. Airia’s Data Source connectors allow you to ingest different file types, which then serve as knowledge for your Agent.
Supported File Types
Our platform supports a wide range of file types for various documentation and data needs, ensuring you can process diverse content seamlessly.
Office Documents
- Word Documents (.docx, .doc)
- Excel Files (.xlsx, .xls)
- PowerPoint Files (.pptx, .ppt)
Structured Data
- JSON (.json)
- CSV (.csv)
- Markdown (.md, .mdx)
- XML (.xml)
- XML Schema Definition (.xsd)
- YAML (.yaml)
- HML (.hml)
Images
- JPEG/JPG (.jpeg, .jpg)
- PNG (.png)
- BMP (.bmp)
- TIFF (.tif, .tiff)
- HEIF (.heif, .heic)
Other Document Types
- Plain Text files (.txt)
- PDF (.pdf)
Audio and Video
Our platform supports text extraction from various audio and video file formats. For processing, the audio content from these files is extracted and converted to WAV format for speech-to-text.
⚠️ Warning on Audio/Video Feature Access: This audio and video text extraction feature is currently available only to selected Early Access customers. Contact your sales representative to request access. Capabilities and pricing are subject to change during this phase.
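To picture the extract-and-convert step described above, the sketch below builds an ffmpeg command that drops the video stream and writes the audio track as a WAV file. This is only an illustration for local experimentation: it does not describe how Airia performs the conversion internally, and the function name, the 16 kHz mono settings, and the `_audio.wav` output naming are all assumptions.

```python
from pathlib import Path


def build_wav_command(source: str, sample_rate: int = 16000) -> list[str]:
    """Return an ffmpeg argument list that extracts audio from `source` as WAV.

    Hypothetical helper: the sample rate, mono downmix, and output name are
    illustrative assumptions, not documented Airia settings.
    """
    src = Path(source)
    # Write next to the source under a distinct name so a .wav input is never overwritten.
    target = src.with_name(f"{src.stem}_audio.wav")
    return [
        "ffmpeg",
        "-y",                    # overwrite the target if it already exists
        "-i", source,            # input audio or video file
        "-vn",                   # ignore any video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM, the usual WAV payload
        "-ar", str(sample_rate), # resample to the requested rate
        "-ac", "1",              # downmix to a single channel
        str(target),
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where ffmpeg is installed.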
Supported Audio Files
- WAV (.wav)
- FLAC (.flac)
- MP3 (.mp3)
- M4A (AAC-LC, HE-AAC, ALAC) (.m4a)
- OPUS (Opus in OGG container) (.opus)
- AAC (.aac)
- WEBM (audio only) (.webm)
- WMA (.wma)
- OGG (container for FLAC, OPUS, Vorbis) (.ogg)
- ASF (can contain WMA, WMV) (.asf)
Supported Video Files (Audio Track Extraction)
- WEBM (Audio + Video) (.webm)
- MP4 (.mp4)
- M4V (Apple variant of MP4) (.m4v)
- MOV (.mov)
- MKV (.mkv)
- MXF (.mxf)
- TS (streaming) (.ts)
- AVI (.avi)
- WMV (.wmv)
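When preparing a batch of files, it can help to check each extension against the supported lists before uploading. The following is a minimal client-side sketch, not part of any Airia API: the `SUPPORTED` mapping simply transcribes the extensions listed above, and the category keys and the `categories_for` helper are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Extensions transcribed from the lists above; category keys are illustrative.
SUPPORTED = {
    "office": {".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt"},
    "structured": {".json", ".csv", ".md", ".mdx", ".xml", ".xsd", ".yaml", ".hml"},
    "image": {".jpeg", ".jpg", ".png", ".bmp", ".tif", ".tiff", ".heif", ".heic"},
    "document": {".txt", ".pdf"},
    "audio": {".wav", ".flac", ".mp3", ".m4a", ".opus", ".aac", ".webm", ".wma", ".ogg", ".asf"},
    "video": {".webm", ".mp4", ".m4v", ".mov", ".mkv", ".mxf", ".ts", ".avi", ".wmv"},
}


def categories_for(filename: str) -> list[str]:
    """Return every category whose extension list contains this file's extension."""
    ext = Path(filename).suffix.lower()  # case-insensitive match on the final suffix
    return sorted(name for name, exts in SUPPORTED.items() if ext in exts)


print(categories_for("Report.DOCX"))  # ['office']
print(categories_for("talk.webm"))    # ['audio', 'video']
print(categories_for("script.php"))   # []
```

Note that `.webm` appears in both the audio and video lists, so the helper returns a list rather than a single category. An empty list means the extension is not in any list above.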
💡 Note on Supported Content:
- JSON files are supported as text only.
- Excel and CSV files are supported as text for semantic search (with a file size limit of 50MB) and as SQL (with a file size limit of 100MB). If uploaded via File Upload data source, the limit is 50MB per file.
- HEIF files are not supported for Microsoft connectors due to how files are processed by those connectors.
- PDF files are supported in either Text-only mode or Text with Images mode up to 100MB per file. If uploaded via File Upload data source, the limit is 50MB per file.
- Plain text files are supported up to 80MB per file. If uploaded via File Upload data source, the limit is 50MB per file.
- All other file types are supported up to 50MB per file.
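The size limits above can be summarized as a small lookup. This is a hedged sketch for pre-checking files locally, not an Airia API: the function names, the `via_file_upload` and `sql_indexing` flags, and the choice of 1 MB = 1024 × 1024 bytes are assumptions, since the limits are only stated in MB.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the documentation does not say whether MB is binary or decimal

TABULAR_EXTENSIONS = {".xlsx", ".xls", ".csv"}


def max_size_mb(filename: str, via_file_upload: bool = False, sql_indexing: bool = False) -> int:
    """Return the per-file size limit in MB, following the notes above."""
    if via_file_upload:
        return 50  # File Upload caps every file type at 50MB
    ext = Path(filename).suffix.lower()
    if ext in TABULAR_EXTENSIONS:
        return 100 if sql_indexing else 50  # SQL vs. text for semantic search
    if ext == ".pdf":
        return 100
    if ext == ".txt":
        return 80
    return 50  # all other file types


def within_limit(filename: str, size_bytes: int, **options: bool) -> bool:
    """Check a file size in bytes against the limit for its type and upload path."""
    return size_bytes <= max_size_mb(filename, **options) * MB
```

For example, `max_size_mb("report.pdf")` gives 100, while `max_size_mb("report.pdf", via_file_upload=True)` gives 50.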
💡 Note on Audio/Video Ingestion: Ingestion of extracted text from audio and video files occurs after speech-to-text processing, not in real-time.
⚠️ Warning on Script Files: Files containing script content (such as PHP, JS, etc.) are not supported via direct File Upload, regardless of their file type. Such files can, however, be ingested via any other compatible connector (e.g., through a cloud storage connector if they reside there).
How to Add a New Data Source
- Navigate to Data Sources: In your project, go to the Data Sources sub-menu.
- Add New Data Source: Click Add data source.
- Select Connector: Browse the connector library and select the appropriate data source connector (e.g., “Confluence,” “Google Drive,” “File Upload”).
- Configure Connector Details: Provide a name for your data source and fill out any additional required fields specific to the selected connector type.
  💡 Note: Refer to each specific connector’s documentation for detailed configuration requirements.
- Configure Ingestion Settings (Optional): Choose the Vector database, Image scanning mode, and SQL indexing (for structured data) options that best suit your data source’s content.
  💡 Note: For more information on these settings, see Ingestion Settings.
- Create Data Source: Click Done to finalize the setup and initiate ingestion.
Managing a Data Source
After a data source has been created, you can manage it from the list view in the Data Sources section. Selecting an existing data source allows you to:
- View and edit key details.
- Review the data ingested from that source.
- Adjust the selected content for ingestion.
- Upload additional files directly to the platform (“File Upload” data sources only).
