# Website Connector

The Website Connector allows you to ingest public web content from a specified URL into your project.
Airia uses [Crawl4AI](https://github.com/unclecode/crawl4ai) for web data extraction.

## Setting Up the Connector

1. **Select Website Data Source**
   Navigate to the **Data Sources** section of your project. Click **Add data source** and select **Website** from the available library.
2. **Configure Basic Settings**
   In the **Website address** field, enter the URL of the website you want to crawl. By default, up to `1,000` pages under that address are crawled and ingested into your data source.
3. **Configure Advanced Settings (Optional)**
   In the **Advanced settings** section, you can fine-tune the ingestion process to limit what gets crawled, which helps keep embedding costs down.
   * **Include/Exclude specific URLs**: The URLs you enter are either processed in addition to the general crawl or removed from it, subject to your page limit setting.
   * **Ingest URLs with specific phrase**: Ingest only URLs that contain a specific phrase. Note that this filter may increase ingestion time.
   * **Ingest external links**: Enable or disable the ingestion of external links found within the website content.
4. **Create Data Source**
   Once you have configured your settings, click **Done** to create the data source and begin the ingestion process.
5. **Monitor Ingestion Status**
   To check the current ingestion status, click the data source again. The detailed list shows every ingested page by its URL.
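
To make the interaction between the advanced settings easier to picture, the following is a minimal Python sketch of how such filters could combine. It is not Airia's or Crawl4AI's implementation: the function `select_urls`, its parameter names, and the order in which the rules apply are all assumptions made for illustration. In particular, the sketch assumes that explicitly included URLs bypass the phrase and external-link filters, and that excluded URLs are always dropped.

```python
from urllib.parse import urlparse


def select_urls(discovered, base_url, include=(), exclude=(),
                phrase=None, allow_external=False, page_limit=1000):
    """Hypothetical filter mirroring the connector's advanced settings.

    Assumptions (not confirmed by the product documentation):
    - included URLs are considered first and skip the other filters;
    - excluded URLs are always removed;
    - the page limit caps the total number of selected URLs.
    """
    base_host = urlparse(base_url).netloc
    include_set = set(include)
    exclude_set = set(exclude)

    selected = []
    seen = set()
    for url in list(include) + list(discovered):
        if len(selected) >= page_limit:
            break  # page limit reached
        if url in seen or url in exclude_set:
            continue  # skip duplicates and excluded URLs
        seen.add(url)
        if url not in include_set:
            # External links are kept only when the option is enabled.
            if not allow_external and urlparse(url).netloc != base_host:
                continue
            # Phrase filter: keep only URLs containing the phrase.
            if phrase and phrase not in url:
                continue
        selected.append(url)
    return selected


discovered = [
    "https://example.com/docs/intro",
    "https://example.com/blog/news",
    "https://example.com/docs/private",
    "https://other.example.org/docs/guide",
]
chosen = select_urls(
    discovered,
    base_url="https://example.com",
    include=["https://example.com/pricing"],
    exclude=["https://example.com/docs/private"],
    phrase="/docs/",
)
print(chosen)
```

With these example inputs, the blog page is dropped by the phrase filter, the private page by the exclude list, and the page on the other host by the external-link setting. Narrowing the crawl this way is what reduces the number of pages that need to be embedded.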

## Next Steps

Once your data has been successfully ingested, the Website data source is ready to be used with an Agent.
