# MarkItDown Step

> Convert documents to markdown mid-workflow using the MarkItDown step.

The MarkItDown step converts a document into markdown text at any point in a workflow. Unlike the parser on the Input step, which only handles files attached in a chat session, the MarkItDown step runs inside the agent workflow itself. That makes it the right choice whenever a document arrives through an action step (OneDrive, SharePoint, an HTTP call, and so on) rather than from a user upload.

Common use cases include extracting the text body of a PDF fetched from OneDrive before sending it to a model, converting an Excel report to a readable table, and preprocessing Office documents prior to a data-extraction prompt.

***

## How it works

The step takes two inputs: the file content as a **Base64-encoded** string, and the file's MIME type. It sends the document to the Airia DocGen service, which uses Microsoft MarkItDown under the hood to produce a clean markdown string. The resulting markdown, along with metadata extracted from the document, is then available as output variables for the rest of the workflow.
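
If the file bytes are produced outside Airia (for example by a script that feeds an HTTP action step), the two input values can be prepared with the Python standard library. This is a minimal sketch: the function name `encode_document` and the returned dictionary keys are hypothetical and are not part of the Airia API.

```python
import base64
import mimetypes


def encode_document(filename: str, data: bytes) -> dict:
    """Build the two values the MarkItDown step needs: Base64 text and a MIME type.

    The dictionary keys are illustrative only, not an Airia payload format.
    """
    # Infer the MIME type from the file extension.
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        raise ValueError(f"Cannot infer a MIME type for {filename!r}")
    return {
        # b64encode returns bytes; decode to get a plain ASCII string.
        "base64_data": base64.b64encode(data).decode("ascii"),
        "mime_type": mime_type,
    }


payload = encode_document("report.pdf", b"%PDF-1.7 sample bytes")
print(payload["mime_type"])
```

Inferring the MIME type from the extension is a convenience. When the source system already reports a content type, passing that value through is usually more reliable.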

***

## Prerequisites

* The **MarkItDown** step must be enabled on your workspace (feature flag: `step-sdk-markitdown`). Contact your workspace administrator if the step does not appear in the step library.
* The document must be available as a **Base64-encoded string** inside the workflow. If you are fetching the file from OneDrive, use the [OneDrive — Download File](#onedrive-workflow-example) step with **File Return Mode** set to **Content** to get Base64 output directly.

***

## Inputs

| Parameter       | Required | Description                                                                                                                         |
| --------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| **Base64 Data** | Yes      | The Base64-encoded content of the document. Supports `{{variable}}` expressions.                                                    |
| **MIME Type**   | Yes      | The MIME type of the document. Must match one of the [supported types](#supported-file-types). Supports `{{variable}}` expressions. |

Both fields support Airia's standard `{{variable}}` expression syntax, so you can wire in output variables from earlier steps directly.

***

## Outputs

| Variable          | Type             | Description                                                                 |
| ----------------- | ---------------- | --------------------------------------------------------------------------- |
| `MarkdownContent` | `string`         | The full markdown text extracted from the document.                         |
| `Title`           | `string \| null` | The title extracted from the document, if available.                        |
| `SourceMimeType`  | `string \| null` | The MIME type of the original document as recognized by the DocGen service. |
| `FileExtension`   | `string \| null` | The file extension of the original document (e.g. `pdf`, `docx`).           |

Reference these in downstream steps using the standard output variable syntax, for example `{{steps.markitdown.MarkdownContent}}`.

***

## Supported file types

| Format             | MIME Type                                                                   |
| ------------------ | --------------------------------------------------------------------------- |
| PDF                | `application/pdf`                                                           |
| Word (.docx)       | `application/vnd.openxmlformats-officedocument.wordprocessingml.document`   |
| Word (.doc)        | `application/msword`                                                        |
| PowerPoint (.pptx) | `application/vnd.openxmlformats-officedocument.presentationml.presentation` |
| Excel (.xlsx)      | `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`         |
| Excel (.xls)       | `application/vnd.ms-excel`                                                  |
| HTML               | `text/html`                                                                 |
| CSV                | `text/csv`                                                                  |
| JSON               | `application/json`                                                          |
| XML                | `application/xml` or `text/xml`                                             |
| ZIP                | `application/zip`                                                           |
| EPUB               | `application/epub+zip`                                                      |
| Plain text         | `text/plain`                                                                |

> **Validation note:** If the MIME type value is known at design time (not a variable expression), the step validates it against this list when the workflow is saved. If you pass the MIME type via a `{{variable}}`, validation is deferred to runtime.
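
The split between save-time and runtime validation can be pictured with a small sketch. It only mirrors the behavior described in the note and is not the platform's actual validator. The names `check_mime_type` and `SUPPORTED_MIME_TYPES` are hypothetical, and treating any value that contains `{{...}}` as a variable expression is an assumption.

```python
import re

# MIME types copied from the supported file types table.
SUPPORTED_MIME_TYPES = frozenset({
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.ms-excel",
    "text/html",
    "text/csv",
    "application/json",
    "application/xml",
    "text/xml",
    "application/zip",
    "application/epub+zip",
    "text/plain",
})

# Assumption: a value containing {{...}} is a variable expression.
VARIABLE_EXPRESSION = re.compile(r"\{\{.*?\}\}")


def check_mime_type(value: str) -> str:
    """Classify a MIME Type field value as 'valid', 'invalid', or 'deferred'."""
    if VARIABLE_EXPRESSION.search(value):
        # Unknown until the workflow runs, so nothing can be checked on save.
        return "deferred"
    return "valid" if value.strip() in SUPPORTED_MIME_TYPES else "invalid"


print(check_mime_type("application/pdf"))
print(check_mime_type("{{steps.onedrive_download.ContentType}}"))
```

Because a deferred value is only checked at runtime, a workflow that wires in a variable can save cleanly and still fail later if the upstream step returns an unsupported type.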

***

## OneDrive workflow example

A typical pattern is to download a file from OneDrive and immediately parse it.

**Steps:**

1. **OneDrive — Download File**
   * **Path:** `/Reports/Q1-Summary.pdf`
   * **File Return Mode:** `Content` (returns the file as Base64)

2. **MarkItDown**
   * **Base64 Data:** `{{steps.onedrive_download.FileContentBase64}}`
   * **MIME Type:** `{{steps.onedrive_download.ContentType}}`

3. **AI Model**
   * **Prompt:** `Summarize the following document:\n\n{{steps.markitdown.MarkdownContent}}`

By wiring `ContentType` from the download step into the MarkItDown step, the MIME type is resolved automatically at runtime without hardcoding it.
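
To make the data flow concrete, the sketch below imitates how the `{{steps...}}` expressions in this example could be substituted with step outputs. It is an illustration only and not Airia's expression engine: the `resolve` function, the regular expression, and the sample output values are all assumptions.

```python
import re

# Assumption: expressions have the form {{steps.<step_name>.<variable>}}.
EXPRESSION = re.compile(r"\{\{\s*steps\.(\w+)\.(\w+)\s*\}\}")


def resolve(template: str, outputs: dict) -> str:
    """Replace each {{steps.name.Variable}} with the matching step output."""
    def lookup(match: re.Match) -> str:
        step, variable = match.group(1), match.group(2)
        return outputs[step][variable]

    return EXPRESSION.sub(lookup, template)


# Placeholder outputs standing in for what steps 1 and 2 would produce.
outputs = {
    "onedrive_download": {
        "FileContentBase64": "JVBERi0=",
        "ContentType": "application/pdf",
    },
    "markitdown": {"MarkdownContent": "# Q1 Summary"},
}

mime_type = resolve("{{steps.onedrive_download.ContentType}}", outputs)
prompt = resolve(
    "Summarize the following document:\n\n{{steps.markitdown.MarkdownContent}}",
    outputs,
)
print(mime_type)
print(prompt)
```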

***

## Notes and limitations

* **No vector storage.** The MarkItDown step performs a plain document-to-text conversion. It does not chunk, embed, or index the content. If you need the content stored in a data source for retrieval, use an Input step with on-demand parsing or ingest the content explicitly via a Data Source step.
* **Base64 only.** The step does not accept a download URL directly. If you have a URL (e.g. from OneDrive with **File Return Mode** set to **Url**), you must first fetch the file bytes via an HTTP action step and convert them to Base64 before passing them to MarkItDown.
* **Single document per invocation.** The step processes one file at a time. To handle multiple files, use a loop with a MarkItDown step inside.
* **Output size.** Very large documents may produce long markdown strings. Be mindful of context window limits when passing `MarkdownContent` directly to an AI model step.
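
One way to act on the output size note is to cap the markdown before it reaches the model, for example in a code step between MarkItDown and the AI Model step. The sketch below is an assumption about how you might do that, not a built-in feature. `truncate_markdown` is a hypothetical helper, and it budgets in characters as a rough stand-in for tokens.

```python
def truncate_markdown(markdown: str, max_chars: int) -> str:
    """Trim markdown to at most max_chars, cutting at a paragraph break when possible."""
    if len(markdown) <= max_chars:
        return markdown
    # Look for the last blank line that fits inside the budget.
    cut = markdown.rfind("\n\n", 0, max_chars)
    if cut <= 0:
        # No paragraph break available: fall back to a hard cut.
        cut = max_chars
    return markdown[:cut].rstrip()


doc = "# Title\n\nFirst paragraph.\n\nSecond paragraph that is long."
print(truncate_markdown(doc, 30))
```

Cutting at a paragraph break keeps headings and tables from being split mid-line. For documents that must be processed in full, splitting into several chunks and looping over them is the alternative.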