MarkItDown Step

The MarkItDown step converts a document into markdown text at any point in a workflow. The parser available on the Input step is tied to files attached in a chat session; the MarkItDown step works entirely in-agent, so it is the right choice whenever a document arrives through an action step (OneDrive, SharePoint, an HTTP call, etc.) rather than from a user upload. Common use cases include extracting the text body of a PDF fetched from OneDrive before sending it to a model, converting an Excel report to a readable table, and preprocessing Office documents before a data-extraction prompt.

How it works

The step accepts a Base64-encoded representation of a file and its MIME type. It sends the document to the Airia DocGen service, which uses Microsoft MarkItDown under the hood to produce a clean markdown string. The resulting markdown, along with metadata extracted from the document, is then available as output variables for the rest of the workflow.

Prerequisites

- The MarkItDown step must be enabled on your workspace (feature flag: `step-sdk-markitdown`). Contact your workspace administrator if the step does not appear in the step library.
- The document must be available as a Base64-encoded string inside the workflow. If you are fetching the file from OneDrive, use the OneDrive — Download File step with File Return Mode set to Content to get Base64 output directly.
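
When a file comes from somewhere other than a built-in step, the Base64 string and MIME type have to be prepared beforehand. The following is a minimal Python sketch of that preparation, not part of Airia: the function name `encode_document` and the dictionary keys are illustrative assumptions, and the MIME type is guessed from the file name with Python's standard `mimetypes` module.

```python
import base64
import mimetypes


def encode_document(filename: str, data: bytes) -> dict:
    """Return Base64 text and a guessed MIME type for a document (illustrative helper)."""
    # guess_type returns (type, encoding); type is None when the extension is unknown.
    mime_type, _ = mimetypes.guess_type(filename)
    return {
        # b64encode returns bytes, so decode to get a plain ASCII string.
        "base64_data": base64.b64encode(data).decode("ascii"),
        # Fallback value is an assumption; it is not one of the supported types.
        "mime_type": mime_type or "application/octet-stream",
    }


payload = encode_document("Q1-Summary.pdf", b"%PDF-1.7 example bytes")
print(payload["mime_type"])
```

A guessed type that falls back to `application/octet-stream` is not in the supported list, so it is worth checking the result before passing it to the step.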

Inputs

| Parameter | Required | Description |
| --- | --- | --- |
| Base64 Data | Yes | The Base64-encoded content of the document. Supports `{{variable}}` expressions. |
| MIME Type | Yes | The MIME type of the document. Must match one of the supported types. Supports `{{variable}}` expressions. |

Both fields support Airia’s standard {{variable}} expression syntax, so you can wire in output variables from earlier steps directly.

Outputs

| Variable | Type | Description |
| --- | --- | --- |
| `MarkdownContent` | `string` | The full markdown text extracted from the document. |
| `Title` | `string \| null` | The title extracted from the document, if available. |
| `SourceMimeType` | `string \| null` | The MIME type of the original document as recognized by the DocGen service. |
| `FileExtension` | `string \| null` | The file extension of the original document (e.g. `pdf`, `docx`). |

Reference these in downstream steps using the standard output variable syntax, for example {{steps.markitdown.MarkdownContent}}.
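
Three of the four outputs can be null, so any downstream logic that consumes them should handle missing values. The sketch below models the outputs as a Python `TypedDict` that mirrors the output variables listed above; the class name `MarkItDownOutputs` and the helper `display_name` are hypothetical and only illustrate one way to fall back when `Title` is absent.

```python
from typing import Optional, TypedDict


class MarkItDownOutputs(TypedDict):
    """Shape of the step outputs as documented above (hypothetical type name)."""
    MarkdownContent: str
    Title: Optional[str]
    SourceMimeType: Optional[str]
    FileExtension: Optional[str]


def display_name(outputs: MarkItDownOutputs) -> str:
    """Pick a human-readable label, falling back when Title is null or empty."""
    if outputs["Title"]:
        return outputs["Title"]
    if outputs["FileExtension"]:
        return f"Untitled {outputs['FileExtension']} document"
    return "Untitled document"


example: MarkItDownOutputs = {
    "MarkdownContent": "# Q1 Summary\n\nRevenue grew.",
    "Title": None,
    "SourceMimeType": "application/pdf",
    "FileExtension": "pdf",
}
print(display_name(example))
```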

Supported file types

| Format | MIME Type |
| --- | --- |
| PDF | `application/pdf` |
| Word (.docx) | `application/vnd.openxmlformats-officedocument.wordprocessingml.document` |
| Word (.doc) | `application/msword` |
| PowerPoint (.pptx) | `application/vnd.openxmlformats-officedocument.presentationml.presentation` |
| Excel (.xlsx) | `application/vnd.openxmlformats-officedocument.spreadsheetml.sheet` |
| Excel (.xls) | `application/vnd.ms-excel` |
| HTML | `text/html` |
| CSV | `text/csv` |
| JSON | `application/json` |
| XML | `application/xml` or `text/xml` |
| ZIP | `application/zip` |
| EPUB | `application/epub+zip` |
| Plain text | `text/plain` |

Validation note: If the MIME type value is known at design time (not a variable expression), the step validates it against this list when the workflow is saved. If you pass the MIME type via a {{variable}}, validation is deferred to runtime.
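
The save-time versus runtime distinction can be illustrated with a short Python sketch. This is not Airia's validator: the function name `check_mime_type`, the three return labels, and the exact-match comparison are assumptions made for illustration. The set of types is copied from the table above.

```python
import re

# MIME types copied from the supported file types table.
SUPPORTED_MIME_TYPES = frozenset({
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.ms-excel",
    "text/html",
    "text/csv",
    "application/json",
    "application/xml",
    "text/xml",
    "application/zip",
    "application/epub+zip",
    "text/plain",
})

# Any {{...}} placeholder means the value is only known at runtime.
_EXPRESSION = re.compile(r"\{\{.*?\}\}")


def check_mime_type(value: str) -> str:
    """Return 'deferred', 'valid', or 'invalid' (hypothetical labels)."""
    if _EXPRESSION.search(value):
        return "deferred"  # cannot be checked until the variable resolves
    # Exact match is an assumption; the real step may normalize case or whitespace.
    return "valid" if value in SUPPORTED_MIME_TYPES else "invalid"


print(check_mime_type("application/pdf"))
print(check_mime_type("{{steps.onedrive_download.ContentType}}"))
```

Because a deferred value is only checked when the workflow runs, an unsupported type passed through a variable would surface as a runtime failure rather than a save-time error.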

OneDrive workflow example

A typical pattern is to download a file from OneDrive and immediately parse it. Steps:

1. OneDrive — Download File
   - Path: `/Reports/Q1-Summary.pdf`
   - File Return Mode: Content (returns the file as Base64)
2. MarkItDown
   - Base64 Data: `{{steps.onedrive_download.FileContentBase64}}`
   - MIME Type: `{{steps.onedrive_download.ContentType}}`
3. AI Model
   - Prompt: `Summarize the following document:\n\n{{steps.markitdown.MarkdownContent}}`

By wiring ContentType from the download step into the MarkItDown step, the MIME type is resolved automatically at runtime without hardcoding it.

Notes and limitations

- No vector storage. The MarkItDown step performs a plain document-to-text conversion. It does not chunk, embed, or index the content. If you need the content stored in a data source for retrieval, use an Input step with on-demand parsing or ingest the content explicitly via a Data Source step.
- Base64 only. The step does not accept a download URL directly. If you have a URL (e.g. from OneDrive with File Return Mode set to Url), you must first fetch the file bytes via an HTTP action step and convert them to Base64 before passing them to MarkItDown.
- Single document per invocation. The step processes one file at a time. To handle multiple files, use a loop with a MarkItDown step inside.
- Output size. Very large documents may produce long markdown strings. Be mindful of context window limits when passing `MarkdownContent` directly to an AI model step.
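
One way to respect the output-size limitation is to split long markdown into bounded pieces before it reaches a model. The Python sketch below is an illustration only, assuming a character budget as a rough stand-in for a token limit; the function name `split_markdown` is hypothetical and the step itself performs no such splitting.

```python
def split_markdown(markdown: str, max_chars: int) -> list[str]:
    """Split markdown into chunks of at most max_chars, preferring blank-line breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for paragraph in markdown.split("\n\n"):
        # A paragraph longer than the budget is cut hard into max_chars pieces.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the open chunk
        else:
            chunks.append(current)  # close the open chunk and start a new one
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


parts = split_markdown("aaaa\n\nbbbb\n\ncccc", max_chars=10)
print(parts)
```

Each chunk can then be sent to the model separately, for example inside the same loop used to process multiple files.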
