How it works
The step accepts a Base64-encoded representation of a file and its MIME type. It sends the document to the Airia DocGen service, which uses Microsoft MarkItDown under the hood to produce a clean markdown string. The resulting markdown, along with metadata extracted from the document, is then available as output variables for the rest of the workflow.
Prerequisites
- The MarkItDown step must be enabled on your workspace (feature flag: step-sdk-markitdown). Contact your workspace administrator if the step does not appear in the step library.
- The document must be available as a Base64-encoded string inside the workflow. If you are fetching the file from OneDrive, use the OneDrive — Download File step with File Return Mode set to Content to get Base64 output directly.
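For files that originate outside a connector step, the two values the step needs can be produced from raw bytes. The following is a minimal Python sketch that runs outside Airia; the helper name `to_step_inputs` and the dictionary keys are hypothetical and only mirror the Base64 Data and MIME Type parameters.

```python
import base64
import mimetypes


def to_step_inputs(filename: str, data: bytes) -> dict:
    """Build a Base64 string and a MIME type for a file (hypothetical helper)."""
    # Guess the MIME type from the file extension using the standard library.
    mime_type, _ = mimetypes.guess_type(filename)
    return {
        "base64_data": base64.b64encode(data).decode("ascii"),
        # The generic fallback is not a supported type for the step, so treat
        # it as a signal that the MIME type must be set by hand.
        "mime_type": mime_type or "application/octet-stream",
    }


inputs = to_step_inputs("Q1-Summary.pdf", b"%PDF-1.7 example bytes")
print(inputs["mime_type"])
```

Guessing from the extension is only a convenience. When the source system already reports a content type, passing that value through is usually more reliable.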
Inputs
| Parameter | Required | Description |
|---|---|---|
| Base64 Data | Yes | The Base64-encoded content of the document. Supports {{variable}} expressions. |
| MIME Type | Yes | The MIME type of the document. Must match one of the supported types. Supports {{variable}} expressions. |
Both inputs support the {{variable}} expression syntax, so you can wire in output variables from earlier steps directly.
Outputs
| Variable | Type | Description |
|---|---|---|
| MarkdownContent | string | The full markdown text extracted from the document. |
| Title | string \| null | The title extracted from the document, if available. |
| SourceMimeType | string \| null | The MIME type of the original document as recognized by the DocGen service. |
| FileExtension | string \| null | The file extension of the original document (e.g. pdf, docx). |
Reference any output in later steps with an expression such as {{steps.markitdown.MarkdownContent}}.
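Because three of the four outputs can be null, anything that consumes them should handle the missing case. The Python sketch below models the outputs as a small data class; the class, its method, and the fallback text are hypothetical and are not part of the step itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarkItDownOutput:
    """Hypothetical container mirroring the step's output variables."""

    MarkdownContent: str
    Title: Optional[str] = None
    SourceMimeType: Optional[str] = None
    FileExtension: Optional[str] = None

    def display_title(self, fallback: str = "Untitled document") -> str:
        # Title is null when no title could be extracted from the document.
        return self.Title or fallback


out = MarkItDownOutput(MarkdownContent="# Q1 Summary\n\nRevenue grew.", FileExtension="pdf")
print(out.display_title())
```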
Supported file types
| Format | MIME Type |
|---|---|
| PDF | application/pdf |
| Word (.docx) | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| Word (.doc) | application/msword |
| PowerPoint (.pptx) | application/vnd.openxmlformats-officedocument.presentationml.presentation |
| Excel (.xlsx) | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
| Excel (.xls) | application/vnd.ms-excel |
| HTML | text/html |
| CSV | text/csv |
| JSON | application/json |
| XML | application/xml or text/xml |
| ZIP | application/zip |
| EPUB | application/epub+zip |
| Plain text | text/plain |
Validation note: If the MIME type value is known at design time (not a variable expression), the step validates it against this list when the workflow is saved. If you pass the MIME type via a {{variable}}, validation is deferred to runtime.
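The deferral rule in the validation note can be pictured with a short Python sketch. It illustrates the described behavior and is not Airia's implementation; the function name, the return strings, and the regular expression used to detect a {{variable}} expression are assumptions.

```python
import re

# MIME types copied from the supported file types table.
SUPPORTED_MIME_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.ms-excel",
    "text/html",
    "text/csv",
    "application/json",
    "application/xml",
    "text/xml",
    "application/zip",
    "application/epub+zip",
    "text/plain",
}

# Assumed pattern: any {{...}} placeholder marks the value as dynamic.
VARIABLE_EXPRESSION = re.compile(r"\{\{.+?\}\}")


def check_mime_type_on_save(value: str) -> str:
    """Return 'deferred', 'valid', or 'invalid' for a MIME Type input."""
    if VARIABLE_EXPRESSION.search(value):
        # The concrete value is unknown until the workflow runs.
        return "deferred"
    return "valid" if value in SUPPORTED_MIME_TYPES else "invalid"


print(check_mime_type_on_save("application/pdf"))
print(check_mime_type_on_save("image/png"))
print(check_mime_type_on_save("{{steps.onedrive_download.ContentType}}"))
```

A deferred value can still fail at runtime if the variable resolves to an unsupported type, so a literal MIME type is the safer choice when the file format never changes.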
OneDrive workflow example
A typical pattern is to download a file from OneDrive and immediately parse it. Steps:
- OneDrive — Download File
  - Path: /Reports/Q1-Summary.pdf
  - File Return Mode: Content (returns the file as Base64)
- MarkItDown
  - Base64 Data: {{steps.onedrive_download.FileContentBase64}}
  - MIME Type: {{steps.onedrive_download.ContentType}}
- AI Model
  - Prompt: Summarize the following document:\n\n{{steps.markitdown.MarkdownContent}}
Because ContentType from the download step is wired into the MarkItDown step, the MIME type is resolved automatically at runtime and does not need to be hardcoded.
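To make the wiring concrete, the sketch below resolves {{steps.<step>.<output>}} placeholders against a dictionary of earlier step outputs. It is a simplified Python model of the idea, not the platform's expression engine; the `resolve` function, the regular expression, and the sample output values are all assumptions.

```python
import re

# Assumed placeholder shape: {{steps.<step_name>.<OutputName>}}
PLACEHOLDER = re.compile(r"\{\{\s*steps\.(\w+)\.(\w+)\s*\}\}")


def resolve(template: str, step_outputs: dict) -> str:
    """Replace each {{steps.name.Output}} placeholder with its stored value."""

    def lookup(match: re.Match) -> str:
        step, output = match.group(1), match.group(2)
        return str(step_outputs[step][output])

    return PLACEHOLDER.sub(lookup, template)


# Sample values standing in for what the download step might return.
step_outputs = {
    "onedrive_download": {
        "FileContentBase64": "JVBERi0xLjc=",
        "ContentType": "application/pdf",
    }
}

mime_type = resolve("{{steps.onedrive_download.ContentType}}", step_outputs)
print(mime_type)
```

In this model a placeholder that names a step or output missing from the dictionary raises a KeyError, which is a deliberate choice for the sketch: failing loudly is easier to debug than silently passing an empty MIME type.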
Notes and limitations
- No vector storage. The MarkItDown step performs a plain document-to-text conversion. It does not chunk, embed, or index the content. If you need the content stored in a data source for retrieval, use an Input step with on-demand parsing or ingest the content explicitly via a Data Source step.
- Base64 only. The step does not accept a download URL directly. If you have a URL (e.g. from OneDrive with File Return Mode set to Url), you must first fetch the file bytes via an HTTP action step and convert them to Base64 before passing to MarkItDown.
- Single document per invocation. The step processes one file at a time. To handle multiple files, use a loop with a MarkItDown step inside.
- Output size. Very large documents may produce long markdown strings. Be mindful of context window limits when passing MarkdownContent directly to an AI model step.
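One way to act on the output size note is to cap the markdown before it reaches the model. The following is a rough Python sketch that uses a character budget as a crude stand-in for a token limit; the function name and the default budget are hypothetical, and real limits depend on the model and its tokenizer.

```python
def truncate_markdown(markdown: str, max_chars: int = 20_000) -> str:
    """Trim markdown to at most max_chars, preferring a paragraph boundary."""
    if len(markdown) <= max_chars:
        return markdown
    clipped = markdown[:max_chars]
    # Cut back to the last blank line so a paragraph is not split mid-sentence.
    boundary = clipped.rfind("\n\n")
    return clipped[:boundary] if boundary > 0 else clipped


sample = "# Report\n\nFirst paragraph.\n\nSecond paragraph that is longer."
print(truncate_markdown(sample, max_chars=30))
```

Truncation discards the tail of the document. When the whole document matters, splitting it into sections and summarizing each one in a loop is an alternative worth considering.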
