The MarkItDown step converts a document into markdown text at any point in a workflow. The parser available on the Input step is tied to files attached in a chat session. The MarkItDown step, by contrast, works entirely in-agent, which makes it the right choice whenever a document arrives through an action step (OneDrive, SharePoint, an HTTP call, etc.) rather than from a user upload. Common use cases include extracting the text body of a PDF fetched from OneDrive before sending it to a model, converting an Excel report to a readable table, and preprocessing Office documents prior to a data-extraction prompt.

How it works

The step accepts a Base64-encoded representation of a file and its MIME type. It sends the document to the Airia DocGen service, which uses Microsoft MarkItDown under the hood to produce a clean markdown string. The resulting markdown, along with metadata extracted from the document, is then available as output variables for the rest of the workflow.
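
To make the two inputs concrete, here is a minimal Python sketch of how a file on disk could be turned into a Base64 string and a MIME type. It only illustrates the shape of the data the step expects. The function name `build_step_inputs` and the dictionary keys are hypothetical and are not part of Airia's API.

```python
import base64
import mimetypes
from pathlib import Path


def build_step_inputs(path: str) -> dict:
    """Return a Base64 string and a MIME type for a local file.

    The key names below are illustrative only; in a workflow these two
    values are entered in the step's Base64 Data and MIME Type fields.
    """
    raw_bytes = Path(path).read_bytes()

    # Infer the MIME type from the file extension (e.g. .pdf).
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"cannot infer a MIME type for {path!r}")

    return {
        # b64encode returns bytes; decode to get a plain ASCII string.
        "base64_data": base64.b64encode(raw_bytes).decode("ascii"),
        "mime_type": mime_type,
    }
```

The inferred MIME type should still be checked against the supported file types listed on this page, since an extension-based guess can return a type the step does not accept.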

Prerequisites

  • The MarkItDown step must be enabled on your workspace (feature flag: step-sdk-markitdown). Contact your workspace administrator if the step does not appear in the step library.
  • The document must be available as a Base64-encoded string inside the workflow. If you are fetching the file from OneDrive, use the OneDrive — Download File step with File Return Mode set to Content to get Base64 output directly.

Inputs

Parameter    Required  Description
Base64 Data  Yes       The Base64-encoded content of the document. Supports {{variable}} expressions.
MIME Type    Yes       The MIME type of the document. Must match one of the supported types. Supports {{variable}} expressions.

Both fields support Airia’s standard {{variable}} expression syntax, so you can wire in output variables from earlier steps directly.

Outputs

Variable         Type           Description
MarkdownContent  string         The full markdown text extracted from the document.
Title            string | null  The title extracted from the document, if available.
SourceMimeType   string | null  The MIME type of the original document as recognized by the DocGen service.
FileExtension    string | null  The file extension of the original document (e.g. pdf, docx).

Reference these in downstream steps using the standard output variable syntax, for example {{steps.markitdown.MarkdownContent}}.

Supported file types

Format              MIME Type
PDF                 application/pdf
Word (.docx)        application/vnd.openxmlformats-officedocument.wordprocessingml.document
Word (.doc)         application/msword
PowerPoint (.pptx)  application/vnd.openxmlformats-officedocument.presentationml.presentation
Excel (.xlsx)       application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Excel (.xls)        application/vnd.ms-excel
HTML                text/html
CSV                 text/csv
JSON                application/json
XML                 application/xml or text/xml
ZIP                 application/zip
EPUB                application/epub+zip
Plain text          text/plain

Validation note: If the MIME type value is known at design time (not a variable expression), the step validates it against this list when the workflow is saved. If you pass the MIME type via a {{variable}}, validation is deferred to runtime.
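
The validation rule can be sketched in Python as follows. This is an illustration of the behavior described in the note, not the step's actual implementation: the function name `check_mime_type`, the three result strings, and the exact-match comparison are all assumptions.

```python
import re

# The MIME types listed under "Supported file types".
SUPPORTED_MIME_TYPES = frozenset({
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.ms-excel",
    "text/html",
    "text/csv",
    "application/json",
    "application/xml",
    "text/xml",
    "application/zip",
    "application/epub+zip",
    "text/plain",
})

# Anything containing a {{...}} expression cannot be checked at save time.
_EXPRESSION = re.compile(r"\{\{.*?\}\}")


def check_mime_type(value: str) -> str:
    """Classify a MIME Type field value as 'deferred', 'valid' or 'invalid'."""
    if _EXPRESSION.search(value):
        # The value is only known at runtime, so validation is deferred.
        return "deferred"
    # Assumption: literal values are compared exactly against the list.
    return "valid" if value in SUPPORTED_MIME_TYPES else "invalid"
```

Under these assumptions, a literal application/pdf is accepted when the workflow is saved, a literal image/png is rejected, and {{steps.onedrive_download.ContentType}} is left for runtime.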

OneDrive workflow example

A typical pattern is to download a file from OneDrive and immediately parse it. Steps:
  1. OneDrive — Download File
    • Path: /Reports/Q1-Summary.pdf
    • File Return Mode: Content (returns the file as Base64)
  2. MarkItDown
    • Base64 Data: {{steps.onedrive_download.FileContentBase64}}
    • MIME Type: {{steps.onedrive_download.ContentType}}
  3. AI Model
    • Prompt: Summarize the following document:\n\n{{steps.markitdown.MarkdownContent}}
By wiring ContentType from the download step into the MarkItDown step, the MIME type is resolved automatically at runtime without hardcoding it.
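
The wiring above can be pictured with a small Python sketch that substitutes {{steps.<step>.<variable>}} placeholders with values from earlier steps. The `resolve` function and the sample values are hypothetical and only mimic the expression syntax shown on this page; they do not describe how Airia evaluates expressions internally.

```python
from __future__ import annotations

import re

# Matches placeholders of the form {{steps.<step_name>.<variable_name>}}.
PLACEHOLDER = re.compile(r"\{\{\s*steps\.(\w+)\.(\w+)\s*\}\}")


def resolve(template: str, outputs: dict[str, dict[str, str]]) -> str:
    """Replace every placeholder in `template` with the matching step output.

    Raises KeyError if a referenced step or variable is missing.
    """
    def lookup(match: re.Match) -> str:
        step_name, variable_name = match.group(1), match.group(2)
        return outputs[step_name][variable_name]

    return PLACEHOLDER.sub(lookup, template)


# Made-up sample outputs standing in for the two earlier steps.
sample_outputs = {
    "onedrive_download": {
        "FileContentBase64": "JVBERi0xLjQ=",
        "ContentType": "application/pdf",
    },
    "markitdown": {"MarkdownContent": "# Q1 Summary"},
}

mime_type = resolve("{{steps.onedrive_download.ContentType}}", sample_outputs)
prompt = resolve(
    "Summarize the following document:\n\n{{steps.markitdown.MarkdownContent}}",
    sample_outputs,
)
```

Here `mime_type` takes whatever value the download step reported, which is why the same workflow can handle a PDF one day and a spreadsheet the next without edits.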

Notes and limitations

  • No vector storage. The MarkItDown step performs a plain document-to-text conversion. It does not chunk, embed, or index the content. If you need the content stored in a data source for retrieval, use an Input step with on-demand parsing or ingest the content explicitly via a Data Source step.
  • Base64 only. The step does not accept a download URL directly. If you have a URL (e.g. from OneDrive with File Return Mode set to Url), you must first fetch the file bytes via an HTTP action step and convert them to Base64 before passing to MarkItDown.
  • Single document per invocation. The step processes one file at a time. To handle multiple files, use a loop with a MarkItDown step inside.
  • Output size. Very large documents may produce long markdown strings. Be mindful of context window limits when passing MarkdownContent directly to an AI model step.
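
For the Base64-only limitation above, the conversion from a download URL to a Base64 string is small. Inside a workflow this would be done with an HTTP action step. The Python sketch below shows the equivalent logic for reference, using only the standard library. The function name `url_to_base64` is hypothetical, and real OneDrive download URLs may need authentication headers that this sketch does not handle.

```python
import base64
from urllib.request import urlopen


def url_to_base64(url: str, timeout: float = 30.0) -> str:
    """Download the bytes behind `url` and return them Base64-encoded."""
    with urlopen(url, timeout=timeout) as response:
        raw_bytes = response.read()
    # b64encode returns bytes; decode to get the string form the step expects.
    return base64.b64encode(raw_bytes).decode("ascii")
```

Note that Base64 encoding grows the payload by roughly one third, which is worth keeping in mind alongside the output size note when working with large files.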