Our Document Parsing product enables reliable extraction of text and structural information from a wide range of document formats, including PDFs, scanned files, and images. With minimal setup, you can convert raw documents into structured, machine-readable outputs ideal for downstream AI and retrieval workflows.
Each variant of Document Parsing is optimized for a specific use case, from simple OCR extraction to rich multimodal understanding of visual documents.
Here’s an overview of the available Document Parsing variants:
The Docling OCR variant extracts text and structural information from documents using advanced Optical Character Recognition (OCR). It is ideal for processing PDFs and also supports many other document types.
The Chunking version of Docling OCR automatically splits the parsed document into smaller semantic chunks, optimized for Retrieval-Augmented Generation (RAG) workflows.
These chunks can be easily embedded and stored in a Vector Store for efficient semantic search and retrieval.
Note: When chunking is enabled, images are excluded from the output to ensure clean, text-focused chunk generation.
Docling Vision extends parsing capabilities to multimodal documents that combine text and images. This variant not only extracts textual content but also analyzes and describes visual elements within the document.
Docling Vision – Chunking extends the Vision variant by enabling chunking for multimodal documents.
The Document Parsing endpoint accepts files as base64-encoded input.
You should encode your document (e.g., PDF, PNG, JPG) into a base64 string before sending it in the request body.
Example (simplified JSON structure):
{ "base64_string": "JVBERi0xLjUKJdDUxdgKNSAwIG9iago8PC9UeXBlIC9..." }
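As a minimal sketch of the encoding step, the following Python snippet turns raw document bytes into a JSON request body. The `base64_string` field name is taken from the example above; the helper names `build_request_body` and `build_request_body_from_file` are hypothetical and used only for illustration. Sending the request (endpoint URL, authentication) is deliberately left out, because those details depend on the endpoint and API key shown in the Cloud Portal.

```python
import base64
import json


def build_request_body(document_bytes: bytes) -> str:
    """Return a JSON string with the document encoded as base64.

    The "base64_string" key mirrors the simplified example in this guide;
    any additional fields the API may accept are not covered here.
    """
    # b64encode returns bytes; decode to str so it can be serialized as JSON.
    encoded = base64.b64encode(document_bytes).decode("ascii")
    return json.dumps({"base64_string": encoded})


def build_request_body_from_file(path: str) -> str:
    """Read a file in binary mode and build the request body from it."""
    with open(path, "rb") as f:
        return build_request_body(f.read())
```

For example, `build_request_body_from_file("report.pdf")` (with `report.pdf` standing in for your own file) yields a string that can be used as the request body. Reading the file in binary mode matters: opening it as text could corrupt PDF or image content before it is encoded.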
The Document Parsing API supports a wide range of file types across text, document, and image formats.
You can upload any of the following formats for processing:
Documents: pdf, docx, pptx, xlsx, html, md, csv
Images: jpeg, png
These formats cover most use cases for text extraction, document analysis, and multimodal parsing.
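Checking the file extension before encoding can save a round trip for unsupported files. The sketch below is a client-side convenience only, not part of the API: the `SUPPORTED_EXTENSIONS` set and the `is_supported` helper are hypothetical names, and the set contains exactly the extensions listed above. Whether the service also accepts the `.jpg` spelling is not stated in that list, so it is left out here; add it if your endpoint accepts it.

```python
from pathlib import Path

# Extensions copied from the supported-formats list in this guide.
# Assumption: matching is done on the file extension, case-insensitively.
SUPPORTED_EXTENSIONS = {
    "pdf", "docx", "pptx", "xlsx", "html", "md", "csv",  # documents
    "jpeg", "png",                                        # images
}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension appears in the supported list."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```

A call such as `is_supported("Report.PDF")` returns `True`, while a file without an extension or with an unlisted one is rejected by this check.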
To begin using Document Parsing, you’ll first need to create an API key in the Cloud Portal.

In the Cloud Portal, navigate to Document Parsing to view and manage your existing API keys, then click Create.

Now, select the variant you’d like to use.

Enter:
Name: A unique identifier for your key.
Description (optional): A short note to describe its use.

And that's it! Once created, your API key and endpoint will be displayed.