Docuglean
Extract structured data from any document in 3 lines
Summary: Docuglean is an open-source SDK that extracts structured data from documents using multiple AI providers. It supports type-safe extraction, batch processing, and document classification, and is available in Python and TypeScript.
What it does
Docuglean provides a unified API to extract structured data from receipts, invoices, contracts, and other documents. It supports OpenAI, Mistral, Google Gemini, and HuggingFace, with features like type-safe outputs, batch processing, and local parsing for common formats.
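To make the idea of provider-agnostic, type-safe extraction concrete, here is a minimal sketch of the general pattern in Python. It is not Docuglean's actual API: the names `Provider`, `Receipt`, `extract`, `extract_batch`, and `fake_provider` are all hypothetical, and the provider is a canned stand-in rather than a real model call.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Type, TypeVar, Union, get_type_hints

T = TypeVar("T")

# Hypothetical provider signature: takes document text, returns a JSON string.
# A real backend (OpenAI, Mistral, ...) would sit behind a callable like this.
Provider = Callable[[str], str]


@dataclass
class Receipt:
    """Illustrative schema; the fields are assumptions, not a Docuglean type."""
    merchant: str
    total: float
    currency: str


def extract(document: str, schema: Type[T], provider: Provider) -> T:
    """Ask the provider for JSON, then validate and coerce it into the schema."""
    raw = json.loads(provider(document))
    values = {}
    for name, expected_type in get_type_hints(schema).items():
        if name not in raw:
            raise ValueError(f"missing field: {name}")
        # Calling the annotated type (str, float) coerces the raw value.
        values[name] = expected_type(raw[name])
    return schema(**values)


def extract_batch(
    documents: List[str], schema: Type[T], provider: Provider
) -> List[Union[T, ValueError]]:
    """Process documents in order, keeping a per-document error instead of aborting."""
    results: List[Union[T, ValueError]] = []
    for document in documents:
        try:
            results.append(extract(document, schema, provider))
        except ValueError as exc:  # also covers json.JSONDecodeError
            results.append(exc)
    return results


def fake_provider(document: str) -> str:
    """Stand-in for a model call; returns a canned response for illustration."""
    return json.dumps({"merchant": "Corner Cafe", "total": "12.50", "currency": "USD"})


receipt = extract("raw receipt text", Receipt, fake_provider)
print(receipt)
```

Because the provider is just a callable, swapping backends means passing a different function. The schema and the validation step stay the same, which is the property that avoids vendor lock-in.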
Who it's for
It is designed for developers and teams that build fintech or expense-tracking apps or process invoices and receipts at scale, and for anyone who needs reliable document parsing without vendor lock-in.
Why it matters
Writing and maintaining document parsing code is complex. Docuglean addresses this with a provider-agnostic, type-safe, batch-capable SDK that reduces development effort and works across multiple AI backends.