Raptor Data - Version Control for RAG
Git-like versioning for RAG embedding pipelines, with a focus on developer experience
Summary: Raptor Data is a lightweight TypeScript SDK that applies Git-like version control to embeddings by hashing and diffing document chunks. It reduces re-embedding costs by identifying only the content that changed, and it supports PDF and DOCX parsing with structure-aware processing across Node, Edge, and Browser environments.
What it does
Raptor Data parses documents with recursive chunking and uses semantic diffing to detect changes at the chunk level, so an edit does not trigger re-embedding of the whole document. It runs isomorphically in Node, the browser, and Cloudflare Workers, backed by a FastAPI server for optimized parsing.
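The chunk-level change detection described above can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not Raptor Data's actual API: the names `chunkText`, `hashChunk`, `diffChunks`, and `ChunkDiff` are hypothetical, the blank-line splitter stands in for recursive, structure-aware chunking, and exact SHA-256 matching stands in for semantic diffing. It also uses Node's built-in `node:crypto` module, so it would need a different hashing call in the browser or on an edge runtime.

```typescript
import { createHash } from "node:crypto";

// Hypothetical result shape for illustration; not the SDK's real types.
interface ChunkDiff {
  added: string[];     // chunk texts that need a new embedding
  removed: string[];   // hashes whose stored embeddings can be deleted
  unchanged: string[]; // hashes whose stored embeddings can be reused
}

// Split on blank lines: a simple stand-in for recursive chunking.
function chunkText(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Content hash that identifies a chunk independently of its position.
function hashChunk(chunk: string): string {
  return createHash("sha256").update(chunk, "utf8").digest("hex");
}

// Compare two versions of a document by the hashes of their chunks.
function diffChunks(oldText: string, newText: string): ChunkDiff {
  const oldHashes = new Set(chunkText(oldText).map(hashChunk));
  const newChunks = chunkText(newText);
  const newHashes = new Set(newChunks.map(hashChunk));

  const added: string[] = [];
  const unchanged: string[] = [];
  for (const chunk of newChunks) {
    const hash = hashChunk(chunk);
    if (oldHashes.has(hash)) {
      unchanged.push(hash);
    } else {
      added.push(chunk);
    }
  }
  const removed = Array.from(oldHashes).filter((hash) => !newHashes.has(hash));
  return { added, removed, unchanged };
}

// Usage: only the edited pricing paragraph needs to be re-embedded.
const v1 = "Intro paragraph.\n\nPricing: $10 per seat.\n\nContact us.";
const v2 = "Intro paragraph.\n\nPricing: $12 per seat.\n\nContact us.";
const diff = diffChunks(v1, v2);
console.log(diff.added);
```

In this sketch an edit to one paragraph yields one chunk to embed and one stale hash to delete, while the untouched chunks keep their existing vectors. A production pipeline would also need to handle near-duplicate chunks and chunk boundaries that shift after an insertion.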
Who it's for
Developers building retrieval-augmented generation (RAG) applications who need efficient version control and cost-effective embedding updates for frequently changing documents.
Why it matters
Re-embedding an entire document after every edit is costly and complex. By updating only the embeddings of chunks that changed, Raptor Data reduces vector database and API costs by up to 90%.