Gemini Embedding 2
Google's first natively multimodal embedding model
Summary: Gemini Embedding 2 maps text, images, video, audio, and documents into a single shared embedding space, enabling unified multimodal retrieval and classification. It supports many languages and offers flexible input sizes and embedding dimensions, simplifying embedding pipelines.
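Because all modalities land in one shared space, a text query embedding can be scored directly against embeddings of images, audio, or documents with ordinary cosine similarity. A minimal retrieval sketch using mock vectors (the filenames, values, and the tiny 3-dimensional size are illustrative stand-ins, not real model output):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Mock embeddings standing in for model output. In a shared multimodal
# space, items of different media types are directly comparable.
corpus = {
    "image:sunset.jpg":  [0.9, 0.1, 0.4],
    "audio:podcast.mp3": [0.1, 0.9, 0.2],
    "text:recipe.txt":   [0.2, 0.3, 0.9],
}
query = [0.8, 0.2, 0.5]  # mock embedding of a text query like "beach at dusk"

# Rank every item, regardless of modality, against the one text query.
best = max(corpus, key=lambda k: cosine(query, corpus[k]))
```

With real model output, only the source of the vectors changes; the ranking logic stays the same.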
What it does
It generates embeddings for diverse media, including text (up to 8,192 tokens), images, video, audio, and PDFs, without separate per-modality preprocessing. The model supports interleaved inputs and flexible output sizes via Matryoshka Representation Learning (MRL), which lets a full embedding be truncated to a smaller dimension with little quality loss.
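MRL-trained embeddings concentrate the most important information in their leading coordinates, so a client can shrink a stored vector by keeping a prefix and re-normalizing it. A sketch of that client-side step, assuming a generic MRL-style vector (the `truncate_embedding` helper and the toy values are illustrative, not part of any official API):

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates of an MRL-style embedding and
    re-normalize to unit length so cosine similarity stays meaningful
    at the smaller size."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 8-dimensional embedding standing in for real model output.
full = [0.4, 0.3, 0.2, 0.1, 0.05, 0.05, 0.02, 0.01]
small = truncate_embedding(full, 4)  # 4-dim vector, unit length
```

This trade-off lets one model serve both high-accuracy ranking (full dimension) and cheap first-pass retrieval (truncated dimension) from the same stored embeddings.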
Who it's for
AI developers and ML engineers building search, assistants, knowledge bases, and multimodal AI applications.
Why it matters
It streamlines multimodal AI workflows by replacing fragmented embedding pipelines with a single model that handles multiple media types natively.