What are Vector Embeddings?

How neural networks turn meaning into math.

Short answer

A vector embedding is a dense numerical representation of a piece of content — typically a 384 to 3072 dimensional float vector — produced by a neural network trained so that meaning-similar inputs yield geometrically close vectors. Embeddings turn text, code, images, or audio into a shared numerical space where cosine similarity approximates semantic similarity, enabling retrieval, clustering, classification, and recommendation without rule-based features.

In depth

Embeddings are what let computers measure meaning. A model like OpenAI text-embedding-3-large reads a sentence and outputs a 3072-dimensional vector. A different but semantically similar sentence produces a different vector that sits close to the first by cosine similarity. The embedding model is trained, usually by contrastive learning on massive paired data, so that 'dog bites man' and 'canine attacks person' land near each other while 'dog bites man' and 'man bites dog' are farther apart.

Text embeddings dominated early, but the concept generalizes. CLIP embeddings (OpenAI, 2021) put images and text in the same space so you can search images by natural language. Code embeddings (voyage-code-3, CodeBERT) put source code in a space where implementations of the same algorithm cluster together. Multimodal embeddings (Jina Embeddings v4, Google multimodal) handle text, image, and audio in a single model. By 2026 most production systems use some mix of modality-specific and multimodal embeddings depending on the task.

The production embedding stack has three layers:

  • Choosing a model. Closed-source: OpenAI text-embedding-3-large ($0.13/1M tokens, strong), Cohere Embed v3 ($0.10/1M, competitive multilingual), Voyage voyage-3-large. Open-source: bge-m3, nomic-embed-v2, GTE-Qwen2 — increasingly competitive and self-hostable.
  • Generating embeddings at index time — batch API calls for throughput; indexing a million documents typically costs under $10.
  • Generating embeddings at query time — here latency matters: typically 50-200 ms over the network, or under 10 ms self-hosted on a GPU.

Embedding quality is measured on MTEB (Massive Text Embedding Benchmark), which aggregates 56 retrieval, clustering, and classification tasks. Scores cluster tightly at the top — the gap between OpenAI text-embedding-3-large and the best open-source model is under 2 points on many tasks.
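The geometry behind all of this is just cosine similarity. A minimal sketch with plain numpy, using toy 4-dimensional vectors standing in for real model output (a real model would produce hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real embedding-model output.
dog_bites_man  = np.array([0.9, 0.1, 0.3, 0.0])
canine_attacks = np.array([0.8, 0.2, 0.4, 0.1])  # paraphrase: should be close
man_bites_dog  = np.array([0.1, 0.9, 0.0, 0.3])  # different meaning: farther

print(cosine_similarity(dog_bites_man, canine_attacks))  # high
print(cosine_similarity(dog_bites_man, man_bites_dog))   # much lower
```

Retrieval then reduces to "embed the query, return the stored vectors with the highest cosine score", usually via an approximate nearest-neighbor index rather than a brute-force scan.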
The bigger variance is domain fit: a legal-finetuned embedding will crush a general model on legal text by 15+ points, and vice versa. Test on your actual data; MTEB rankings are suggestive, not dispositive.

Dimensions matter less than people think. Going from 768 to 3072 dimensions typically improves recall by 2-5% — not dramatic. Matryoshka embeddings, built into OpenAI text-embedding-3 and several newer models, let you truncate the vector to 256 or 512 dimensions at query time with minimal quality loss, saving 10x on storage and query cost. For most agent and RAG applications, 1024-dimensional embeddings are a good sweet spot between quality, speed, and cost.

For Tycoon and AI agents broadly, embeddings are load-bearing for three things: semantic search over company docs, retrieval of relevant past conversations for agent memory, and similarity-based routing ('which of my 8 AI employees should handle this task?'). A bad embedding choice (too small, wrong domain, stale model) silently degrades every one of those, which is why embedding model selection deserves its own evaluation pass when building an agent.
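The Matryoshka truncation mentioned above is mechanically simple: slice a prefix of the vector and re-normalize. A sketch with a random stand-in vector (the quality claims only hold for models trained with a Matryoshka objective, which front-load information into the leading dimensions; naive truncation of other models loses much more):

```python
import numpy as np

def truncate_embedding(v: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components and re-normalize to unit length.
    Only meaningful for Matryoshka-trained embedding models."""
    t = v[:dims]
    return t / np.linalg.norm(t)

rng = np.random.default_rng(0)
full = rng.normal(size=3072)      # stand-in for a 3072-dim embedding
full /= np.linalg.norm(full)

short = truncate_embedding(full, 256)  # 12x smaller to store and compare
print(short.shape)                     # (256,)
```

Because both the stored vectors and the query vector are truncated to the same length, cosine similarity still works unchanged on the shorter vectors.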

Examples

  • word2vec (Mikolov, 2013) — the embedding that started the modern era, 300-dim word vectors learned from skip-gram
  • OpenAI text-embedding-3-large — current commercial workhorse, 3072 dims, supports Matryoshka truncation
  • Cohere Embed v3 — strong multilingual, optimized for RAG retrieval quality
  • bge-m3 (BAAI) — best open-source, supports dense + sparse + multi-vector retrieval in one model
  • CLIP (OpenAI) — joint image-text embeddings powering 'search photos with words' and Stable Diffusion's text conditioning
  • voyage-code-3 — code-specialized, outperforms general models on code search by 10-15 points
  • Tycoon embeds every project doc and conversation with text-embedding-3-small for Astra's memory retrieval
  • Spotify song embeddings — acoustic features in a learned space where similar-sounding tracks cluster

Frequently asked questions

What's the difference between an embedding and a token?

A token is a unit of text (roughly a word or subword) that an LLM processes — 'unbelievable' might be 3 tokens. Each token has a static embedding from the model's vocabulary, typically 1000-4000 dimensions depending on the model. But when people say 'embedding' in the context of search or RAG, they usually mean a single vector summarizing an entire passage, not per-token vectors. That passage-level embedding is produced by a dedicated embedding model (not the LLM itself) trained specifically for similarity search. An LLM's internal per-token embeddings are not directly useful for retrieval — they're optimized for next-token prediction, not similarity.
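The token-vs-passage distinction can be made concrete: embedding models pool many per-token vectors into one passage vector, and mean pooling is a common choice. A toy sketch (real models pool learned contextual vectors with hundreds of dimensions, not these made-up numbers):

```python
import numpy as np

# Toy per-token vectors for a 4-token passage (real models produce one
# contextual vector per token, each with hundreds of dimensions).
token_vectors = np.array([
    [0.2, 0.7, 0.1],
    [0.9, 0.1, 0.4],
    [0.3, 0.3, 0.8],
    [0.5, 0.2, 0.2],
])

# Mean pooling: average over the token axis, then L2-normalize,
# yielding a single vector representing the whole passage.
passage = token_vectors.mean(axis=0)
passage /= np.linalg.norm(passage)

print(passage.shape)  # (3,): one vector per passage, not per token
```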

How do I choose an embedding model?

Four-step decision: (1) Modality — text-only, multilingual, code, or multimodal. Pick accordingly. (2) Managed vs self-hosted — OpenAI or Cohere if you don't want ops; bge-m3 or nomic-embed-v2 if you do. (3) Dimensions — 1024 is a good default balancing quality and cost; Matryoshka-compatible models let you adjust later. (4) Evaluate on your data — build a small test set of 50 queries with expected results, measure recall@10 for 2-3 candidate models. The gap between models is usually 5-20% on your specific workload, much bigger than MTEB suggests. Don't skip this step.
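Step (4) is a few lines of code once the test set exists. A minimal recall@k harness, here fed synthetic unit vectors in place of real model output (each query is a noisy copy of its relevant document, so the metric is high but not trivially 1.0):

```python
import numpy as np

def recall_at_k(query_vecs, doc_vecs, relevant_ids, k=10):
    """Fraction of queries whose relevant doc appears among the k nearest
    documents by cosine similarity (all vectors pre-normalized)."""
    sims = query_vecs @ doc_vecs.T            # (n_queries, n_docs)
    topk = np.argsort(-sims, axis=1)[:, :k]   # indices of the k best docs
    hits = sum(rel in row for rel, row in zip(relevant_ids, topk))
    return hits / len(relevant_ids)

# Synthetic corpus: 100 unit vectors; 20 queries, each a noisy copy of
# its relevant document. Swap in real embeddings from each candidate
# model to compare them on your own workload.
rng = np.random.default_rng(1)
docs = rng.normal(size=(100, 64))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
relevant = list(range(20))
queries = docs[relevant] + 0.1 * rng.normal(size=(20, 64))
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

print(recall_at_k(queries, docs, relevant, k=10))
```

Run the same harness once per candidate model (same queries, same documents, different embeddings) and compare the numbers directly.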

What happens if I change embedding models?

Full re-indexing. Embeddings from different models live in different vector spaces and are not comparable. If you have 10M documents indexed with text-embedding-3-small and switch to Cohere Embed v3, you must embed all 10M again with the new model and rebuild the ANN index. This is why embedding model choice is sticky — migrations are expensive. Plan for it: version your embeddings by model, keep the embedding model identifier in metadata, and budget for periodic migrations every 12-24 months as models improve substantially.
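Versioning can be as simple as storing the model identifier next to each vector. A hypothetical record shape (field names are illustrative, not any particular vector database's schema):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class EmbeddingRecord:
    """One stored vector, tagged with the model that produced it, so a
    later migration knows exactly which rows need re-embedding."""
    doc_id: str
    vector: list[float]
    model: str            # e.g. "text-embedding-3-small"
    dims: int
    embedded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def needs_reembedding(record: EmbeddingRecord, current_model: str) -> bool:
    """Vectors from different models are not comparable; any record made
    with an older model must be regenerated, never mixed into queries."""
    return record.model != current_model

old = EmbeddingRecord("doc-1", [0.1, 0.2], model="text-embedding-3-small", dims=2)
print(needs_reembedding(old, "embed-v3"))  # True: re-embed before querying
```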

Can I fine-tune embeddings for my domain?

Yes, and it helps when your domain has jargon or structure that general models miss. Sentence-Transformers lets you fine-tune open-source embedding models on contrastive pairs from your domain in a few GPU-hours. Typical gains: 5-20% recall@10 improvement over general embeddings on legal, medical, or technical domains. For most startups the ROI isn't there — pick a good general model and move on. Consider fine-tuning when (a) you have 10K+ labeled query-document pairs, (b) your retrieval quality is the bottleneck limiting product quality, or (c) you're in a regulated domain where off-the-shelf models demonstrably underperform.
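The objective behind those contrastive pairs can be sketched in a few lines: an InfoNCE-style loss with in-batch negatives, computed here with numpy on toy vectors (real fine-tuning, e.g. with Sentence-Transformers, runs an equivalent loss inside a gradient loop):

```python
import numpy as np

def info_nce_loss(query_vecs, pos_vecs, temperature=0.05):
    """Contrastive loss: each query's positive document should score higher
    than every other document in the batch (in-batch negatives)."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    p = pos_vecs / np.linalg.norm(pos_vecs, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature          # (batch, batch) similarities
    # Softmax cross-entropy with the diagonal (the true pair) as the label.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(2)
queries = rng.normal(size=(8, 32))
aligned = queries + 0.05 * rng.normal(size=(8, 32))  # matched pairs
shuffled = rng.normal(size=(8, 32))                  # unrelated "pairs"

print(info_nce_loss(queries, aligned))   # low: pairs agree
print(info_nce_loss(queries, shuffled))  # high: pairs are random
```

Fine-tuning pushes your domain's true pairs toward the low-loss regime, which is exactly what shows up as the recall@10 gains quoted above.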

Do embeddings leak private information?

Partially, yes — this is an active research area. Inversion attacks can reconstruct approximate text from embeddings given access to the embedding model and many samples. For most use cases this is acceptable (the original text is stored alongside the vector anyway), but for regulated workloads treat embeddings as PII-equivalent: encrypt at rest, control access, and don't ship them to third parties without a DPA. If you're using a managed embedding API, you're already trusting that vendor with the plaintext; using their embeddings adds no incremental risk. For fully private workloads, self-host an open-source model like bge-m3.
