
What is a Vector Database?

The storage layer that made RAG and AI agents practical.

A vector database is a specialized data store for high-dimensional embedding vectors that supports fast approximate nearest neighbor (ANN) search. It lets you store millions or billions of vectors (typically 384-3072 dimensions each) and retrieve the closest ones to a query vector in milliseconds using indexes like HNSW or IVF. Vector databases are the storage layer underneath RAG, semantic search, and AI agent memory.

Updated Apr 2026

In depth

Traditional databases index scalar values — strings, numbers, dates — for exact or range queries. Vector databases index dense float vectors and answer "what are the K most similar vectors to this one?" queries by cosine similarity or dot product. The math is simple, but doing it fast on billions of 1536-dimensional vectors is hard, which is why this became its own category of database.

The core technique is approximate nearest neighbor (ANN) indexing. Exact KNN is O(n) per query — too slow past a few hundred thousand vectors. ANN trades a small amount of recall (typically 95-99% of true nearest neighbors) for orders-of-magnitude speedup. The dominant algorithms are HNSW (hierarchical navigable small world graphs: fast and accurate, but memory-hungry), IVF (inverted file with product quantization: lower memory, slightly less accurate), and DiskANN (disk-backed, cheap at massive scale). Most production vector DBs implement HNSW as the default, with IVF or DiskANN as options for scale.

The vendor landscape in 2026 breaks into four categories:

  • Managed pure-play: Pinecone (fast, expensive, easy), Weaviate Cloud, Qdrant Cloud, and Vertex AI Matching Engine.
  • Open-source self-hostable: Weaviate, Qdrant, Milvus, Vespa, and the veteran FAISS library from Meta.
  • Extensions to existing databases: pgvector for PostgreSQL, MongoDB Atlas Vector Search, Redis VSS, Elasticsearch dense_vector.
  • Embedded: Chroma and LanceDB, which run in your process with zero ops.

The right choice depends on scale, existing infrastructure, and whether you want managed ops. A small app under 1M vectors can use pgvector or Chroma happily. A production RAG service with 100M+ vectors usually picks Pinecone, Qdrant, or Weaviate for the dedicated tooling.

The quality of a vector database is measured along three axes:

  • Recall: what fraction of true nearest neighbors the ANN index actually returns — you want 95%+.
  • Query latency: typically 5-50ms for millions of vectors on SSD.
  • Throughput: queries per second at your recall target.

Index build time and memory footprint also matter at scale. Benchmarks like ANN-Benchmarks publish standardized comparisons, but they don't capture real-world concerns like metadata filtering, hybrid search, or incremental index updates — which is where production systems actually differentiate.

For AI agents and RAG, the vector DB is usually not the interesting part of the stack — the embedding model and the retrieval/reranking logic matter more. But a wrong vector DB choice kills an agent's perceived intelligence by making retrieval slow (users see 3-second delays) or inaccurate (the agent answers from the wrong document). Tycoon uses pgvector because agent workloads are under 10M vectors per project and PostgreSQL was already in the stack — adding a dedicated vector service would add ops complexity with no quality gain at that scale.
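The exact-KNN baseline described above (one cosine score per stored vector, hence O(n) per query) can be sketched in a few lines of NumPy. This is a toy illustration only; the dataset size, dimensionality, and query are invented:

```python
import numpy as np

def exact_knn(query, vectors, k=5):
    """Exact K-nearest-neighbor search by cosine similarity.
    Scores every stored vector, which is why exact search
    stops scaling past a few hundred thousand vectors."""
    # Normalize so a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q                    # one dot product per stored vector
    top = np.argsort(-scores)[:k]     # indices of the k most similar
    return top, scores[top]

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 384))            # 10k toy "embeddings", 384 dims
query = db[42] + 0.01 * rng.normal(size=384)   # slightly perturbed copy of row 42

ids, sims = exact_knn(query, db, k=3)          # ids[0] is 42, as expected
```

An ANN index like HNSW avoids the full `v @ q` scan by walking a graph of precomputed neighbor links, visiting only a small fraction of the stored vectors per query.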

Examples

  • Pinecone — managed, fast, $70/month starter; the original commercial vector DB
  • pgvector — PostgreSQL extension, free, good up to tens of millions of vectors
  • Qdrant — open-source and self-hostable, written in Rust, strong filtering + payload support
  • Weaviate — open-source + cloud, first-class GraphQL API, strong multimodal support
  • Chroma — embedded, runs in your Python process, ideal for prototypes and notebooks
  • FAISS (Meta) — a library, not a database; the algorithmic gold standard used inside many of the above
  • Milvus — distributed, scales to billions of vectors, popular in China and enterprise
  • Tycoon uses pgvector to store project memory embeddings alongside its PostgreSQL data

Frequently asked questions

Do I need a dedicated vector database or can I use Postgres?

pgvector covers most small-to-medium workloads — up to tens of millions of vectors with HNSW indexing since pgvector 0.5. It's free, your existing Postgres tooling works, and you get transactional consistency with your other data. You need a dedicated vector DB when (a) you're past ~50M vectors and index rebuild times hurt, (b) you need specialized features like filtered ANN over massive metadata, or (c) your team prefers a managed offering with lower ops burden. For a solo founder or small startup, pgvector is almost always the right first choice. Scale out later when you actually hit the limits.

How much does a vector database cost?

Huge range. Free end: Chroma or pgvector on a small Postgres instance — under $50/month. Mid: Pinecone starter ($70/month), Qdrant Cloud small, Weaviate Cloud — $70-300/month for modest production. High: dedicated infrastructure for 100M+ vectors can run $1K-$10K+/month. Main drivers are vector count, dimensions, replicas for availability, and managed vs self-hosted. Most apps overestimate their vector DB budget — the LLM inference cost dwarfs the storage cost by 10-100x. Optimize LLM calls first, storage last.

What's the difference between cosine similarity, dot product, and Euclidean distance?

Three ways to measure 'close.' Cosine similarity measures the angle between vectors and ignores magnitude — good for normalized embeddings (which most modern embedding models produce). Dot product is equivalent to cosine when vectors are unit-normalized but faster to compute. Euclidean (L2) measures straight-line distance in the vector space — occasionally used for non-normalized embeddings. In practice: use cosine or dot product for modern embeddings (they're unit-norm anyway), use Euclidean only if your embedding model docs specifically recommend it. Mixing up metrics is a common bug that silently degrades retrieval quality.
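The relationships above can be checked in a few lines of NumPy (the vectors are arbitrary example values):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 5.0])

# Cosine similarity: angle only; magnitude is divided out.
cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Raw dot product: sensitive to magnitude in general.
dot = a @ b

# After unit-normalization, dot product and cosine coincide.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)

# Euclidean (L2): straight-line distance between the points.
l2 = np.linalg.norm(a - b)
```

For unit vectors the metrics also rank identically, since ||a - b||^2 = 2 - 2·cos(a, b); they only disagree once magnitudes vary, which is exactly when picking the wrong one degrades retrieval.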

How big can a vector database get?

Practical limits are storage, memory, and rebuild time. HNSW indexes hold vectors in RAM for speed: 10M × 1536 dims × 4 bytes ≈ 60GB, so a single node handles tens of millions comfortably. Beyond that you shard. Production systems at Anthropic, OpenAI, and Google routinely run billion-vector indexes using DiskANN or distributed IVF variants. The biggest public deployments exceed 10B vectors. For a startup, 100M-vector workloads are the practical ceiling of pgvector on a single node; beyond that, move to a distributed system like Milvus or managed Pinecone.
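The 60GB figure above is straight arithmetic, easy to redo for your own workload. Note the graph-overhead percentage below is an assumption for illustration, not a measured number:

```python
def hnsw_ram_bytes(n_vectors: int, dims: int, bytes_per_float: int = 4) -> int:
    """Lower bound on HNSW RAM: the raw float32 vectors alone.
    The HNSW graph itself adds overhead on top (assumed roughly
    10-30%, varying with the M parameter and the implementation)."""
    return n_vectors * dims * bytes_per_float

# 10M vectors x 1536 dims x 4 bytes, as in the text:
gb = hnsw_ram_bytes(10_000_000, 1536) / 1e9   # about 61 GB before graph overhead
```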

How do vector databases handle filters like 'only documents from user X'?

This is 'filtered ANN' and it's surprisingly tricky. Naive implementation — retrieve top-K then filter — breaks when the filter is restrictive (you retrieve 10 docs, all belong to other users, result is empty). Good vector DBs implement filtered HNSW or pre-filtering that combines the metadata filter with the ANN search natively. Qdrant, Weaviate, and Pinecone all do this well. pgvector with HNSW does it via Postgres row filtering combined with the index — adequate for most workloads. If you have highly restrictive filters (tenant isolation, category scoping), test filtered recall on your actual data before committing to a vendor; quality varies a lot.
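The failure mode of naive post-filtering can be shown with synthetic data (tenant layout, seed, and sizes are all invented for this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
n, dims = 5_000, 64
vectors = rng.normal(size=(n, dims))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
owner = np.arange(n) % 100          # metadata: 100 tenants, 50 docs each

query = rng.normal(size=dims)
query /= np.linalg.norm(query)
scores = vectors @ query            # cosine similarity (everything unit-norm)
target = 7                          # we only want tenant 7's documents

def top_k(s, k):
    return np.argsort(-s)[:k]

# Naive post-filtering: take the global top-K, then drop other tenants.
# With a 1%-selective filter this usually returns far fewer than K
# results -- often none at all.
post = [i for i in top_k(scores, 10) if owner[i] == target]

# Pre-filtering: restrict the candidate set to the tenant, then rank.
# Always returns a full K results (if the tenant has that many docs).
mask = np.where(owner == target)[0]
pre = mask[top_k(scores[mask], 10)]
```

Real filtered-HNSW implementations are more subtle than this brute-force `mask` (they have to keep the graph traversal connected under the filter), but the contract is the same: the filter participates in the search rather than running after it.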

Run your one-person company.

Hire your AI team in 30 seconds. Start for free.

Free to start · No credit card required · Set up in 30 seconds