What is a Vector Database?
The storage layer that made RAG and AI agents practical.
A vector database is a specialized data store for high-dimensional embedding vectors that supports fast approximate nearest neighbor (ANN) search. It lets you store millions or billions of vectors (typically 384-3072 dimensions each) and retrieve the closest ones to a query vector in milliseconds using indexes like HNSW or IVF. Vector databases are the storage layer underneath RAG, semantic search, and AI agent memory.
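Conceptually, vector search is "find the stored vectors most similar to the query." A minimal brute-force sketch in Python with NumPy shows the core operation on toy data; a real vector database replaces the linear scan with an ANN index such as HNSW so it stays fast at millions of vectors:

```python
import numpy as np

# Toy corpus: 1,000 vectors of 384 dimensions (a common embedding size).
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 384)).astype(np.float32)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # unit-normalize

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k corpus vectors most similar to `query` (cosine)."""
    q = query / np.linalg.norm(query)
    scores = corpus @ q                 # cosine similarity for unit vectors
    return np.argsort(-scores)[:k]     # highest similarity first

query = rng.normal(size=384).astype(np.float32)
top = search(query, k=5)
print(top)  # indices of the 5 nearest corpus vectors
```

This exact scan is O(n) per query; ANN indexes trade a small amount of recall for sub-linear query time, which is the whole point of a dedicated vector database.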
In depth
Examples
- Pinecone — managed, fast, $70/month starter; the original commercial vector DB
- pgvector — PostgreSQL extension, free, good up to tens of millions of vectors
- Qdrant — open source, self-hostable, written in Rust, strong filtering and payload support
- Weaviate — open source + cloud, first-class GraphQL API, strong multimodal support
- Chroma — embedded, runs in your Python process, ideal for prototypes and notebooks
- FAISS (Meta) — a library, not a database; the algorithmic gold standard used inside many of the above
- Milvus — distributed, scales to billions of vectors, popular in China and in the enterprise
- Tycoon uses pgvector to store project memory embeddings alongside its PostgreSQL data
Frequently asked questions
Do I need a dedicated vector database or can I use Postgres?
pgvector covers most small-to-medium workloads — up to tens of millions of vectors with HNSW indexing since pgvector 0.5. It's free, your existing Postgres tooling works, and you get transactional consistency with your other data. You need a dedicated vector DB when (a) you're past ~50M vectors and index rebuild times hurt, (b) you need specialized features like filtered ANN over massive metadata, or (c) your team prefers a managed offering with lower ops burden. For a solo founder or small startup, pgvector is almost always the right first choice. Scale out later when you actually hit the limits.
How much does a vector database cost?
Huge range. Free end: Chroma or pgvector on a small Postgres instance — under $50/month. Mid: Pinecone starter ($70/month), Qdrant Cloud small, Weaviate Cloud — $70-300/month for modest production. High: dedicated infrastructure for 100M+ vectors can run $1K-$10K+/month. Main drivers are vector count, dimensions, replicas for availability, and managed vs self-hosted. Most apps overestimate their vector DB budget — the LLM inference cost dwarfs the storage cost by 10-100x. Optimize LLM calls first, storage last.
What's the difference between cosine similarity, dot product, and Euclidean distance?
Three ways to measure 'close.' Cosine similarity measures the angle between vectors and ignores magnitude — good for normalized embeddings (which most modern embedding models produce). Dot product is equivalent to cosine when vectors are unit-normalized but faster to compute. Euclidean (L2) measures straight-line distance in the vector space — occasionally used for non-normalized embeddings. In practice: use cosine or dot product for modern embeddings (they're unit-norm anyway), use Euclidean only if your embedding model docs specifically recommend it. Mixing up metrics is a common bug that silently degrades retrieval quality.
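The relationships above are easy to verify directly: for unit-normalized vectors, cosine similarity and dot product are identical, and squared Euclidean distance is a monotone function of the dot product (so all three rank neighbors the same way). A small NumPy check with made-up vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 5.0])

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
dot = a @ b                      # NOT equal to cosine here: a and b are not unit-norm
euclid = np.linalg.norm(a - b)

# After unit-normalizing, cosine and dot product coincide...
u = a / np.linalg.norm(a)
v = b / np.linalg.norm(b)
assert np.isclose(u @ v, cosine)

# ...and Euclidean distance is determined by the dot product:
# ||u - v||^2 = 2 - 2 * (u . v) for unit vectors.
assert np.isclose(np.linalg.norm(u - v) ** 2, 2 - 2 * (u @ v))
```

This is why the metric mix-up bug is silent: with unit-norm embeddings the rankings agree, but with non-normalized vectors the metrics genuinely diverge and retrieval quality degrades.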
How big can a vector database get?
Practical limits are storage, memory, and rebuild time. HNSW indexes hold vectors in RAM for speed: 10M × 1536 dims × 4 bytes ≈ 61GB, so a single node handles tens of millions comfortably. Beyond that you shard. Large-scale production systems routinely run billion-vector indexes using DiskANN or distributed IVF variants, and the biggest public deployments exceed 10B vectors. For a startup, 100M vectors is the practical ceiling of pgvector on a single node; beyond that, move to a distributed system like Milvus or a managed service like Pinecone.
How do vector databases handle filters like 'only documents from user X'?
This is 'filtered ANN' and it's surprisingly tricky. Naive implementation — retrieve top-K then filter — breaks when the filter is restrictive (you retrieve 10 docs, all belong to other users, result is empty). Good vector DBs implement filtered HNSW or pre-filtering that combines the metadata filter with the ANN search natively. Qdrant, Weaviate, and Pinecone all do this well. pgvector with HNSW does it via Postgres row filtering combined with the index — adequate for most workloads. If you have highly restrictive filters (tenant isolation, category scoping), test filtered recall on your actual data before committing to a vendor; quality varies a lot.
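The post-filter failure mode is easy to reproduce with brute-force search on synthetic data. The sketch below (hypothetical setup: 1,000 vectors spread across 100 users) contrasts post-filtering the global top-K against pre-filtering the candidate set, which is what good filtered-ANN implementations approximate natively:

```python
import numpy as np

rng = np.random.default_rng(1)
vecs = rng.normal(size=(1000, 64)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
owners = rng.integers(0, 100, size=1000)  # hypothetical per-vector metadata

query = vecs[0]      # reuse a stored vector as the query
target_user = 7      # arbitrary restrictive filter: ~1% of vectors match
k = 10

scores = vecs @ query
order = np.argsort(-scores)

# Post-filtering: take the global top-k, THEN apply the metadata filter.
post = [i for i in order[:k] if owners[i] == target_user]

# Pre-filtering: restrict to matching candidates first, then take top-k.
candidates = np.where(owners == target_user)[0]
pre = candidates[np.argsort(-scores[candidates])[:k]]

print(len(post), len(pre))  # post-filter typically returns far fewer (even 0) hits
```

Every post-filter hit is necessarily also a pre-filter hit, but not vice versa; the more restrictive the filter, the more often post-filtering comes back near-empty.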