Vector search in Postgres with pgvector: embeddings, HNSW, and metadata filters
pgvector turns Postgres into a vector database: store embeddings, rank by cosine distance with <=>, index with HNSW, and filter by metadata in one SQL query.
On this page
- The extension and the vector column
- Generating and inserting an embedding
- The similarity query: <=> is cosine distance
- Indexing: HNSW vs IVFFlat, and the recall/build trade-off
- The key move: a metadata WHERE filter plus vector ranking
- When a dedicated vector database actually beats pgvector
- Takeaways
- FAQ
You've built a RAG pipeline. Your chunks have embeddings, and they also have metadata — a tag, a source, a user_id. Now you want the obvious thing: "rank the most semantically similar chunks, but only the ones tagged hr." With a standalone vector database, that's two systems, two writes to keep in sync, and a metadata filter bolted onto an index that was designed for pure nearest-neighbor search. With pgvector, it's one line of SQL: WHERE tag = 'hr' ORDER BY embedding <=> query.
That's the whole pitch for vector search in Postgres. Your embeddings live in the same table as your relational data, so you filter with plain SQL WHERE and rank by similarity in the same query — the filter-then-rank pattern that good RAG needs, expressed natively. This is exactly the two-stage retrieval I leaned on in my metadata-filtered RAG work: pre-filter candidates by metadata before semantic ranking. In Python I described that as a pipeline of steps. In Postgres, it collapses into one statement.
I'll walk the whole thing — the extension, the column, inserting an embedding, the similarity query, the index choice that actually matters, and the metadata filter that ties it together. Then the honest part: when a dedicated vector DB still wins.
The extension and the vector column
pgvector ships as a Postgres extension. On Supabase, RDS, or a plain Postgres 15+, you turn it on once:
CREATE EXTENSION IF NOT EXISTS vector;
That gives you a new vector column type. The dimension is fixed per column and must match your embedding model. OpenAI's text-embedding-3-small returns 1536 dimensions, so:
CREATE TABLE chunks (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
content text NOT NULL,
tag text NOT NULL, -- "hr", "finance", "legal"
source text NOT NULL, -- "handbook-2026.pdf"
embedding vector(1536) -- the dimension is part of the type
);
The thing to notice: embedding is just a column on the same row as content, tag, and source. No second datastore, no foreign key into a vector index living somewhere else. The metadata and the vector are physically the same row, which is the entire reason the filter-then-rank query is cheap.
Generating and inserting an embedding
You compute the embedding in your app and write it like any other value. The wire format pgvector accepts is a string that looks like a JSON array — '[0.013,-0.022,...]':
import OpenAI from "openai";
const openai = new OpenAI();
async function insertChunk(content: string, tag: string, source: string) {
const { data } = await openai.embeddings.create({
model: "text-embedding-3-small",
input: content,
});
const embedding = `[${data[0].embedding.join(",")}]`; // pgvector literal
await sql`
INSERT INTO chunks (content, tag, source, embedding)
VALUES (${content}, ${tag}, ${source}, ${embedding})
`;
}
One discipline that pays off: embed in batches at ingest, not per-row in a loop. The OpenAI embeddings endpoint takes an array of inputs and returns them in order, so a thousand chunks is a handful of API calls, not a thousand. The cost and latency difference is the gap between "ingest finishes" and "ingest times out."
The similarity query: <=> is cosine distance
pgvector adds distance operators. The one you want for normalized text embeddings is <=>, cosine distance — 0 means identical direction, larger means less similar. (<-> is L2/Euclidean, <#> is negative inner product; pick the one your model was trained for. For OpenAI embeddings, cosine.)
Embed the query the same way, then order by distance:
SELECT id, content, tag,
embedding <=> '[...query vector...]' AS distance
FROM chunks
ORDER BY embedding <=> '[...query vector...]'
LIMIT 6;
ORDER BY distance LIMIT k is your top-k nearest-neighbor search. There's no special API — it's a sort. That's the whole point: similarity ranking is expressed in the same language as every other query you write. You can join it, you can paginate it, you can wrap it in a view.
A note on similarity vs. distance: people often want a 0–1 similarity score for a threshold. Cosine similarity is 1 - distance, so 1 - (embedding <=> query) gives you the familiar number, and you can drop weak matches with WHERE 1 - (embedding <=> query) > 0.7.
Indexing: HNSW vs IVFFlat, and the recall/build trade-off
The query above works with no index at all — but it's a sequential scan that computes distance against every row. Fine for 10,000 chunks, a problem at 10 million. You add an approximate nearest neighbor (ANN) index so the search skips most of the table.
pgvector gives you two. The operator class in the index must match the operator in your query — for <=>, that's vector_cosine_ops:
-- HNSW: graph-based, my default in 2026.
CREATE INDEX ON chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- IVFFlat: clusters rows into lists, then searches the nearest few.
CREATE INDEX ON chunks
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
Here's the trade-off, stated plainly:
- HNSW builds a navigable small-world graph. It gives higher recall at a given speed and — critically — you can build it on an empty or growing table; it doesn't need to see your data distribution first. The cost is a slower, more memory-hungry build and a larger index. Query-time recall is tunable with
SET hnsw.ef_search = 100(higher = better recall, slower). - IVFFlat partitions vectors into
listsclusters and only scans the closest few at query time. It builds much faster and smaller, but recall is sensitive to thelistsparameter, and it really wants the table already populated when you build it, so the clusters reflect real data. Tune query recall withSET ivfflat.probes = 10.
My default in 2026 is HNSW, for a boring reason: I rarely know my final data distribution up front, and HNSW doesn't make me re-index every time the corpus grows. I reach for IVFFlat only when build time or index size genuinely hurts — a very large, fairly static corpus where I can afford to build once. If recall feels low, the first knob is ef_search (HNSW) or probes (IVFFlat), not a different index.
The key move: a metadata WHERE filter plus vector ranking
This is why you kept the vectors in Postgres. Pure similarity search is a liability on a multi-topic corpus — embedding closeness is semantic proximity, not topical relevance, so an HR question happily pulls in finance chunks that share vocabulary. (I dug into exactly that failure mode in metadata-filtered RAG.) The fix is to narrow by metadata before you rank:
SELECT id, content, source,
1 - (embedding <=> $1) AS similarity
FROM chunks
WHERE tag = 'hr' -- metadata pre-filter: SQL does this for free
ORDER BY embedding <=> $1 -- semantic rank, within the filtered set
LIMIT 6;
WHERE tag = 'hr' and ORDER BY embedding <=> $1 in one query is the entire two-stage retriever — collapsed into a single round trip the planner optimizes as a whole. The off-topic chunks never reach the ranker, because Postgres filtered them out first. In a standalone vector DB you'd either pre-filter in a separate system or lean on the index's metadata-filtering support, which historically meant degraded recall when the filter is selective. Here it's just a WHERE clause — the thing relational databases have been good at for forty years.
One real gotcha worth knowing: with a very selective filter, an ANN index can hurt. If tag = 'hr' matches 0.1% of rows, the HNSW graph walk may discard most of its candidates after traversal and return too few. The pragmatic answers: add a B-tree index on tag so a selective filter can pre-narrow, and for high-selectivity filters let the planner fall back to an exact scan over the small filtered set — which is often faster than an approximate graph walk anyway. Test with your real selectivity; don't assume the ANN path always wins.
This filter-then-rank-in-SQL pattern is the same instinct behind keeping correctness in the database that I wrote about in race-free slot reservations in Postgres: when the data and the logic live in the same place, you stop synchronizing two systems and start trusting one. It's also why I treat pgvector as the natural home for the retrieval layer of a chatbot like the AI chatbot I shipped — the metadata pre-filter that design depends on is a SQL WHERE.
When a dedicated vector database actually beats pgvector
I'm not going to pretend pgvector wins everywhere. Reach for a purpose-built vector DB (Pinecone, Qdrant, Weaviate, Milvus) when:
- Scale crosses into the tens-of-millions-plus vectors and ANN latency under load is your primary SLO. Dedicated engines are built around the index; Postgres is a general-purpose database that happens to host one.
- You need horizontal sharding of the vector index across machines, or features like distributed rebuilds, that Postgres doesn't give you out of the box.
- Your workload is almost entirely vector search with little relational data — then the "embeddings next to your tables" advantage evaporates, and a focused engine is simpler.
- You want first-class hybrid search (dense + sparse/BM25 fusion) built in, rather than assembling it yourself with
tsvector+ pgvector.
But for the common case — millions of vectors, embeddings that belong next to relational rows you already filter and join, and a team that already runs Postgres — pgvector means one database, one backup, one set of transactions, one query language. You don't add an operational dependency or a sync problem to get vector search. You add an extension.
Takeaways
CREATE EXTENSION vector, avector(1536)column, and<=>are the whole core API. Similarity search isORDER BY embedding <=> query LIMIT k— a sort, not a special endpoint.- Index with HNSW (
vector_cosine_ops) by default — better recall and it builds on growing tables. Use IVFFlat only when build time or index size genuinely hurts on a static corpus. - The killer feature is the metadata filter:
WHERE tag = 'hr' ORDER BY embedding <=> querydoes filter-then-rank in one query, no second system. - Watch selective filters — add a B-tree on the filter column and let exact scans win when the filtered set is tiny.
- Switch to a dedicated vector DB at tens-of-millions scale, when you need sharding, or when the workload is pure vector search with no relational neighbors to filter against.
The best vector database might be the Postgres you're already running. Turn on the extension, add a column, and let your WHERE clause do what it was always good at.
Frequently asked questions
- What is pgvector and how does vector search work in Postgres?
- pgvector is a Postgres extension that adds a vector column type and distance operators. You store an embedding alongside your relational columns, then run nearest-neighbor search with ORDER BY embedding <=> query LIMIT k, where <=> is cosine distance. It turns Postgres into a vector database without a second system.
- Should I use HNSW or IVFFlat for a pgvector index?
- Use HNSW by default: it gives higher recall at a given speed and can be built on an empty or growing table, at the cost of a slower, larger build. Choose IVFFlat only when build time or index size genuinely hurts on a large, mostly static corpus, since it builds faster but needs populated data and careful 'lists' tuning.
- How do I combine a metadata filter with vector similarity in pgvector?
- Put both in one query: WHERE tag = 'hr' ORDER BY embedding <=> query LIMIT 6. The WHERE clause pre-filters candidates by metadata before the vector ranking runs, so off-topic rows never reach the ranker. Add a B-tree index on the filter column so selective filters stay fast.
- When should I use a dedicated vector database instead of pgvector?
- Move to a purpose-built vector DB (Pinecone, Qdrant, Weaviate, Milvus) when you cross tens of millions of vectors with strict latency SLOs, need horizontal sharding of the index, or your workload is almost entirely vector search with little relational data. For millions of vectors that live next to relational rows you filter and join, pgvector keeps everything in one database.
Metadata-filtered RAG fixes single-shot retrieval that returns junk on multi-topic corpora. How I built a metadata pre-filter, vector search, and LLM rerank pipeline.
ReadPreventing double-bookings in Postgres: how a check-then-insert race oversells slots, and the layered fix — unique constraint, row and advisory locks, transactions, isolation.
Read