Engineering
pgvector Is Not a Vector Database (And That's Fine)
pgvector is a solid choice for adding vector search to Postgres at low scale. But when does it stop being enough? We ran the numbers.
There is a category of GitHub comment we have seen dozens of times: “Why not just use pgvector?” It is a fair question. Postgres is already running in most stacks. pgvector adds vector search without a new service to operate. The transactional consistency is free.
The answer is not “pgvector is bad.” The answer is “it depends on scale, and the scale boundary is lower than most people expect.” We have the benchmarks to show exactly where that boundary sits.
What pgvector actually is
pgvector is a Postgres extension. It adds a vector column type, an HNSW index (since 0.5.0), and IVFFlat as an alternative. Queries run inside the Postgres process. Results can be joined with your relational tables in a single query. You get full transactional semantics.
That is genuinely useful. For applications already on Postgres, adding vector search without a second service, a second backup policy, a second monitoring target, or a second ops runbook is a meaningful operational win. We use pgvector ourselves — for metadata storage and low-scale retrieval features.
What pgvector is not is a purpose-built vector store. Its HNSW index pages live in Postgres shared buffers, where they compete with your relational data for cache. It has no quantization to cut the memory footprint while holding search quality. It shares CPU, memory, and I/O with every other query hitting your Postgres instance. At low scale these constraints do not matter. At high scale they compound.
The actual benchmark numbers
We built and operate a dedicated vector engine (written in Rust) for Schift Cloud. As part of that work we benchmarked against the two most common reference points: FAISS HNSW and Qdrant. Based on public pgvector benchmarks and our own measurements, the gap between pgvector and dedicated engines on local workloads is 225x to 751x in query latency at scale.
That is not a typo. Here are our own internal numbers at 1M vectors with dim=1024, top-10, single-thread, HNSW M=32, efSearch=50 on Apple M5 Pro:
| Engine | p50 (us) | p95 (us) | p99 (us) | QPS | Memory |
|---|---|---|---|---|---|
| Schift Engine (SQ8+HNSW) | 277 | 392 | 502 | 3,400 | 1,024 MB |
| FAISS HNSW (in-memory reference) | 621 | 941 | 1,653 | 1,503 | 4,096 MB |
| Qdrant (local, from our bench) | ~2,400 | — | — | ~350 | varies |
| pgvector HNSW (reported, 1M scale) | ~65,000+ | — | — | ~15 | ~4,096 MB |
The pgvector number comes from published benchmarks at comparable scale and conditions — it is not a number we measured directly in this test run, but it is consistent with what others have reported. The gap between pgvector and FAISS at 1M vectors is approximately 100x in p50 latency and roughly 100x in QPS. Against our engine, it is larger still.
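One quick consistency check on the table: in a single-threaded benchmark, QPS is bounded above by 1e6 / p50 (in microseconds), and each measured figure should sit a little below that ceiling once per-query overhead is accounted for. A small sketch using the table's numbers:

```python
# Single-thread QPS ceiling is 1e6 / p50_us; measured QPS should land
# just below it. All figures are taken from the benchmark table above.
p50_us = {"schift": 277, "faiss": 621, "pgvector": 65_000}
measured_qps = {"schift": 3_400, "faiss": 1_503, "pgvector": 15}

for name, p50 in p50_us.items():
    ceiling = 1_000_000 / p50
    print(f"{name}: ceiling {ceiling:.0f} QPS, measured {measured_qps[name]}")
```

Each engine measures within about 15% of its theoretical single-thread ceiling, which is what you would expect from a well-behaved bench.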
Qdrant was 8.7x to 9.8x slower than our engine on local workloads. That is a meaningful gap, but it is not surprising — Qdrant is a full service with its own HTTP overhead and general-purpose storage layer. FAISS is the fairest apples-to-apples HNSW comparison, and our SQ8 implementation is 2.3x faster at p50 with 3.3x tighter tail latency at 4x less memory.
Why dedicated engines are faster at scale
The performance gap is not magic. It comes from a few concrete engineering decisions that a general-purpose database extension cannot easily make.
Quantization. Our engine uses SQ8 by default: 8-bit scalar quantization that cuts memory at 1M vectors from the 4096 MB F32 baseline (the same footprint as FAISS in the table) to 1024 MB. Recall@10 drops by only 1.6 percentage points versus F32, from 0.9960 to 0.9800. pgvector stores raw F32 vectors; at 1M vectors with dim=1024, that is 4 GB competing with your relational workload for shared buffers.
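To make the tradeoff concrete, here is a minimal sketch of per-vector 8-bit scalar quantization (one common SQ8 variant; this is illustrative, not our engine's actual codebook layout):

```python
import numpy as np

rng = np.random.default_rng(0)
vecs = rng.standard_normal((1000, 1024)).astype(np.float32)

# Per-vector min/max codebook: map each float to one of 256 levels.
lo = vecs.min(axis=1, keepdims=True)
hi = vecs.max(axis=1, keepdims=True)
scale = (hi - lo) / 255.0
codes = np.round((vecs - lo) / scale).astype(np.uint8)  # 1 byte per dim

# Dequantize when scoring; per-element error is at most scale / 2.
recon = codes.astype(np.float32) * scale + lo

mem_f32 = vecs.nbytes   # 4 bytes per dimension
mem_sq8 = codes.nbytes  # 1 byte per dimension (scale/offset overhead negligible)
print(mem_f32 // mem_sq8)  # → 4
```

The 4x memory reduction is exactly the 4096 MB to 1024 MB drop in the table; the small reconstruction error is why recall only slips by 1.6 points rather than collapsing.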
Memory-mapped storage. Our segments are memory-mapped files with separate compaction and preload phases, and the index is purpose-built for vector access patterns. There is no row storage format, no MVCC overhead, and no WAL designed for general-purpose OLTP semantics.
SIMD scoring. When everything from storage layout to scoring is designed for one workload, you can apply vectorized instructions aggressively. A Postgres extension running inside a general-purpose query executor has far less room to do this.
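Both ideas can be sketched in a few lines: a flat segment memory-mapped from disk and scored with one vectorized call (NumPy's BLAS-backed matmul uses SIMD internally). The file layout and brute-force scan here are illustrative, not our engine's actual format:

```python
import numpy as np, os, tempfile

dim, n = 1024, 100
path = os.path.join(tempfile.mkdtemp(), "segment.f32")

# Write a flat segment of float32 vectors (stand-in for a compacted segment file).
rng = np.random.default_rng(1)
data = rng.standard_normal((n, dim)).astype(np.float32)
data.tofile(path)

# Memory-map it: the OS pages vectors in on demand. No row format, no MVCC,
# no WAL; just vectors at fixed offsets.
seg = np.memmap(path, dtype=np.float32, mode="r", shape=(n, dim))

# Score the whole segment in one vectorized operation.
query = np.asarray(seg[7])
scores = seg @ query
print(int(np.argmax(scores)))  # → 7 (the query scores highest against itself)
```

The point is the shape of the access pattern: fixed-stride reads from a mapped file, scored in bulk, with nothing between the index and the bytes.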
What pgvector is genuinely good at
We do not want to oversell the gap, because the operational benefits of pgvector are real.
- Zero additional infrastructure. No new service to deploy, monitor, back up, or scale.
- Transactional consistency. Vector inserts and metadata updates are atomic. No eventual consistency edge cases.
- Joins. You can filter by user ID, document status, or any relational predicate in a single query without a separate fetch step.
- Existing expertise. Your team already knows Postgres. There is no new query language, no new operational model, no new failure modes to learn.
- Ecosystem. psycopg2, SQLAlchemy, Prisma, and Drizzle all work without changes.
Those advantages are worth real money at small scale. An ops team that can avoid running a second stateful service is a faster, simpler ops team.
The real question: when does pgvector stop being enough?
The “pgvector vs dedicated engine” framing is wrong. The right question is: at what point do the performance constraints of pgvector affect your product?
| Scale | QPS Requirement | Latency Requirement | Verdict |
|---|---|---|---|
| Under 100K vectors | Single-digit QPS | 100ms acceptable | pgvector is the right call |
| 100K-500K vectors | Under 50 QPS | Sub-100ms acceptable | pgvector works; watch memory growth |
| 100K-500K vectors | Over 100 QPS | Sub-10ms required | Either threshold: dedicated engine starts to win |
| Over 500K vectors | Any production QPS | Sub-ms required | Dedicated engine is the answer |
| Any scale | Any QPS | Frequent joins to relational data | pgvector's ops simplicity is worth the performance cost |
The middle band — 100K to 500K vectors — is where the decision actually lives for most teams. Below that, pgvector is clearly the right call. Above 500K at meaningful QPS, the performance gap is too wide to ignore. In the middle, it is a genuine tradeoff between ops simplicity and search performance.
“Ops budget vs performance needs” is not a cop-out answer. A two-person engineering team with 200K vectors and a 20 QPS search feature should run pgvector. A team with 800K customer-facing vectors needing sub-100ms search at 500 QPS should not.
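The table collapses into a small helper. `pick_store` and its thresholds are a hypothetical encoding of the bands above, not a published API, and the cutoffs are rough bands rather than hard limits:

```python
def pick_store(n_vectors: int, qps: float, latency_budget_ms: float,
               heavy_joins: bool) -> str:
    """Hypothetical encoding of the decision table; thresholds are rough bands."""
    if heavy_joins:
        return "pgvector"  # ops simplicity wins when relational joins dominate
    if n_vectors < 100_000:
        return "pgvector"
    if n_vectors <= 500_000:
        if qps > 100 or latency_budget_ms < 10:
            return "dedicated engine"
        return "pgvector"  # works, but watch memory growth
    return "dedicated engine"  # over 500K at production QPS

# The two worked examples from the text:
print(pick_store(200_000, 20, 100, False))   # → pgvector
print(pick_store(800_000, 500, 100, False))  # → dedicated engine
```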
How we handle this at Schift
We use both. That is not a diplomatic non-answer — it is the actual architecture.
Postgres (with pgvector) stores all metadata: collection configs, document records, user data, billing state, organizational hierarchy. We use pgvector for low-scale internal retrieval features where transactional consistency with the surrounding metadata matters more than search latency.
Our Rust engine handles all customer-facing vector search. It runs SQ8+HNSW, memory-maps segments to disk, and serves search at 277us p50 with 3400 QPS at 1M vectors. It knows nothing about user accounts or billing — it only answers “given this query vector, what are the top-k nearest neighbors in this collection?”
The boundary between them is clean: if a query needs to join vector results with relational data, we do the vector search in the engine and the join in Postgres. We never ask pgvector to do the heavy vector work.
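That split can be sketched end to end. Everything here (`engine_search`, the dict-backed `metadata`) is an illustrative stand-in, not our actual API or schema:

```python
import numpy as np

# Stand-ins: `vectors` plays the engine's segment, `metadata` plays Postgres.
rng = np.random.default_rng(2)
vectors = {i: rng.standard_normal(32).astype(np.float32) for i in range(100)}
metadata = {i: {"user_id": i % 3, "status": "active" if i % 2 == 0 else "archived"}
            for i in range(100)}

def engine_search(query, k):
    # The engine's whole job: top-k nearest neighbors, nothing else.
    ranked = sorted(vectors, key=lambda i: float(np.sum((vectors[i] - query) ** 2)))
    return ranked[:k]

def join_step(ids, user_id):
    # The Postgres side: relational predicates applied to the returned IDs.
    return [i for i in ids if metadata[i]["user_id"] == user_id
            and metadata[i]["status"] == "active"]

hits = engine_search(vectors[6], k=10)
visible = join_step(hits, user_id=0)
print(hits[0])  # → 6 (the query's own ID ranks first by distance)
```

The join step only ever touches k IDs, so Postgres does what it is good at (predicates over small sets) and never scans vectors.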
A note on the 225x number
We cited 225x to 751x slower for pgvector versus dedicated engines at scale. To be precise: that range reflects different benchmark conditions, different hardware, and different pgvector versions from published third-party benchmarks. We are not claiming we measured that number directly in a single controlled test.
What we did measure directly is the Schift engine vs FAISS at 1M vectors: 2.3x faster p50, 3.3x tighter p99, 4x less memory. Those numbers come from our own bench suite running against real index data on a single machine. The FAISS result is the most honest apples-to-apples HNSW comparison, since it removes the service overhead of Qdrant and the Postgres overhead of pgvector.
If pgvector at 1M vectors is approximately 100x slower than FAISS in latency (consistent with published benchmarks), and we are 2.3x faster than FAISS, then pgvector is roughly 225x slower than our engine. That is the math, not a marketing claim. Run your own benchmark with your actual data distribution and hardware before making an infrastructure decision.
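The chain of ratios, written out with the table's own p50 numbers (the product lands near the low end of the 225x to 751x range):

```python
# Both inputs are stated above; the product is the headline ratio.
pgvector_vs_faiss = 65_000 / 621   # ~105x, consistent with published ~100x figures
schift_vs_faiss = 621 / 277        # our measured p50 advantage over FAISS
print(round(pgvector_vs_faiss * schift_vs_faiss))  # → 235
```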
What is next
We are working on a self-hosted Terraform module for the Schift execution pipeline — for teams that need dedicated search performance but want to keep everything inside their own cloud account. The engine, quantization, and HNSW implementation are the same as what runs in Schift Cloud.
We are also still running pgvector for metadata storage. That will not change. The goal is not to replace Postgres. The goal is to do each job with the right tool.