# Making SQ8 the Default for New Collections
Why the engine moved to SQ8 as the default storage format — what we measured, what failed, and what we are not doing yet.
## Why This Change

We had three practical questions:

- Is `SQ8` good enough on query quality?
- Is `SQ8` clearly better than `SQ4` at realistic scale?
- If we want more compression than `SQ8`, should we keep pushing `SQ4` or try a separate TurboQuant-style path?

The answer from the data was straightforward:

- `SQ8` keeps recall loss small enough to be a sensible default
- `SQ4` saves more memory, but falls behind at 1M
- a first-pass TurboQuant-inspired path (TQ4) was not competitive
That makes the product decision easy: use SQ8 as the default for new collections, and leave existing collections alone.
## What We Measured

Environment:

- Apple M5 Pro
- 48 GB RAM
- rustc 1.94.0
- local single-machine runs on 2026-03-26

Test shape:

- dim = 1024
- latency benchmarks: 1000 queries, top_k = 10
- recall tests: brute-force top-10 ground truth
- `DiskCollection` real path only: upsert -> flush -> compact -> preload -> search
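The recall measurement above relies on exact brute-force top-10 results as ground truth. As a minimal sketch of how that comparison works (function names here are illustrative, not the engine's actual API):

```rust
// Squared L2 distance between two vectors.
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

// Exact top-k ids by brute-force scan over the base set (the ground truth).
fn brute_force_top_k(base: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..base.len()).collect();
    ids.sort_by(|&i, &j| {
        l2(&base[i], query).partial_cmp(&l2(&base[j], query)).unwrap()
    });
    ids.truncate(k);
    ids
}

// recall@k: fraction of ground-truth ids recovered by the index under test.
fn recall_at_k(truth: &[usize], returned: &[usize]) -> f32 {
    let hits = returned.iter().filter(|&&id| truth.contains(&id)).count();
    hits as f32 / truth.len() as f32
}

fn main() {
    let base = vec![vec![0.0, 0.0], vec![1.0, 0.0], vec![5.0, 5.0]];
    let truth = brute_force_top_k(&base, &[0.1, 0.0], 2);
    // Pretend the index under test returned one of the two true neighbors.
    let recall = recall_at_k(&truth, &[truth[0], 2]);
    println!("recall@2 = {recall}");
}
```

The reported recall@10 numbers are this quantity averaged over all queries.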
## Result 1: SQ8 Quality Is Good Enough
SQ8 vs F32 at 1024d / 10,000 vectors / 50 queries:
| Metric | Value |
|---|---|
| F32 recall@10 | 0.9960 |
| SQ8 recall@10 | 0.9800 |
| Recall delta | 0.0160 |
This is the number that matters most for the default decision. SQ8 is not lossless, but the drop is small enough that the latency and memory gains dominate.
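For intuition on why the drop stays small: SQ8 maps each float to one of 256 levels, so the per-component rounding error is bounded by half a quantization step. A minimal sketch of per-vector affine SQ8 encoding, assuming a min/max mapping (the engine's actual codec details are not spelled out in this note):

```rust
// Encode a vector to u8 codes plus the (min, scale) needed to decode.
// Assumption: per-vector min/max affine quantization, 256 levels.
fn sq8_encode(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    let codes = v.iter().map(|&x| ((x - min) * scale).round() as u8).collect();
    (codes, min, scale)
}

// Decode back to approximate floats; error per component is at most
// half a step, i.e. (max - min) / 510.
fn sq8_decode(codes: &[u8], min: f32, scale: f32) -> Vec<f32> {
    if scale == 0.0 {
        return vec![min; codes.len()];
    }
    codes.iter().map(|&c| min + c as f32 / scale).collect()
}

fn main() {
    let v = [0.25f32, -1.5, 3.0, 0.0];
    let (codes, min, scale) = sq8_encode(&v);
    let back = sq8_decode(&codes, min, scale);
    let max_err = v
        .iter()
        .zip(&back)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    println!("max reconstruction error: {max_err}");
}
```

With 4 bits (SQ4) the same scheme has only 16 levels, so each step is 16x coarser, which is where the extra recall and distance-estimation error comes from.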
## Result 2: SQ8 Beats SQ4 at 1M
At 1M vectors:
| Format | Build | p50(us) | p95(us) | p99(us) | QPS | MB |
|---|---|---|---|---|---|---|
| SQ8+HNSW | 839s | 277 | 392 | 502 | 3400 | 1024.0 |
| SQ4+HNSW | 781s | 860 | 1048 | 1221 | 1138 | 512.0 |
| FAISS HNSW reference | - | 621 | 941 | 1653 | 1503 | 4096.0 |
Interpretation:

- `SQ8` is not just smaller than `F32`; it is also clearly faster at scale
- `SQ4` gets the 8x compression story, but at 1M its latency/QPS tradeoff stops looking like a good default
So SQ4 remains a compression option, not the default path.
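The MB column in the table follows directly from bits per dimension; a quick sketch of the arithmetic at dim = 1024 (codes only, index overhead excluded):

```rust
// Raw code bytes per vector for a given dimension and bit width.
fn bytes_per_vector(dim: usize, bits: usize) -> usize {
    (dim * bits) / 8
}

fn main() {
    let dim = 1024;
    // At 1M vectors, bytes per vector equals MB of codes in total.
    println!("F32: {} B/vec", bytes_per_vector(dim, 32)); // 4096 MB at 1M
    println!("SQ8: {} B/vec", bytes_per_vector(dim, 8));  // 1024 MB at 1M
    println!("SQ4: {} B/vec", bytes_per_vector(dim, 4));  //  512 MB at 1M
}
```

So SQ4's 8x-vs-F32 compression is real; the table shows it simply is not worth the latency cost at 1M.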
## Result 3: First-Pass TurboQuant Was Not Good
We also tried a separate TQ4 path rather than mutating SQ4.
The first pass used:

- a randomized Hadamard transform
- a 3-bit transformed base code
- a 1-bit residual sign path
That sounds directionally right, but the actual numbers were poor.
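For context on the first ingredient: the randomized Hadamard transform is a random sign flip per dimension followed by a fast Walsh-Hadamard transform, which spreads energy evenly across dimensions before coarse quantization. A sketch, assuming orthonormal scaling (the TQ4 branch's actual normalization and sign-seeding details are not shown in this note):

```rust
// In-place fast Walsh-Hadamard transform; length must be a power of two.
fn fwht(v: &mut [f32]) {
    let n = v.len();
    assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        for block in v.chunks_mut(2 * h) {
            for i in 0..h {
                let (a, b) = (block[i], block[i + h]);
                block[i] = a + b;
                block[i + h] = a - b;
            }
        }
        h *= 2;
    }
    // Orthonormal scaling so the transform preserves L2 norms.
    let scale = 1.0 / (n as f32).sqrt();
    for x in v.iter_mut() {
        *x *= scale;
    }
}

// Random sign flip (signs are +1.0 / -1.0, drawn once per dimension),
// then the Hadamard rotation.
fn randomized_hadamard(v: &mut [f32], signs: &[f32]) {
    for (x, s) in v.iter_mut().zip(signs) {
        *x *= s;
    }
    fwht(v);
}

fn main() {
    let signs = [1.0f32, -1.0, 1.0, -1.0];
    let mut v = [0.5f32, 0.5, 0.5, 0.5];
    let before: f32 = v.iter().map(|x| x * x).sum();
    randomized_hadamard(&mut v, &signs);
    let after: f32 = v.iter().map(|x| x * x).sum();
    println!("norm preserved: {before} -> {after}");
}
```

The transform itself is cheap (O(d log d)); the poor numbers below came from the quantizer and scorer built on top of it, not from this step.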
Recall:
| Metric | Value |
|---|---|
| F32 recall@10 | 0.9920 |
| TQ4 recall@10 | 0.4900 |
| Recall delta | 0.5020 |
Latency at 200K vectors:
| Format | Build | p50(us) | p95(us) | p99(us) | QPS | MB |
|---|---|---|---|---|---|---|
| SQ8+HNSW | 138s | 211 | 247 | 281 | 4645 | 204.8 |
| TQ4+HNSW | 144s | 3803 | 4115 | 4300 | 263 | 104.0 |
Interpretation:
- the first pass only wins on storage size
- it loses badly on both quality and latency
- this is not production-ready and not worth using as a default or even a near-term option
## Product Decision
We should not do a DB-wide migration.
Instead:

- existing collections stay as they are
- new collections use `SQ8` by default
- unsupported dimensions still fall back to `F32`
- if a collection ever needs to be rebuilt, the raw DB values are the source of truth
That keeps the operational change small while still taking the performance win where it matters.
## Engineering Change
The code change itself is small:
- switch `SegmentConfig::default()` to `StorageFormat::Sq8`
- keep the existing guarded fallback so unsupported dimensions still flush as `F32`
- add regression tests proving both behaviors
- update README / INTERNAL / benchmark memo so docs match policy
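The first two bullets can be sketched as follows. This is a minimal illustration using the `SegmentConfig` / `StorageFormat` names from the list above; the dimension guard shown here (SQ8 only for dimensions divisible by 4) is a stand-in assumption, not the engine's real support rule:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum StorageFormat {
    F32,
    Sq8,
}

struct SegmentConfig {
    storage: StorageFormat,
}

impl Default for SegmentConfig {
    fn default() -> Self {
        // The actual change: new collections default to SQ8.
        SegmentConfig { storage: StorageFormat::Sq8 }
    }
}

// Guarded fallback: unsupported dimensions still flush as F32.
fn effective_format(cfg: &SegmentConfig, dim: usize) -> StorageFormat {
    let sq8_supported = dim % 4 == 0; // illustrative support rule, not the real one
    match cfg.storage {
        StorageFormat::Sq8 if sq8_supported => StorageFormat::Sq8,
        _ => StorageFormat::F32,
    }
}

fn main() {
    let cfg = SegmentConfig::default();
    println!("dim 1024 -> {:?}", effective_format(&cfg, 1024));
    println!("dim 3    -> {:?}", effective_format(&cfg, 3));
}
```

The regression tests in the bullet list pin down exactly these two behaviors: the new default, and the unchanged fallback.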
The hard part was not implementation. The hard part was getting enough data to trust the decision.
## What We Are Not Doing
We are not claiming the current TQ4 experiment is “real TurboQuant.”
What it is:
- a TurboQuant-inspired exploratory branch
What it is not:
- a faithful implementation of the paper’s optimized quantizer / scorer path
- a strong enough result to justify replacing `SQ8`
If we revisit that line of work, it should be a fresh, paper-closer implementation, not an incremental patch on top of the current first pass.
## References
- Google Research blog: https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
- ICLR 2026 paper PDF: https://openreview.net/attachment?id=tO3ASKZlok&name=pdf