# Making SQ8 the Default for New Collections
Why the engine moved to SQ8 as the default storage format — what we measured, what failed, and what we are not doing yet.
## Why This Change

We had three practical questions:

- Is `SQ8` good enough on query quality?
- Is `SQ8` clearly better than `SQ4` at realistic scale?
- If we want more compression than `SQ8`, should we keep pushing `SQ4` or try a separate TurboQuant-style path?

The answer from the data was straightforward:

- `SQ8` keeps recall loss small enough to be a sensible default
- `SQ4` saves more memory, but falls behind at 1M
- a first-pass TurboQuant-inspired path (TQ4) was not competitive
That makes the product decision easy: use SQ8 as the default for new collections, and leave existing collections alone.
## What We Measured

Environment:

- Apple M5 Pro
- 48 GB RAM
- rustc 1.94.0
- local single-machine runs on 2026-03-26

Test shape:

- dim = 1024
- latency benchmarks: 1000 queries, top_k = 10
- recall tests: brute-force top-10 ground truth
- `DiskCollection` real path only: upsert -> flush -> compact -> preload -> search
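The recall measurement above relies on exact brute-force top-10 results as ground truth. As a minimal sketch of how that comparison works (function names here are illustrative, not the engine's actual API):

```rust
// Squared L2 distance between two vectors.
fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

// Exact top-k ids by brute-force scan over the base set (the ground truth).
fn brute_force_top_k(base: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..base.len()).collect();
    ids.sort_by(|&i, &j| {
        l2(&base[i], query).partial_cmp(&l2(&base[j], query)).unwrap()
    });
    ids.truncate(k);
    ids
}

// recall@k: fraction of ground-truth ids recovered by the index under test.
fn recall_at_k(truth: &[usize], returned: &[usize]) -> f32 {
    let hits = returned.iter().filter(|&&id| truth.contains(&id)).count();
    hits as f32 / truth.len() as f32
}

fn main() {
    let base = vec![vec![0.0, 0.0], vec![1.0, 0.0], vec![5.0, 5.0]];
    let truth = brute_force_top_k(&base, &[0.1, 0.0], 2);
    // Pretend the index under test returned one of the two true neighbors.
    let recall = recall_at_k(&truth, &[truth[0], 2]);
    println!("recall@2 = {recall}");
}
```

The reported recall@10 numbers are this quantity averaged over all queries.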
## Result 1: SQ8 Quality Is Good Enough
SQ8 vs F32 at 1024d / 10,000 vectors / 50 queries:
| Metric | Value |
|---|---|
| F32 recall@10 | 0.9960 |
| SQ8 recall@10 | 0.9800 |
| Recall delta | 0.0160 |
This is the number that matters most for the default decision. SQ8 is not lossless, but the drop is small enough that the latency and memory gains dominate.
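For intuition on why the drop stays small: SQ8 maps each float to one of 256 levels, so the per-component rounding error is bounded by half a quantization step. A minimal sketch of per-vector affine SQ8 encoding, assuming a min/max mapping (the engine's actual codec details are not spelled out in this note):

```rust
// Encode a vector to u8 codes plus the (min, scale) needed to decode.
// Assumption: per-vector min/max affine quantization, 256 levels.
fn sq8_encode(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    let codes = v.iter().map(|&x| ((x - min) * scale).round() as u8).collect();
    (codes, min, scale)
}

// Decode back to approximate floats; error per component is at most
// half a step, i.e. (max - min) / 510.
fn sq8_decode(codes: &[u8], min: f32, scale: f32) -> Vec<f32> {
    if scale == 0.0 {
        return vec![min; codes.len()];
    }
    codes.iter().map(|&c| min + c as f32 / scale).collect()
}

fn main() {
    let v = [0.25f32, -1.5, 3.0, 0.0];
    let (codes, min, scale) = sq8_encode(&v);
    let back = sq8_decode(&codes, min, scale);
    let max_err = v
        .iter()
        .zip(&back)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    println!("max reconstruction error: {max_err}");
}
```

With 4 bits (SQ4) the same scheme has only 16 levels, so each step is 16x coarser, which is where the extra recall and distance-estimation error comes from.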
## Result 2: SQ8 Beats SQ4 at 1M
At 1M vectors:
| Format | Build | p50(us) | p95(us) | p99(us) | QPS | MB |
|---|---|---|---|---|---|---|
| SQ8+HNSW | 839s | 277 | 392 | 502 | 3400 | 1024.0 |
| SQ4+HNSW | 781s | 860 | 1048 | 1221 | 1138 | 512.0 |
| FAISS HNSW reference | - | 621 | 941 | 1653 | 1503 | 4096.0 |
Interpretation:

- `SQ8` is not just smaller than `F32`; it is also clearly faster at scale
- `SQ4` gets the 8x compression story, but at 1M its latency/QPS tradeoff stops looking like a good default
So SQ4 remains a compression option, not the default path.
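The MB column in the table follows directly from bits per dimension; a quick sketch of the arithmetic at dim = 1024 (codes only, index overhead excluded):

```rust
// Raw code bytes per vector for a given dimension and bit width.
fn bytes_per_vector(dim: usize, bits: usize) -> usize {
    (dim * bits) / 8
}

fn main() {
    let dim = 1024;
    // At 1M vectors, bytes per vector equals MB of codes in total.
    println!("F32: {} B/vec", bytes_per_vector(dim, 32)); // 4096 MB at 1M
    println!("SQ8: {} B/vec", bytes_per_vector(dim, 8));  // 1024 MB at 1M
    println!("SQ4: {} B/vec", bytes_per_vector(dim, 4));  //  512 MB at 1M
}
```

So SQ4's 8x-vs-F32 compression is real; the table shows it simply is not worth the latency cost at 1M.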
## Result 3: First-Pass TurboQuant Was Not Good
We also tried a separate TQ4 path rather than mutating SQ4.
The first pass used:

- a randomized Hadamard transform
- a 3-bit transformed base code
- a 1-bit residual sign path
That sounds directionally right, but the actual numbers were poor.
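For context on the first ingredient: the randomized Hadamard transform is a random sign flip per dimension followed by a fast Walsh-Hadamard transform, which spreads energy evenly across dimensions before coarse quantization. A sketch, assuming orthonormal scaling (the TQ4 branch's actual normalization and sign-seeding details are not shown in this note):

```rust
// In-place fast Walsh-Hadamard transform; length must be a power of two.
fn fwht(v: &mut [f32]) {
    let n = v.len();
    assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        for block in v.chunks_mut(2 * h) {
            for i in 0..h {
                let (a, b) = (block[i], block[i + h]);
                block[i] = a + b;
                block[i + h] = a - b;
            }
        }
        h *= 2;
    }
    // Orthonormal scaling so the transform preserves L2 norms.
    let scale = 1.0 / (n as f32).sqrt();
    for x in v.iter_mut() {
        *x *= scale;
    }
}

// Random sign flip (signs are +1.0 / -1.0, drawn once per dimension),
// then the Hadamard rotation.
fn randomized_hadamard(v: &mut [f32], signs: &[f32]) {
    for (x, s) in v.iter_mut().zip(signs) {
        *x *= s;
    }
    fwht(v);
}

fn main() {
    let signs = [1.0f32, -1.0, 1.0, -1.0];
    let mut v = [0.5f32, 0.5, 0.5, 0.5];
    let before: f32 = v.iter().map(|x| x * x).sum();
    randomized_hadamard(&mut v, &signs);
    let after: f32 = v.iter().map(|x| x * x).sum();
    println!("norm preserved: {before} -> {after}");
}
```

The transform itself is cheap (O(d log d)); the poor numbers below came from the quantizer and scorer built on top of it, not from this step.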
Recall:
| Metric | Value |
|---|---|
| F32 recall@10 | 0.9920 |
| TQ4 recall@10 | 0.4900 |
| Recall delta | 0.5020 |
Latency at 200K vectors:
| Format | Build | p50(us) | p95(us) | p99(us) | QPS | MB |
|---|---|---|---|---|---|---|
| SQ8+HNSW | 138s | 211 | 247 | 281 | 4645 | 204.8 |
| TQ4+HNSW | 144s | 3803 | 4115 | 4300 | 263 | 104.0 |
Interpretation:
- the first pass only wins on storage size
- it loses badly on both quality and latency
- this is not production-ready and not worth using as a default or even a near-term option
## Product Decision
We should not do a DB-wide migration.
Instead:

- existing collections stay as they are
- new collections use `SQ8` by default
- unsupported dimensions still fall back to `F32`
- if a collection ever needs to be rebuilt, the raw DB values are the source of truth
That keeps the operational change small while still taking the performance win where it matters.
## Engineering Change
The code change itself is small:
- switch `SegmentConfig::default()` to `StorageFormat::Sq8`
- keep the existing guarded fallback so unsupported dimensions still flush as `F32`
- add regression tests proving both behaviors
- update README / INTERNAL / benchmark memo so docs match policy
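The first two bullets can be sketched as follows. This is a minimal illustration using the `SegmentConfig` / `StorageFormat` names from the list above; the dimension guard shown here (SQ8 only for dimensions divisible by 4) is a stand-in assumption, not the engine's real support rule:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum StorageFormat {
    F32,
    Sq8,
}

struct SegmentConfig {
    storage: StorageFormat,
}

impl Default for SegmentConfig {
    fn default() -> Self {
        // The actual change: new collections default to SQ8.
        SegmentConfig { storage: StorageFormat::Sq8 }
    }
}

// Guarded fallback: unsupported dimensions still flush as F32.
fn effective_format(cfg: &SegmentConfig, dim: usize) -> StorageFormat {
    let sq8_supported = dim % 4 == 0; // illustrative support rule, not the real one
    match cfg.storage {
        StorageFormat::Sq8 if sq8_supported => StorageFormat::Sq8,
        _ => StorageFormat::F32,
    }
}

fn main() {
    let cfg = SegmentConfig::default();
    println!("dim 1024 -> {:?}", effective_format(&cfg, 1024));
    println!("dim 3    -> {:?}", effective_format(&cfg, 3));
}
```

The regression tests in the bullet list pin down exactly these two behaviors: the new default, and the unchanged fallback.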
The hard part was not implementation. The hard part was getting enough data to trust the decision.
## What We Are Not Doing
We are not claiming the current TQ4 experiment is “real TurboQuant.”
What it is:
- a TurboQuant-inspired exploratory branch
What it is not:
- a faithful implementation of the paper’s optimized quantizer / scorer path
- a strong enough result to justify replacing `SQ8`
If we revisit that line of work, it should be a fresh, paper-closer implementation, not an incremental patch on top of the current first pass.
## References
- Google Research blog: https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
- ICLR 2026 paper PDF: https://openreview.net/attachment?id=tO3ASKZlok&name=pdf