Making SQ8 the Default for New Collections

This note records why the engine moved toward SQ8 as the default storage format for new collections, what we measured, what failed, and what we are not doing yet.

Why This Change

We had three practical questions:

  1. Is SQ8 good enough on query quality?
  2. Is SQ8 clearly better than SQ4 at realistic scale?
  3. If we want more compression than SQ8, should we keep pushing SQ4 or try a separate TurboQuant-style path?

The answer from the data was straightforward:

  • SQ8 keeps recall loss small enough to be a sensible default
  • SQ4 saves more memory, but falls behind on latency and QPS at 1M vectors
  • a first-pass TurboQuant-inspired path (TQ4) was not competitive on recall or latency

That makes the product decision easy: use SQ8 as the default for new collections, and leave existing collections alone.

What We Measured

Environment:

  • Apple M5 Pro
  • 48 GB RAM
  • rustc 1.94.0
  • local single-machine runs on 2026-03-26

Test shape:

  • dim = 1024
  • latency benchmarks: 1000 queries, top_k = 10
  • recall tests: brute-force top-10 ground truth
  • DiskCollection real path only: upsert -> flush -> compact -> preload -> search
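
The recall tests compare each index result against an exact brute-force top-10. A minimal sketch of that measurement follows. It assumes squared L2 distance (the note does not state the metric), and the function names are illustrative, not the engine's API.

```rust
use std::collections::HashSet;

/// Squared L2 distance between two equal-length vectors.
/// Assumption: the note does not say which metric the benchmarks used.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k ids by scanning every vector (the ground truth).
fn brute_force_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(f32, usize)> = data
        .iter()
        .enumerate()
        .map(|(id, v)| (l2_sq(v, query), id))
        .collect();
    // total_cmp is a total order on f32, so NaN cannot panic the sort;
    // ties are broken by id to keep the result deterministic.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}

/// Fraction of the true top-k that the approximate result also returned.
fn recall_at_k(truth: &[usize], approx: &[usize], k: usize) -> f64 {
    let k = k.min(truth.len());
    if k == 0 {
        return 0.0;
    }
    let truth_set: HashSet<usize> = truth.iter().take(k).copied().collect();
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth_set.contains(*id))
        .count();
    hits as f64 / k as f64
}
```

Averaging `recall_at_k` over all queries gives a single recall@10 figure of the kind reported in the results.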

Result 1: SQ8 Quality Is Good Enough

SQ8 vs F32 at 1024d / 10,000 vectors / 50 queries:

| Metric | Value |
|---|---|
| F32 recall@10 | 0.9960 |
| SQ8 recall@10 | 0.9800 |
| Recall delta | 0.0160 |

This is the number that matters most for the default decision. SQ8 is not lossless, but the drop is small enough that the latency and memory gains dominate.
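
To make "not lossless" concrete, here is a sketch of one common form of 8-bit scalar quantization: each vector is mapped onto 256 evenly spaced levels between its own minimum and maximum. This is an assumption for illustration. The note does not describe the engine's actual SQ8 encoding, and `Sq8Vector`, `sq8_encode`, and `sq8_decode` are hypothetical names.

```rust
/// One vector stored as 8-bit codes plus the affine parameters needed to decode it.
/// Hypothetical layout, not the engine's on-disk format.
struct Sq8Vector {
    min: f32,
    scale: f32, // width of one quantization step
    codes: Vec<u8>,
}

/// Map each component onto one of 256 levels spanning [min, max] of this vector.
fn sq8_encode(v: &[f32]) -> Sq8Vector {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Guard against a constant (or empty) vector, where max is not above min.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let codes = v
        .iter()
        .map(|&x| ((x - min) / scale).round().clamp(0.0, 255.0) as u8)
        .collect();
    Sq8Vector { min, scale, codes }
}

/// Reconstruct approximate f32 values from the stored codes.
fn sq8_decode(q: &Sq8Vector) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

Under this scheme each decoded component differs from the original by at most half a step (`scale / 2`), up to floating-point rounding. That bounded per-component error is the source of a small but nonzero recall loss.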

Result 2: SQ8 Beats SQ4 at 1M

At 1M vectors:

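| Format | Build | p50 (us) | p95 (us) | p99 (us) | QPS | MB |
|---|---|---|---|---|---|---|
| SQ8+HNSW | 839s | 277 | 392 | 502 | 3400 | 1024.0 |
| SQ4+HNSW | 781s | 860 | 1048 | 1221 | 1138 | 512.0 |
| FAISS HNSW reference | - | 621 | 941 | 1653 | 1503 | 4096.0 |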
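
For readers reproducing the latency columns, the sketch below shows one way to derive p50/p95/p99 and QPS from per-query timings. It assumes nearest-rank percentiles and sequential single-threaded queries, so QPS is query count divided by total time. The note does not state which definitions the benchmark harness used, and the names here are illustrative.

```rust
/// Nearest-rank percentile over an ascending-sorted slice of latencies in microseconds.
/// Panics on an empty slice or a percentile outside 0..=100.
fn percentile(sorted_us: &[u64], p: f64) -> u64 {
    assert!(!sorted_us.is_empty() && (0.0..=100.0).contains(&p));
    let rank = (p * sorted_us.len() as f64 / 100.0).ceil() as usize;
    sorted_us[rank.clamp(1, sorted_us.len()) - 1]
}

struct LatencySummary {
    p50_us: u64,
    p95_us: u64,
    p99_us: u64,
    qps: f64,
}

/// Summarize per-query latencies.
/// Assumption: queries ran one after another, so QPS = count / total elapsed time.
fn summarize(latencies_us: &[u64]) -> LatencySummary {
    let mut sorted = latencies_us.to_vec();
    sorted.sort_unstable();
    let total_us: u64 = sorted.iter().sum();
    LatencySummary {
        p50_us: percentile(&sorted, 50.0),
        p95_us: percentile(&sorted, 95.0),
        p99_us: percentile(&sorted, 99.0),
        qps: sorted.len() as f64 / (total_us as f64 / 1_000_000.0),
    }
}
```

With 1000 queries per run, p99 under this definition is the 990th fastest query, so it reflects only the ten slowest samples and will be noisier than p50.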

Interpretation:

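  • SQ8 is 4x smaller than the F32 FAISS HNSW reference (1024 MB vs 4096 MB) and also faster at this scale: p50 of 277 us vs 621 us, and 3400 vs 1503 QPS
  • SQ4 reaches 8x compression over F32 (512 MB), but at 1M vectors its p50 is about 3x higher than SQ8 (860 us vs 277 us) and its QPS about a third (1138 vs 3400), which stops looking like a good default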

So SQ4 remains a compression option, not the default path.
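
The MB column follows directly from vector count, dimension, and bits per component. A small sketch of that arithmetic, using decimal megabytes (1 MB = 1,000,000 bytes), which is what the reported figures correspond to; `payload_mb` is an illustrative helper, not engine code.

```rust
/// Raw vector payload in decimal megabytes (1 MB = 1_000_000 bytes).
/// Excludes the HNSW graph and any per-vector quantization metadata.
fn payload_mb(num_vectors: u64, dim: u64, bits_per_component: u64) -> f64 {
    let bits = num_vectors * dim * bits_per_component;
    bits as f64 / 8.0 / 1_000_000.0
}

// At 1M vectors and dim = 1024:
//   payload_mb(1_000_000, 1024, 32) -> 4096.0  (F32)
//   payload_mb(1_000_000, 1024, 8)  -> 1024.0  (SQ8, 4x smaller)
//   payload_mb(1_000_000, 1024, 4)  -> 512.0   (SQ4, 8x smaller)
```

The same formula gives 204.8 MB for SQ8 at 200K vectors. A 4-bit code at 200K comes to 102.4 MB of raw payload, slightly below the 104.0 MB reported for TQ4; the gap is presumably extra per-vector data, though the measurements do not break it down.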

Result 3: First-Pass TurboQuant Was Not Good

We also tried a separate TQ4 path rather than mutating SQ4.

The first pass used:

  • randomized Hadamard transform
  • 3-bit transformed base code
  • 1-bit residual sign path
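
For context on the first ingredient, a randomized Hadamard transform flips the sign of each component at random and then applies a Walsh-Hadamard transform, which spreads energy evenly across dimensions before quantization. The sketch below is a generic textbook version, not the TQ4 branch's code: the xorshift sign generator and all function names are assumptions made for illustration.

```rust
/// In-place fast Walsh-Hadamard transform with orthonormal scaling.
/// The length must be a power of two (dim = 1024 qualifies).
fn fwht(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two());
    let mut h = 1;
    while h < n {
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (x[i], x[i + h]);
                x[i] = a + b;
                x[i + h] = a - b;
            }
        }
        h *= 2;
    }
    // Dividing by sqrt(n) makes the transform orthonormal, so it preserves L2 norm.
    let norm = (n as f32).sqrt();
    for v in x.iter_mut() {
        *v /= norm;
    }
}

/// Deterministic +1/-1 signs from a seed, using a small xorshift64 generator.
/// Illustrative only; any seeded source of random signs would do.
fn random_signs(n: usize, seed: u64) -> Vec<f32> {
    let mut s = seed.max(1); // xorshift state must be nonzero
    (0..n)
        .map(|_| {
            s ^= s << 13;
            s ^= s >> 7;
            s ^= s << 17;
            if s & 1 == 0 { 1.0 } else { -1.0 }
        })
        .collect()
}

/// Randomized Hadamard transform: random sign flips, then the Hadamard transform.
fn randomized_hadamard(x: &mut [f32], signs: &[f32]) {
    assert_eq!(x.len(), signs.len());
    for (v, s) in x.iter_mut().zip(signs) {
        *v *= *s;
    }
    fwht(x);
}
```

Because both steps are orthonormal, distances between vectors are unchanged by the transform; any recall loss comes from the quantization applied afterwards. The same sign vector must be applied to stored vectors and queries.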

That sounds directionally right, but the actual numbers were poor.

Recall:

| Metric | Value |
|---|---|
| F32 recall@10 | 0.9920 |
| TQ4 recall@10 | 0.4900 |
| Recall delta | 0.5020 |

200K latency:

| Format | Build | p50 (us) | p95 (us) | p99 (us) | QPS | MB |
|---|---|---|---|---|---|---|
| SQ8+HNSW | 138s | 211 | 247 | 281 | 4645 | 204.8 |
| TQ4+HNSW | 144s | 3803 | 4115 | 4300 | 263 | 104.0 |

Interpretation:

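  • the first pass only wins on storage size (104.0 MB vs 204.8 MB for SQ8)
  • it loses badly on both quality (recall@10 of 0.49 vs 0.992 for F32) and latency (p50 of 3803 us vs 211 us for SQ8, roughly 18x slower)
  • this is not production-ready and not worth using as a default or even a near-term option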

Product Decision

We should not do a DB-wide migration.

Instead:

  • existing collections stay as they are
  • new collections use SQ8 by default
  • unsupported dimensions still fall back to F32
  • if a collection ever needs to be rebuilt, the raw DB values are the source of truth

That keeps the operational change small while still taking the performance win where it matters.

Engineering Change

The code change itself is small:

  • switch SegmentConfig::default() to StorageFormat::Sq8
  • keep the existing guarded fallback so unsupported dimensions still flush as F32
  • add regression tests proving both behaviors
  • update README / INTERNAL / benchmark memo so docs match policy
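
The shape of the change can be sketched as follows. `SegmentConfig` and `StorageFormat::Sq8` are the names used above; everything else (the `storage_format` field, the `F32` and `Sq4` variant names, `flush_format`, and the `sq8_supported` callback) is hypothetical. The callback stands in for the engine's real dimension check, which this note does not spell out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StorageFormat {
    F32,
    Sq8,
    Sq4,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct SegmentConfig {
    storage_format: StorageFormat, // hypothetical field name
}

impl Default for SegmentConfig {
    /// New collections get SQ8 unless the caller asks for something else.
    fn default() -> Self {
        SegmentConfig {
            storage_format: StorageFormat::Sq8,
        }
    }
}

impl SegmentConfig {
    /// Format actually written at flush time.
    /// `sq8_supported` is a placeholder for the engine's real dimension check.
    fn flush_format(&self, dim: usize, sq8_supported: impl Fn(usize) -> bool) -> StorageFormat {
        match self.storage_format {
            // Guarded fallback: SQ8 was requested but this dimension cannot use it.
            StorageFormat::Sq8 if !sq8_supported(dim) => StorageFormat::F32,
            other => other,
        }
    }
}
```

The two regression tests then reduce to two assertions: the default config reports SQ8, and a config flushed with an unsupported dimension reports F32. Keeping the fallback at flush time rather than in `default()` means the default stays a simple constant and does not need to know the collection's dimension.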

The hard part was not implementation. The hard part was getting enough data to trust the decision.

What We Are Not Doing

We are not claiming the current TQ4 experiment is “real TurboQuant.”

What it is:

  • a TurboQuant-inspired exploratory branch

What it is not:

  • a faithful implementation of the paper’s optimized quantizer / scorer path
  • a strong enough result to justify replacing SQ8

If we revisit that line of work, it should be a fresh, paper-closer implementation, not an incremental patch on top of the current first pass.
