Case Study: Cutting Embedding Costs to $0 with Gemini

How a startup paying $1,500/month in OpenAI embedding costs migrated to Gemini Embedding in one afternoon — without re-embedding a single document.

This is a reconstructed case study based on a scenario that plays out repeatedly among developer teams building semantic search on top of OpenAI embeddings. The numbers are representative; the operational steps are exact.

The situation: $1,500/month on embeddings

A B2B SaaS startup had built a document search feature using OpenAI's text-embedding-3-large. With a corpus of roughly 80 million tokens embedded over 18 months — ingestion plus query-time embedding — their monthly OpenAI embedding bill had reached $1,500.

In early 2026, Google made Gemini Embedding free up to a generous daily limit — enough to cover most startups' entire embedding workload. The engineering team ran a quick test on their benchmark queries and saw comparable retrieval quality. The decision to migrate seemed obvious.

The problem: they had 12 million vectors stored in pgvector. Re-embedding the full corpus would cost $1,560 (80M tokens at text-embedding-3-large pricing) and take 8–12 hours of continuous API calls. And during the migration, their search feature would be degraded.

They found Schift. Here is what happened next.

Step 1: Sample and train (45 minutes)

The first step was pulling a representative sample from their corpus. Using Schift's SDK, they sampled 1,200 documents — a mix of short and long content, spanning their full topic distribution:

from schift import Schift
from schift.adapters import PgVectorAdapter

client = Schift(api_key="sch_...")

# Fit a projection matrix from OpenAI -> Gemini
proj = client.migrate.fit(
    source="openai/text-embedding-3-large",
    target="google/gemini-embedding-004",
    sample_ratio=0.0001,  # 1,200 docs from 12M
    db="postgresql://..."
)
print(proj["id"])  # proj_9f3a2c...

The fit() call:

  • Pulled 1,200 existing source vectors from pgvector (no API calls)
  • Fetched the corresponding document texts from their metadata store
  • Embedded those 1,200 texts with Gemini embedding-004 (free, under their daily limit)
  • Trained the projection matrix using Schift's learned projection algorithm

Total time: 43 minutes. Total cost: $0 (Gemini training embeddings were within the free tier).
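Schift's projection algorithm is proprietary, but the core idea of fitting a linear map between two embedding spaces can be sketched as an ordinary ridge-regularized least-squares fit. Everything below (the dimensions, the regularization constant, the helper name) is our illustration, not Schift's implementation:

```python
import numpy as np

def fit_projection(src, tgt, lam=1e-3):
    """Fit W minimizing ||src @ W - tgt||^2 + lam * ||W||^2.

    src: (n, d_src) sampled source vectors (already in the database)
    tgt: (n, d_tgt) target-model vectors for the same texts
    """
    d_src = src.shape[1]
    # Closed-form ridge solution: (X'X + lam*I)^-1 X'Y
    gram = src.T @ src + lam * np.eye(d_src)
    return np.linalg.solve(gram, src.T @ tgt)

# Toy check: if the two spaces really are linearly related,
# 1,200 paired samples recover the map almost exactly.
rng = np.random.default_rng(0)
src = rng.normal(size=(1200, 64))
tgt = src @ rng.normal(size=(64, 32))
W = fit_projection(src, tgt)
print(np.allclose(src @ W, tgt, atol=1e-2))  # True
```

On real corpora the relationship between two models' spaces is only approximately linear, which is why recovery lands near 96% rather than 100%.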

Step 2: Benchmark before committing (8 minutes)

Before migrating 12 million vectors, they ran a quality check:

report = client.bench.run(
    source="openai/text-embedding-3-large",
    target="google/gemini-embedding-004",
    projection=proj["id"],
    data="./eval_queries.jsonl"  # 200 annotated queries
)

print(report.verdict)       # SAFE
print(report.recovery)      # 0.964
print(report.source_r10)    # 0.847
print(report.projected_r10) # 0.816

96.4% recovery: after projection, their search would retain 96.4% of its original retrieval quality. For a document search feature used by B2B customers, this was comfortably within their acceptable range, and they greenlit the migration.
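The recovery score is simply the ratio of the two recall@10 numbers (0.816 / 0.847 ≈ 0.964). A hand-rolled version of that bookkeeping, on toy data of our own invention, looks like this:

```python
def recall_at_k(results, relevant, k=10):
    """Fraction of queries whose top-k list contains a relevant doc."""
    hits = sum(
        1
        for qid, ranked in results.items()
        if any(doc in relevant[qid] for doc in ranked[:k])
    )
    return hits / len(results)

# Three toy queries with known-relevant docs
relevant = {"q1": {"d1"}, "q2": {"d3"}, "q3": {"d7"}}
source = {"q1": ["d1", "d9"], "q2": ["d3", "d4"], "q3": ["d7", "d2"]}
projected = {"q1": ["d9", "d1"], "q2": ["d5", "d6"], "q3": ["d7", "d8"]}

source_r = recall_at_k(source, relevant, k=2)
projected_r = recall_at_k(projected, relevant, k=2)
print(f"recovery: {projected_r / source_r:.3f}")  # recovery: 0.667
```

Schift's bench.run presumably evaluates more than recall@10, but the recovery arithmetic is this simple.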

Step 3: Migrate (18 minutes)

result = client.migrate.run(
    projection=proj["id"],
    db="postgresql://...",
    table="document_embeddings",
    on_progress=lambda r: print(f"{r['progress']:.0%} — {r['vectors_done']:,} vectors")
)

# 10% — 1,200,000 vectors
# 20% — 2,400,000 vectors
# ...
# 100% — 12,000,000 vectors

print(f"Migrated {result['total_vectors']:,} vectors in {result['elapsed_seconds']}s")
# Migrated 12,000,000 vectors in 1,074s

18 minutes. No re-embedding. No API calls to OpenAI or Gemini during the bulk migration step. Schift applied the projection matrix locally: 12 million matrix-vector multiplies, each taking well under a millisecond.
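The bulk step is plain linear algebra: read a batch of source vectors, multiply by the fitted matrix, re-normalize, write back. A NumPy sketch of one batch (the batch size, dimensions, and re-normalization step are our assumptions, not Schift's internals):

```python
import numpy as np

def project_batch(vectors, W):
    """Project a batch of source vectors into the target space.

    vectors: (b, d_src) float array; W: (d_src, d_tgt).
    Rows are re-normalized so cosine similarity stays meaningful.
    """
    out = vectors @ W
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.clip(norms, 1e-12, None)

# One batch of 1,000 vectors, 3072-d source space -> 768-d target.
# Migrating 12M vectors is just this operation repeated 12,000 times.
rng = np.random.default_rng(1)
W = rng.normal(size=(3072, 768))
batch = rng.normal(size=(1000, 3072))
out = project_batch(batch, W)
print(out.shape)  # (1000, 768)
```

This is why the migration needs no API calls: everything required already sits in the database plus the fitted matrix.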

The team ran the migration against a staging copy of their database first, validated query results, and then ran it on production during a low-traffic period. Zero downtime. Zero support tickets.
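Validating the staging copy can be as simple as comparing top-k result overlap for a fixed set of queries before and after migration. A minimal sketch (the helper name and toy data are hypothetical):

```python
def mean_topk_overlap(before, after, k=10):
    """Average fraction of shared docs in the top-k lists per query."""
    overlaps = [
        len(set(before[qid][:k]) & set(after[qid][:k])) / k
        for qid in before
    ]
    return sum(overlaps) / len(overlaps)

# Top-3 results per query, before vs. after migration (toy data)
before = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5", "d6"]}
after = {"q1": ["d1", "d3", "d9"], "q2": ["d4", "d5", "d6"]}
score = mean_topk_overlap(before, after, k=3)
print(score)
```

A sharp drop on any query cluster is the signal to investigate before touching production.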

Step 4: Switch the router (2 minutes)

With vectors migrated, they updated their Schift routing config to point at Gemini for new embeddings:

client.routing.set(
    primary="google/gemini-embedding-004",
    fallback="openai/text-embedding-3-large"
)

Their application code — which calls client.embed() — required zero changes. The router handles the model selection transparently.
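Conceptually, a primary/fallback router is a thin wrapper that tries the first model and falls back on error. A sketch of the idea (the callables stand in for real API clients; this is not Schift's internals):

```python
def embed_with_fallback(text, primary, fallback):
    """Embed with the primary model; fall back if it errors out."""
    try:
        return primary(text), "primary"
    except Exception:
        return fallback(text), "fallback"

# Toy embedders standing in for the Gemini and OpenAI clients
def gemini_embed(text):
    raise TimeoutError("simulated outage")

def openai_embed(text):
    return [0.1, 0.2, 0.3]

vec, used = embed_with_fallback("hello", gemini_embed, openai_embed)
print(used)  # fallback
```

One wrinkle worth noting: fallback embeddings live in a different vector space, so a production router would also need to project (or at least tag) them before comparing them against the migrated corpus.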

The outcome

| Metric | Before | After |
| --- | --- | --- |
| Monthly embedding cost | $1,500 | $0 |
| Migration cost | | $0 |
| Migration time | | 71 minutes total |
| Downtime | | 0 minutes |
| Retrieval quality | R@10: 0.847 | R@10: 0.816 (96.4% preserved) |
| Code changes | | 0 lines |

What surprised them

The team expected the migration to be a weekend project. It finished in a single afternoon. The part that surprised them most: after migration, several users reported that search "felt snappier." Gemini embedding-004 has lower API latency than text-embedding-3-large at query time — a side benefit they had not anticipated.

The 3.6% quality difference was measurable in benchmarks but invisible to end users. Search results that were slightly different were not obviously worse — just different.

When this works — and when it does not

This migration path works best when:

  • Your corpus is well-represented by a 0.01%–1% sample.
  • Your domain is general-purpose or covered by standard benchmarks.
  • 95%+ retrieval recovery is acceptable (it is for most use cases).

It works less well for highly specialized domains where general-purpose benchmarks underrepresent your content — medical literature with specialized terminology, legal documents with jurisdiction-specific language. In those cases, you should use a larger training sample and validate carefully with domain-specific evaluation queries.

But for the large majority of teams building search on top of general-purpose content? The path from $1,500/month to $0 is now one afternoon of engineering work.

Ready to try Schift?

Switch embedding models without re-embedding. Start free.

Get started free