Case Study: Cutting Embedding Costs to $0 with Gemini
How a startup paying $1,500/month in OpenAI embedding costs migrated to Gemini Embedding in one afternoon — without re-embedding a single document.
This is a reconstructed case study based on a scenario that plays out repeatedly among developer teams building semantic search on top of OpenAI embeddings. The numbers are representative; the operational steps are exact.
The situation: $1,500/month on embeddings
A B2B SaaS startup had built a document search feature using OpenAI's text-embedding-3-large. With a corpus of roughly 80 million tokens embedded over 18 months — ingestion plus query-time embedding — their monthly OpenAI embedding bill had reached $1,500.
In early 2026, Google made Gemini Embedding free up to a generous daily limit — enough to cover most startups' entire embedding workload. The engineering team ran a quick test on their benchmark queries and saw comparable retrieval quality. The decision to migrate seemed obvious.
The problem: they had 12 million vectors stored in pgvector. Re-embedding the full corpus would cost $1,560 (80M tokens at text-embedding-3-large pricing) and take 8–12 hours of continuous API calls. And during the migration, their search feature would be degraded.
They found Schift. Here is what happened next.
Step 1: Sample and train (45 minutes)
The first step was pulling a representative sample from their corpus. Using Schift's SDK, they sampled 1,200 documents — a mix of short and long content, spanning their full topic distribution:
```python
from schift import Schift
from schift.adapters import PgVectorAdapter

client = Schift(api_key="sch_...")

# Fit a projection matrix from OpenAI -> Gemini
proj = client.migrate.fit(
    source="openai/text-embedding-3-large",
    target="google/gemini-embedding-004",
    sample_ratio=0.0001,  # 1,200 docs from 12M
    db="postgresql://...",
)

print(proj["id"])  # proj_9f3a2c...
```
The fit() call:
- Pulled 1,200 existing source vectors from pgvector (no API calls)
- Fetched the corresponding document texts from their metadata store
- Embedded those 1,200 texts with Gemini embedding-004 (free, under their daily limit)
- Trained the projection matrix using Schift's learned projection algorithm
Total time: 43 minutes. Total cost: $0 (Gemini training embeddings were within the free tier).
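For intuition, a linear projection of this kind can be fit in closed form with least squares over the paired sample vectors. The sketch below is purely illustrative: random data and toy dimensions stand in for the 1,200 sample pairs, and nothing here should be read as Schift's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the 1,200 paired sample vectors:
# X = source vectors already sitting in pgvector,
# Y = Gemini embeddings of the same 1,200 texts.
# (Real embedding dims are larger; toy sizes keep this fast.)
X = rng.normal(size=(1200, 256))
Y = rng.normal(size=(1200, 64))

# Least-squares fit of W minimizing ||X @ W - Y||_F
W, residuals, rank, _ = np.linalg.lstsq(X, Y, rcond=None)

print(W.shape)  # (256, 64)
```

Once fit, the matrix W maps any source-space vector into the target space with a single matrix multiply, which is what makes the later bulk migration step API-free.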
Step 2: Benchmark before committing (8 minutes)
Before migrating 12 million vectors, they ran a quality check:
```python
report = client.bench.run(
    source="openai/text-embedding-3-large",
    target="google/gemini-embedding-004",
    projection=proj["id"],
    data="./eval_queries.jsonl",  # 200 annotated queries
)

print(report.verdict)        # SAFE
print(report.recovery)       # 0.964
print(report.source_r10)     # 0.847
print(report.projected_r10)  # 0.816
```

96.4% recovery: their search would retain 96.4% of its original quality. For a document search feature used by B2B customers, that was comfortably within the acceptable range. They greenlit the migration.
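The recovery number is simply the ratio of projected recall@10 to source recall@10. A minimal sketch of that computation, with a toy recall@k helper (the data and helper are illustrative, not Schift's report internals):

```python
def recall_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int = 10) -> float:
    """Mean fraction of relevant docs found in each query's top-k results."""
    scores = []
    for hits, gold in zip(retrieved, relevant):
        found = len(set(hits[:k]) & gold)
        scores.append(found / len(gold) if gold else 0.0)
    return sum(scores) / len(scores)

# Toy example: 2 queries, 2 relevant docs each
relevant = [{"a", "b"}, {"c", "d"}]
source_hits = [["a", "b", "x"], ["c", "x", "d"]]      # original index
projected_hits = [["a", "x", "b"], ["c", "x", "y"]]   # after projection

src = recall_at_k(source_hits, relevant)       # 1.0
proj = recall_at_k(projected_hits, relevant)   # 0.75
print(f"recovery: {proj / src:.3f}")           # recovery: 0.750
```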
Step 3: Migrate (18 minutes)
```python
result = client.migrate.run(
    projection=proj["id"],
    db="postgresql://...",
    table="document_embeddings",
    on_progress=lambda r: print(f"{r['progress']:.0%} — {r['vectors_done']:,} vectors"),
)
# 10% — 1,200,000 vectors
# 20% — 2,400,000 vectors
# ...
# 100% — 12,000,000 vectors

print(f"Migrated {result['total_vectors']:,} vectors in {result['elapsed_seconds']}s")
# Migrated 12,000,000 vectors in 1,074s
```

18 minutes. No re-embedding. No API calls to OpenAI or Gemini during the bulk migration step. Schift applied the projection matrix locally: 12 million matrix multiplies at sub-millisecond each.
The team ran the migration against a staging copy of their database first, validated query results, and then ran it on production during a low-traffic period. Zero downtime. Zero support tickets.
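Applying a learned projection in bulk amounts to a streamed, batched matrix multiply. A sketch of what that local transform step could look like, with toy shapes and a hypothetical helper (not Schift's implementation):

```python
import numpy as np

def project_in_batches(vectors: np.ndarray, W: np.ndarray, batch_size: int = 10_000):
    """Yield projected, renormalized batches; 12M rows would stream through in chunks."""
    for start in range(0, len(vectors), batch_size):
        projected = vectors[start:start + batch_size] @ W
        # Cosine-distance indexes expect unit vectors; renormalize after projection.
        norms = np.linalg.norm(projected, axis=1, keepdims=True)
        yield projected / np.clip(norms, 1e-12, None)

rng = np.random.default_rng(0)
vecs = rng.normal(size=(25_000, 64))  # toy corpus
W = rng.normal(size=(64, 32))         # toy projection matrix

total = sum(len(batch) for batch in project_in_batches(vecs, W))
print(total)  # 25000
```

Batching keeps memory flat regardless of corpus size, which is why a 12-million-row table can be rewritten in minutes without touching any embedding API.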
Step 4: Switch the router (2 minutes)
With vectors migrated, they updated their Schift routing config to point at Gemini for new embeddings:
```python
client.routing.set(
    primary="google/gemini-embedding-004",
    fallback="openai/text-embedding-3-large",
)
```
Their application code — which calls client.embed() — required zero changes.
The router handles the model selection transparently.
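Conceptually, a primary/fallback router is a thin wrapper around the embed call. A hypothetical sketch with stub embedders (class and function names are invented for illustration, not Schift's code):

```python
class EmbeddingRouter:
    """Try the primary embedder; fall back to the secondary on any failure."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback

    def embed(self, texts):
        try:
            return self.primary(texts)
        except Exception:
            return self.fallback(texts)

# Toy usage: the primary fails, so the fallback answers.
def gemini_stub(texts):
    raise RuntimeError("daily quota exhausted")

def openai_stub(texts):
    return [[0.0] * 4 for _ in texts]

router = EmbeddingRouter(gemini_stub, openai_stub)
print(len(router.embed(["hello", "world"])))  # 2
```

Because callers only ever see the router's embed method, swapping the primary model is a config change rather than a code change, which matches the zero-line diff reported above.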
The outcome
| Metric | Before | After |
|---|---|---|
| Monthly embedding cost | $1,500 | $0 |
| Migration cost | — | $0 |
| Migration time | — | 71 minutes total |
| Downtime | — | 0 minutes |
| Retrieval quality | R@10: 0.847 | R@10: 0.816 (96.4% preserved) |
| Code changes | — | 0 lines |
What surprised them
The team expected the migration to be a weekend project. It finished in a single afternoon. The part that surprised them most: after migration, several users reported that search "felt snappier." Gemini embedding-004 has lower API latency than text-embedding-3-large at query time — a side benefit they had not anticipated.
The 3.6% quality difference was measurable in benchmarks but invisible to end users. Search results that were slightly different were not obviously worse — just different.
When this works — and when it does not
This migration path works best when:
- Your corpus is well-represented by a 0.01%–1% sample.
- Your domain is general-purpose or covered by standard benchmarks.
- 95%+ retrieval recovery is acceptable (it is for most use cases).
It works less well for highly specialized domains where general-purpose benchmarks underrepresent your content — medical literature with specialized terminology, legal documents with jurisdiction-specific language. In those cases, you should use a larger training sample and validate carefully with domain-specific evaluation queries.
But for the large majority of teams building search on top of general-purpose content? The path from $1,500/month to $0 is now one afternoon of engineering work.