How We Built a $0.50-Per-Lead B2B Contact API While Apollo Charged $3

The core insight: data freshness, not data size, is the moat

Why "200M contacts indexed" is a vanity metric

Every B2B data vendor advertises a giant number — 200M, 500M, 1B records. The number is meaningless. What matters is what percentage of those records you can actually deliver, with a working email, today. Industry-standard email decay is 20–30% per year. An "indexed 5 years ago" contact has a roughly 60% chance of being dead. The size of the index is a lagging indicator; the freshness is the only thing that matters at the moment of sale.

So we built around freshness as the primary metric. Every email gets MX-validated at the moment of pull, not at the moment of ingestion. The implementation is a small async service that checks DNS MX records and TCP-connects to port 25 on the receiving mail exchanger. If the receiving server doesn't accept a banner, the row is dropped before delivery. Cost: roughly 0.4ms per check. The result is that dead domains and typo addresses are filtered before they ever hit your CSV.

The cache: how we got latency under 200ms

A boring database design beat exotic alternatives

The first prototype hit a live source on every API request. Every query took 3–8 seconds and burned API budget. The fix was a Postgres table with a `pg_trgm` GIN index on `lower(title)`, `lower(company_name)`, and `lower(city)`. With trigram-indexed substring search, the cache returns 20 leads matching "plumber in austin" in 180ms p95 against a multi-million-row table.

The not-obvious lesson: don't reach for ElasticSearch or pgvector or whatever's trendy this week if your search is fundamentally substring-matching against structured columns. Postgres + pg_trgm beats every fancier alternative we tested by 3x on latency and 10x on operational complexity.

The two-phase pipeline: cache first, source second

Letting the database absorb the easy queries

Every task in GeoLayer runs two phases. Phase 1 pulls from the cached index, filtered by the user's suppression list, MX-validated rows only. If phase 1 returns enough rows to satisfy the request, we never call the live source. If it falls short, phase 2 fans out to live public-source fetches for the remainder, persists the new rows back into the cache, and ships the combined result.

This single design choice cut live-source API spend by roughly 65% over the first 90 days as the cache built up. It also gave us the freshness-vs-cost lever: power users can request "cache-only" mode for sub-second responses, or "force-fresh" mode that bypasses the cache for niches they're working hard. Most users don't need either — the defaults handle them — but the optionality is there.

Pricing: $0.50/lead, and why that number wasn't arbitrary

The unit economics that made the number defensible

Our marginal cost per delivered lead is roughly $0.08 — live-source API cost amortized across cache hits, MX validation, Postgres compute, Netlify functions, Stripe fees. We set the retail price at $0.50 because that's where the gross margin (~84%) covers ongoing data refresh, customer support, and reinvestment into the index. We could have priced at $0.30 to undercut every competitor; we didn't, because at $0.50 the math survives a customer support hire and a year of cache growth without raising prices.

The competing intuition — set the price below your nearest competitor — leads to a race-to-the-bottom that kills the product. The better instinct is: price at a number you'd defend in a procurement call, and let the product properties (freshness, no contract, public pricing) close the deal.

The bug that cost us a week: the $15 parameter

An honest postmortem from one bad deploy

Three weeks after launch, every new task started returning the error "could not determine data type of parameter $15." Production was down for paying customers. We rolled back, then spent two days finding it.

The cause: an `INSERT … ON CONFLICT DO UPDATE` had a `CASE WHEN $15 IS NOT NULL THEN NOW() END` clause where Postgres couldn't infer $15's type from a NULL check alone. The fix was three characters: `$15::text` in both positions. The lesson, the one I'd tell anyone building anything with Postgres: when the planner says it can't determine a type, give it an explicit cast on the first use. Don't relitigate it in the second use.

What I'd do differently

Three things worth getting right at the start

One. Ship the public pricing page on day one, even if it's wrong. We didn't, and every conversation with a prospect went through 20 minutes of "so how much is this actually." The price page didn't have to be perfect; it just had to exist.

Two. Suppress aggressively from the start. We added the suppression list in week 6 and immediately discovered that 4% of repeat customers had been pulling the same dead emails monthly. A simple WHERE NOT EXISTS against a per-user suppression table got customer satisfaction scores up by a measurable amount.

Three. Write the API docs before you write the API. We did it the other way around and ended up with three endpoints when the natural shape was two. Refactoring after the docs were public cost us a deprecation period we didn't want.