Building HeatSync Part 5: The Performance Problem

Jan 28, 2026

With the core flow working and UX friction removed, HeatSync was pleasant to use. But there was a problem: every request was taking 10-15 seconds.

For a tool meant to be used in a hurry (parent at the pool, meet about to start), that’s an eternity. Users would refresh, thinking it was broken. Or give up entirely.

The culprit: every single request was uploading the PDF to OpenAI and waiting for AI processing. Even if the same PDF had been processed a hundred times before.

The Observation

Heat sheets don’t change. Once they’re published, they’re static documents. The same PDF will produce the same extraction results every time.

Two optimization opportunities jumped out:

  1. PDF Upload Caching: Why upload the same file to OpenAI repeatedly?
  2. Result Caching: Why re-run extraction for the same swimmer + PDF combination?

Two-Layer Caching

I implemented caching at both levels:

| Cache Layer  | Key                              | What’s Cached          |
|--------------|----------------------------------|------------------------|
| PDF Cache    | MD5 checksum of file             | OpenAI file ID         |
| Result Cache | PDF ID + normalized swimmer name | Full extraction result |

Layer 1: PDF Cache

When a PDF is uploaded:

  1. Calculate MD5 checksum
  2. Check database: do we have this checksum already?
  3. If yes → use the cached OpenAI file ID
  4. If no → upload to OpenAI, save the file ID

OpenAI file IDs expire after ~30 days, so I track expiration and re-upload when needed.
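The lookup above can be sketched in a few lines of TypeScript. This is a minimal illustration, not HeatSync's actual code: the `Map` stands in for the database table, and the names (`PdfCacheEntry`, `lookupPdf`, `cachePdf`) are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache record; in HeatSync this lives in the pdf_files table.
interface PdfCacheEntry {
  openaiFileId: string;
  expiresAt: Date;
}

// In-memory stand-in for the database lookup.
const pdfCache = new Map<string, PdfCacheEntry>();

function md5Checksum(pdfBytes: Buffer): string {
  return createHash("md5").update(pdfBytes).digest("hex");
}

// Returns a cached OpenAI file ID, or null if we must (re-)upload.
function lookupPdf(pdfBytes: Buffer, now = new Date()): string | null {
  const entry = pdfCache.get(md5Checksum(pdfBytes));
  if (!entry) return null;                 // never seen this PDF
  if (entry.expiresAt <= now) return null; // OpenAI file ID expired; re-upload
  return entry.openaiFileId;
}

// Called after a successful upload; assumes a ~30-day OpenAI file lifetime.
function cachePdf(pdfBytes: Buffer, openaiFileId: string): void {
  const expiresAt = new Date(Date.now() + 30 * 24 * 60 * 60 * 1000);
  pdfCache.set(md5Checksum(pdfBytes), { openaiFileId, expiresAt });
}
```

Keying on the file's checksum rather than its filename means two parents uploading the same heat sheet under different names still share one cache entry.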

Layer 2: Result Cache

After extraction:

  1. Normalize the swimmer name (lowercase, handle “First Last” vs “Last, First”)
  2. Check database: do we have results for this PDF + swimmer combination?
  3. If yes → return cached results immediately
  4. If no → run extraction, cache the results
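The normalization step is what makes this cache actually hit: "Smith, Jane" and "jane smith" have to map to the same key. A sketch of the idea (HeatSync's exact rules may differ, and the helper names here are illustrative):

```typescript
// Normalize a swimmer name so "Smith, Jane" and "jane smith"
// produce the same cache key.
function normalizeSwimmerName(name: string): string {
  let n = name.trim().toLowerCase();
  // Convert "last, first" into "first last"
  if (n.includes(",")) {
    const [last, first] = n.split(",", 2).map((s) => s.trim());
    n = `${first} ${last}`;
  }
  return n.replace(/\s+/g, " "); // collapse internal whitespace
}

// Hypothetical composite key for the result cache:
// one entry per PDF + swimmer combination.
function resultKey(pdfId: string, swimmerName: string): string {
  return `${pdfId}:${normalizeSwimmerName(swimmerName)}`;
}
```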

The Database

I used Supabase (PostgreSQL) with Drizzle ORM for type-safe queries. Three tables:

  • pdf_files: Tracks uploaded PDFs by checksum, stores OpenAI file IDs with expiration dates
  • extraction_results: Caches extraction results per PDF + swimmer combination
  • result_links: Maps short codes to extraction results for the sharing feature (more on that in Part 7)
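With Drizzle, tables like these are declared in TypeScript. The sketch below shows roughly what the two caching tables could look like; the column names and types are my illustration, not HeatSync's exact schema.

```typescript
import { pgTable, serial, text, timestamp, integer } from "drizzle-orm/pg-core";

// Illustrative schema sketch, not the exact production definitions.
export const pdfFiles = pgTable("pdf_files", {
  id: serial("id").primaryKey(),
  checksum: text("checksum").notNull().unique(), // MD5 of the uploaded PDF
  openaiFileId: text("openai_file_id").notNull(),
  expiresAt: timestamp("expires_at").notNull(),  // OpenAI file IDs expire after ~30 days
});

export const extractionResults = pgTable("extraction_results", {
  id: serial("id").primaryKey(),
  pdfFileId: integer("pdf_file_id").references(() => pdfFiles.id).notNull(),
  swimmerName: text("swimmer_name").notNull(),   // normalized form
  result: text("result").notNull(),              // JSON-serialized extraction
});
```

Drizzle infers TypeScript types from these declarations, so queries against the cache are checked at compile time.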

Results

The performance improvement was dramatic:

| Scenario               | Before        | After          |
|------------------------|---------------|----------------|
| First extraction       | 10-15 seconds | 10-15 seconds  |
| Same PDF, new swimmer  | 10-15 seconds | ~8 seconds     |
| Same PDF, same swimmer | 10-15 seconds | under 1 second |

The most common case (parent checks their kid’s events, then checks again later) went from painful to instant.

The Cold Start Problem

First-time extractions still take 10-15 seconds. There’s no avoiding the AI processing time for new requests.

But here’s the key insight: at a swim meet, the first parent to use HeatSync for a heat sheet “warms the cache” for everyone else. The second parent gets fast results. The third, fourth, fifth… all instant.

For team use, this is perfect. One tech-savvy parent runs the extraction, shares the link (see Part 7), and everyone else benefits from the cached results.

Cost Reduction

Caching also reduced OpenAI costs significantly:

  • No redundant PDF uploads (file upload costs)
  • No redundant extractions (completion costs)

For a free tool, every dollar saved matters.

Auto-Migrations

One nice touch: Drizzle ORM migrations run automatically when the backend starts. No manual database setup, no “did you run the migrations?” debugging. The server self-heals.
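The boot-time step amounts to a single call to Drizzle's migrator. A sketch, assuming the `postgres-js` driver and a `./drizzle` migrations folder (both assumptions on my part):

```typescript
import { drizzle } from "drizzle-orm/postgres-js";
import { migrate } from "drizzle-orm/postgres-js/migrator";
import postgres from "postgres";

// Apply any pending migrations before the server starts serving requests.
async function runMigrations(databaseUrl: string): Promise<void> {
  const client = postgres(databaseUrl, { max: 1 }); // single connection for DDL
  const db = drizzle(client);
  await migrate(db, { migrationsFolder: "./drizzle" });
  await client.end();
}
```

Because `migrate` is idempotent (it skips migrations already recorded as applied), calling it on every startup is safe.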


With performance solved, one problem remained: accuracy. The AI was still occasionally returning wrong events or missing some entirely.


This is Part 5 of a series on building HeatSync. ← Part 4: Removing Friction | Part 6: Getting Accuracy Right →
