Building HeatSync Part 5: The Performance Problem
With the core flow working and UX friction removed, HeatSync was pleasant to use. But there was a problem: every request was taking 10-15 seconds.
For a tool meant to be used in a hurry (parent at the pool, meet about to start), that’s an eternity. Users would refresh, thinking it was broken. Or give up entirely.
The culprit: every single request was uploading the PDF to OpenAI and waiting for AI processing. Even if the same PDF had been processed a hundred times before.
The Observation
Heat sheets don’t change. Once they’re published, they’re static documents. The same PDF will produce the same extraction results every time.
Two optimization opportunities jumped out:
- PDF Upload Caching: Why upload the same file to OpenAI repeatedly?
- Result Caching: Why re-run extraction for the same swimmer + PDF combination?
Two-Layer Caching
I implemented caching at both levels:
| Cache Layer | Key | What’s Cached |
|---|---|---|
| PDF Cache | MD5 checksum of file | OpenAI file ID |
| Result Cache | PDF ID + normalized swimmer name | Full extraction result |
Layer 1: PDF Cache
When a PDF is uploaded:
- Calculate MD5 checksum
- Check database: do we have this checksum already?
- If yes → use the cached OpenAI file ID
- If no → upload to OpenAI, save the file ID
OpenAI file IDs expire after ~30 days, so I track expiration and re-upload when needed.
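The Layer 1 lookup can be sketched like this. This is illustrative, not HeatSync's actual code: the in-memory `Map` stands in for the `pdf_files` table, and `FILE_TTL_MS`, `pdfCache`, and the injected `uploadToOpenAI` callback are names I've made up for the sketch.

```typescript
import { createHash } from "node:crypto";

// ~30 days, matching the observed lifetime of OpenAI file IDs.
const FILE_TTL_MS = 30 * 24 * 60 * 60 * 1000;

interface CachedFile {
  openaiFileId: string;
  uploadedAt: number; // epoch millis, used to detect expiration
}

// Keyed by MD5 checksum; a stand-in for the pdf_files table.
const pdfCache = new Map<string, CachedFile>();

async function getOpenAIFileId(
  pdfBytes: Buffer,
  uploadToOpenAI: (bytes: Buffer) => Promise<string>,
): Promise<string> {
  // Identical bytes always produce the same checksum, so re-uploads
  // of the same heat sheet hit the cache.
  const checksum = createHash("md5").update(pdfBytes).digest("hex");

  const hit = pdfCache.get(checksum);
  if (hit && Date.now() - hit.uploadedAt < FILE_TTL_MS) {
    return hit.openaiFileId; // cache hit: skip the upload entirely
  }

  // Cache miss or expired file ID: upload and record the new ID.
  const openaiFileId = await uploadToOpenAI(pdfBytes);
  pdfCache.set(checksum, { openaiFileId, uploadedAt: Date.now() });
  return openaiFileId;
}
```

Injecting the upload function keeps the cache logic independent of the OpenAI client, which also makes it trivial to test.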
Layer 2: Result Cache
After extraction:
- Normalize the swimmer name (lowercase, handle “First Last” vs “Last, First”)
- Check database: do we have results for this PDF + swimmer combination?
- If yes → return cached results immediately
- If no → run extraction, cache the results
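The normalization step is what makes Layer 2 reliable: "Smith, Alex" and "Alex Smith" must map to the same cache row. A minimal sketch (function names are mine, not HeatSync's):

```typescript
// Normalize a swimmer name so "Last, First", extra whitespace, and
// case differences all produce the same cache key.
function normalizeSwimmerName(name: string): string {
  let n = name.trim().toLowerCase().replace(/\s+/g, " ");
  const comma = n.indexOf(",");
  if (comma !== -1) {
    // "last, first" -> "first last"
    const last = n.slice(0, comma).trim();
    const first = n.slice(comma + 1).trim();
    n = `${first} ${last}`;
  }
  return n;
}

// The result-cache key combines the PDF's identity with the
// normalized swimmer name.
function resultCacheKey(pdfId: string, swimmer: string): string {
  return `${pdfId}:${normalizeSwimmerName(swimmer)}`;
}
```

With this, `resultCacheKey("pdf1", "Smith, Alex")` and `resultCacheKey("pdf1", "alex smith")` collide on purpose, so either spelling returns the cached extraction.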
The Database
I used Supabase (PostgreSQL) with Drizzle ORM for type-safe queries. Three tables:
- pdf_files: Tracks uploaded PDFs by checksum, stores OpenAI file IDs with expiration dates
- extraction_results: Caches extraction results per PDF + swimmer combination
- result_links: Maps short codes to extraction results for the sharing feature (more on that in Part 7)
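A Drizzle schema for the first two tables might look roughly like this. Column names and types here are my guesses from the descriptions above, not the real schema:

```typescript
import { jsonb, pgTable, serial, text, timestamp } from "drizzle-orm/pg-core";

// Layer 1: uploaded PDFs, deduplicated by checksum.
export const pdfFiles = pgTable("pdf_files", {
  id: serial("id").primaryKey(),
  checksum: text("checksum").notNull().unique(), // MD5 of the PDF bytes
  openaiFileId: text("openai_file_id").notNull(),
  expiresAt: timestamp("expires_at").notNull(), // ~30 days after upload
});

// Layer 2: extraction results per PDF + normalized swimmer name.
export const extractionResults = pgTable("extraction_results", {
  id: serial("id").primaryKey(),
  pdfFileId: serial("pdf_file_id").notNull(),
  swimmerName: text("swimmer_name").notNull(), // stored normalized
  result: jsonb("result").notNull(), // full extraction payload
});
```

The `unique()` constraint on `checksum` is what turns the Layer 1 lookup into a single indexed query.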
Results
The performance improvement was dramatic:
| Scenario | Before | After |
|---|---|---|
| First extraction | 10-15 seconds | 10-15 seconds |
| Same PDF, new swimmer | 10-15 seconds | ~8 seconds |
| Same PDF, same swimmer | 10-15 seconds | under 1 second |
Same PDF with a new swimmer skips the upload (Layer 1) but still pays for extraction; same PDF with the same swimmer hits the result cache (Layer 2) directly. The most common case (parent checks their kid’s events, then checks again later) went from painful to instant.
The Cold Start Problem
First-time extractions still take 10-15 seconds. There’s no avoiding the AI processing time for new requests.
But here’s the key insight: at a swim meet, the first parent to use HeatSync for a heat sheet “warms the cache” for everyone else. The second parent gets fast results. The third, fourth, fifth… all instant.
For team use, this is perfect. One tech-savvy parent runs the extraction, shares the link (see Part 7), and everyone else benefits from the cached results.
Cost Reduction
Caching also reduced OpenAI costs significantly:
- No redundant PDF uploads (file upload costs)
- No redundant extractions (completion costs)
For a free tool, every dollar saved matters.
Auto-Migrations
One nice touch: Drizzle ORM migrations run automatically when the backend starts. No manual database setup, no “did you run the migrations?” debugging. The server self-heals.
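Conceptually, what an auto-migration step at startup does is simple: record which migrations have run and apply only the pending ones, so restarting the server is always safe. A toy sketch of that idea (the in-memory `Db` stands in for PostgreSQL; Drizzle's own `migrate()` helper does the real version of this against a migrations table):

```typescript
// A migration has a stable name and a function that applies it.
type Db = { applied: Set<string>; tables: Set<string> };
type Migration = { name: string; up: (db: Db) => void };

// Apply pending migrations in order; already-applied ones are
// skipped, making startup idempotent.
function runMigrations(db: Db, migrations: Migration[]): string[] {
  const ran: string[] = [];
  for (const m of migrations) {
    if (db.applied.has(m.name)) continue; // already applied: skip
    m.up(db);
    db.applied.add(m.name);
    ran.push(m.name);
  }
  return ran;
}
```

Running this twice applies each migration exactly once, which is why there's no "did you run the migrations?" failure mode: the check happens on every boot.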
With performance solved, one problem remained: accuracy. The AI was still occasionally returning wrong events or missing some entirely.
This is Part 5 of a series on building HeatSync. ← Part 4: Removing Friction | Part 6: Getting Accuracy Right →