prompt-cachingai-agentsengineering-disciplineproduction

GitHub Watches One Number on Its Claude Agents: 94% Cache Hit Rate

GitHub's chief product officer says prompt cache hit rate is the foundational metric for any team at scale — and a drop to 70% means a bug, not a slow day. The interesting part is what they monitor isn't the model.

NeuroX AI · June 25, 2026

At Code with Claude 2026, GitHub's chief product officer Mario Rodriguez named the single metric his team watches on a platform sending billions of messages to Claude: cache hit rate, targeted above 94% — with a drop to 70% treated as a bug in prompt assembly, not a bad day. Not latency, not model score. Cache discipline.

The logic is high-frequency-trading math. At GitHub's volume, Rodriguez told the room, a 1% swing in cache efficiency is millions of dollars. Prompt caching only pays out when the prefix of every request stays byte-identical — same system prompt, same tool definitions, same ordering. The moment something reorders a tool list or injects a timestamp high in the context, the cache misses and you pay full freight on tokens you've already sent a thousand times.

That's why a hit-rate dip reads as a code bug. It usually means prompt assembly changed somewhere upstream — and nobody noticed, because the agent still works. It just quietly costs 4x more.

This is the whole prototype-to-production gap in one number. A demo never measures its cache hit rate. A production agent treats it as an SLO — because the difference between an agent that's affordable at scale and one that isn't is rarely the model. It's the plumbing around it.

See how we build agents that ship →

Working on something similar?