April 14, 2026 · 7 min read
Anthropic Quietly Cut Claude Code's Cache TTL from 1 Hour to 5 Minutes — Developers Are Furious
TL;DR
- Anthropic silently reduced Claude Code prompt cache TTL from 1 hour → 5 minutes in April 2026
- Cache misses can now happen up to 12x more frequently — developers with long system prompts pay full input pricing on every refresh
- No changelog, no announcement, no advance notice — discovered via billing anomalies
- Workarounds: keepalive pings, prompt splitting, retrieval layers, or switching to providers with longer TTLs
- Happycapy routes to multiple model providers — reducing exposure to single-vendor policy changes
On April 14, 2026, developers using Anthropic's Claude Code API began reporting a sudden spike in token usage with no corresponding increase in requests. The cause: Anthropic had silently reduced the prompt cache TTL — the window during which a cached system prompt stays valid — from 1 hour to 5 minutes. No changelog entry. No email. No advance notice. Developers found out through their billing dashboards.
What Prompt Caching Does and Why TTL Matters
Anthropic's prompt caching lets developers store long system prompts on Anthropic's infrastructure so they don't have to re-send — and re-pay for — the same thousands of tokens on every API call. The cache is keyed to a hash of the prompt content and expires after a TTL window; each cache read refreshes the clock, so the cache only goes cold after a gap longer than the TTL.
Under the old 1-hour TTL, a developer with a 50,000-token system prompt could leave up to 60 minutes between calls and still hit a warm cache. Under the new 5-minute TTL, any gap longer than 300 seconds forces a full cache rewrite at full input token pricing.
| Scenario | Old TTL (1 hr) | New TTL (5 min) |
|---|---|---|
| Pipeline, call every 6 min | First call only (~1 miss/hr) | Every call misses ≈ 10/hr |
| Batch job, runs hourly | 1 cache miss/hr | 1 miss per batch run (unchanged) |
| User-facing app with bursty traffic | 1 warm-up per session | Cold start after any lull longer than 5 min |
| Overnight automation (runs 2am–4am) | 1–2 cache misses total | 1 cache miss per run if gap >5 min |
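The table above follows from a simple model — a sketch, assuming the TTL refreshes on every cache read (the refresh-on-read behavior Anthropic documents for its cache); the function name is ours:

```python
def misses_per_hour(call_interval_s: float, ttl_s: float) -> float:
    """Estimated cache misses per hour for a steady call cadence.

    Simplified model: the TTL clock restarts on every cache read, so a
    cadence faster than the TTL keeps the cache warm after one warm-up,
    while a slower cadence misses on every single call.
    """
    calls_per_hour = 3600 / call_interval_s
    if call_interval_s <= ttl_s:
        # Every call lands inside the window; only the first call misses.
        return 1.0
    # Every call arrives after the cache has already expired.
    return calls_per_hour

# Calls every 6 minutes: warm under the old 1-hr TTL, always cold under 5 min.
old = misses_per_hour(call_interval_s=360, ttl_s=3600)  # 1.0
new = misses_per_hour(call_interval_s=360, ttl_s=300)   # 10.0
```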
The Real Cost Impact
For a developer with a 50,000-token system prompt making API calls every 6 minutes (just outside the 5-minute TTL window), the math is brutal:
- Cache hit price: ~$0.0015 per 1,000 input tokens (cached reads are priced at roughly 10% of the base input rate)
- Cache miss price: ~$0.015 per 1,000 input tokens (full input rate)
- 50,000 tokens × 10 cache misses/hour = 500,000 input tokens/hour billed at full rate
- vs. 50,000 tokens × 1 cache miss/hour under old TTL = 50,000 tokens at full rate
- 10x more system prompt tokens billed at the full rate, from this single change
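In code, the hourly math looks like this (rates are the approximate figures above, not official pricing; note that once the cheap cached reads are counted too, the total dollar cost rises roughly 5x while the full-rate token volume rises 10x):

```python
PROMPT_TOKENS = 50_000
FULL_RATE = 0.015 / 1_000     # $/token, full input rate (approximate)
CACHED_RATE = 0.0015 / 1_000  # $/token, cached read rate (~10% of full, approximate)

def hourly_prompt_cost(misses: float, calls: float) -> float:
    """Dollar cost per hour of re-sending the system prompt."""
    hits = calls - misses
    return PROMPT_TOKENS * (misses * FULL_RATE + hits * CACHED_RATE)

# 10 calls/hour (one every 6 minutes):
old = hourly_prompt_cost(misses=1, calls=10)   # old 1-hr TTL: ~$1.43/hr
new = hourly_prompt_cost(misses=10, calls=10)  # new 5-min TTL: $7.50/hr
```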
Why Anthropic Made the Change (Likely)
Anthropic has not issued a public statement. Based on community analysis, the most likely explanations are:
- Infrastructure capacity — With Claude Code adoption surging, holding millions of large prompt caches for 60 minutes requires a significant memory footprint. A 5-minute TTL reduces memory pressure by approximately 12x for the same number of users.
- Revenue optimization — Cache hits are priced at a significant discount to full input tokens. Reducing TTL converts cache hits into higher-margin full-price token reads.
- Model version migration — Claude Code has been transitioning users across model versions; shorter TTLs make cache invalidation across model updates more manageable.
4 Workarounds Developers Are Using Right Now
1. Keepalive ping every 4 minutes
Send a minimal API call that includes the full system prompt every 4 minutes (just inside the TTL window) during active pipeline windows. Each ping is billed at the cached-read rate for the system prompt plus a handful of tokens for the message body — far cheaper than a full cache miss on a 50K-token system prompt.
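A minimal sketch of such a keepalive loop — `ping` here is a placeholder for whatever minimal API call your pipeline makes with the full cached system prompt:

```python
import threading

TTL_SECONDS = 300    # new 5-minute TTL
PING_INTERVAL = 240  # 4 minutes: safely inside the window

def keepalive(ping, stop: threading.Event, interval: float = PING_INTERVAL) -> int:
    """Call `ping()` every `interval` seconds until `stop` is set.

    `ping` should issue a minimal request that reads the cached system
    prompt, refreshing its TTL. Returns the number of pings sent.
    """
    count = 0
    while not stop.wait(interval):  # sleeps, exits early once stop is set
        ping()
        count += 1
    return count
```

Run it in a background thread for the duration of an active pipeline window, and set the event when the window ends.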
2. Split system prompts by refresh rate
Separate your system prompt into a small high-frequency section (role, format, tone) and a large low-frequency section (reference data, knowledge base). Cache the small section — cheaper if it misses. Move the large section to a retrieval layer that only injects relevant chunks per request.
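A sketch of how the split looks with Anthropic's Messages API, which accepts `system` as a list of content blocks and caches the prefix up to a block marked with `cache_control` (the helper name is ours):

```python
def build_system_blocks(core_prompt: str, retrieved_chunks: list[str]) -> list[dict]:
    """Assemble the `system` parameter: cache only the small, stable core;
    inject per-request chunks after it, uncached."""
    blocks = [{
        "type": "text",
        "text": core_prompt,
        "cache_control": {"type": "ephemeral"},  # cache the stable core only
    }]
    for chunk in retrieved_chunks:
        blocks.append({"type": "text", "text": chunk})  # varies per request
    return blocks
```

Because the cached prefix is now small, even a cold start costs little, and the bulky reference material never participates in cache-miss billing.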
3. Retrieval-augmented architecture
Replace monolithic system prompts with a RAG layer. Rather than loading all context into the system prompt, retrieve the 3–5 most relevant chunks per query. This reduces both the base token cost and the cache miss penalty to nearly nothing.
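A toy illustration of the idea — a keyword-overlap retriever standing in for a real embedding index:

```python
def top_k_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Score each chunk by word overlap with the query; return the k best.

    A production system would use embeddings and a vector store, but the
    cost effect is identical: a few small chunks enter the context window
    instead of the entire knowledge base.
    """
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]
```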
4. Provider diversification for batch workloads
For intermittent batch jobs where a cache keepalive is impractical, route those workloads to a provider with more predictable caching behavior (or no cache TTL sensitivity) to reduce exposure to single-vendor policy changes.
Don't let one provider's policy change blow up your costs.
Happycapy routes tasks across Claude, GPT-5, and Gemini — so you're never fully exposed to a single vendor's cache policy, pricing change, or outage.
Try Happycapy Free