---
id: "concept-prompt-caching"
type: "concept"
source_timestamps: ["00:18:00"]
tags: ["api-optimization", "cost-reduction"]
related: ["concept-silent-tax", "action-implement-caching"]
definition: "An API feature that allows developers to store and reuse static portions of a prompt (like system instructions or reference docs) across multiple calls at a deeply discounted rate."
sources: ["s45-claude-limit-chatgpt-habit"]
sourceVaultSlug: "s45-claude-limit-chatgpt-habit"
originDay: 45
---
# Prompt Caching

## Definition
Prompt caching is an API-level feature offered by frontier model providers that lets developers store and reuse large blocks of stable context across multiple calls **at a massive financial discount**.

## What Should Be Cached
In most agentic workflows, a significant portion of the context window is static across calls:
- Complex system prompts
- Persona / role instructions
- Tool definitions and schemas
- Large reference documents (API docs, codebases, manuals, policy text)

Without caching, the developer pays the full input-token price to re-process this stable data on every interaction — the architectural sibling of the [[concept-silent-tax]].
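As a minimal sketch of how the split looks in practice, assuming the Anthropic Messages API shape (the `cache_control` marker is Anthropic's real field; the model name is real, but the prompt text, tool schema, and user turn are placeholders):

```python
# Sketch: separating stable (cacheable) context from the dynamic turn.
# Everything up to the cache_control marker is cached after the first call;
# only the short user message is billed at the full input rate on repeats.

LONG_SYSTEM_PROMPT = "You are a support agent..."  # imagine thousands of tokens
TOOL_SCHEMAS = [
    {"name": "lookup_order", "description": "Find an order by ID",
     "input_schema": {"type": "object"}},
]

request = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    # Stable block: identical on every call, so it is cached.
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache everything up to here
        }
    ],
    "tools": TOOL_SCHEMAS,
    # Dynamic block: changes every call, always billed at full rate.
    "messages": [{"role": "user", "content": "Where is order #1234?"}],
}

# With the official SDK this payload would be sent as:
#   client.messages.create(**request)
```

The design point is that the cache prefix must be byte-identical across calls, so anything that varies (timestamps, user data) belongs after the marker, not before it.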

## The Discount
Nate cites a **90% discount** on cached repeated tokens, e.g., $0.50 per million instead of $5.00 per million. See [[claim-caching-discount]] for validation; Anthropic's official prompt caching launched in 2024 with exactly this 90% structure (for Sonnet, cache reads cost $0.30/M against $3.00/M base input, with cache writes priced at a $3.75/M premium).
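A back-of-the-envelope sketch of what the discount means for a workload (Sonnet-class prices per million input tokens; the token count and call volume are made-up illustrative figures):

```python
# Illustrative cost comparison for a workload that re-sends a large
# stable context block on every call.
BASE_PRICE = 3.00        # $/M, normal input tokens
CACHE_READ_PRICE = 0.30  # $/M, cached tokens re-read (the 90% discount)
CACHE_WRITE_PRICE = 3.75 # $/M, one-time premium to write the cache

stable_tokens = 50_000   # system prompt + tools + reference docs
calls = 1_000

# Without caching: the stable block is re-processed at full price every call.
without_cache = stable_tokens * calls * BASE_PRICE / 1_000_000

# With caching: the first call writes the cache, the rest read it.
with_cache = (stable_tokens * CACHE_WRITE_PRICE
              + stable_tokens * (calls - 1) * CACHE_READ_PRICE) / 1_000_000

print(f"without caching: ${without_cache:,.2f}")  # → without caching: $150.00
print(f"with caching:    ${with_cache:,.2f}")     # → with caching:    $15.17
```

At this scale the stable-context bill drops by roughly 90%, which is the "silent tax" the note is pointing at.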

## Why Skipping It Is an Architectural Error
For production workloads with stable context, *not* using caching is described as a severe architectural error that needlessly drains budget. It is one of the [[framework-kiss-commands]] (Cache Stable Context) and a checkpoint in [[framework-stupid-button-audit]].

## Caveats (from enrichment overlay)
- Automatic-discount prompt caching is currently native to **Anthropic and OpenAI**; Gemini offers explicit context caching under a different cost model (per-token storage fees plus discounted cached input), while Mistral lacked a native equivalent as of 2026, sometimes requiring local KV-cache hacks with 20–50% overhead.
- Cache TTLs and minimum chunk sizes vary by provider; design your stable blocks to be large and stable enough to amortize the cache write cost.
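The amortization caveat above can be made concrete with a break-even sketch (same illustrative Sonnet-class prices as elsewhere in this note; the cache is assumed to stay warm within its TTL between calls):

```python
# Break-even sketch: how many calls does a cached stable block need
# before the cache-write premium pays for itself?
BASE, WRITE, READ = 3.00, 3.75, 0.30  # $/M: full input, cache write, cache read

def cheaper_with_cache(n_calls: int) -> bool:
    """True if caching beats re-sending the stable block for n_calls calls."""
    without = n_calls * BASE                    # full price every call
    with_cache = WRITE + (n_calls - 1) * READ   # first call writes, rest read
    return with_cache < without

break_even = next(n for n in range(1, 100) if cheaper_with_cache(n))
print(break_even)  # → 2
```

Under these prices caching already wins on the second call, which is why skipping it for any repeated workload is framed as an architectural error rather than a micro-optimization; the real design work is keeping the block stable so reads actually hit.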

## Linked Action
[[action-implement-caching]] — enable prompt caching for static context blocks.

## Related across days
- [[concept-token-economics]]
- [[concept-token-burning]]
- [[concept-tokenizer-tax]]
