---
id: "concept-token-burning"
type: "concept"
source_timestamps: ["00:00:28", "00:05:44", "00:07:22"]
tags: ["cost-management", "inefficiency"]
related: ["concept-context-sprawl", "concept-silent-tax", "concept-smart-tokens"]
definition: "The wasteful consumption of LLM tokens through inefficient practices like raw file ingestion, long conversation histories, and bloated system prompts, leading to high costs and degraded performance."
sources: ["s45-claude-limit-chatgpt-habit"]
sourceVaultSlug: "s45-claude-limit-chatgpt-habit"
originDay: 45
---
# Token Burning

## Definition
Token burning is the wasteful consumption of LLM tokens through inefficient practices like raw file ingestion, long conversation histories, and bloated system prompts — leading to high costs and degraded performance.

## Why It Matters
Nate B. Jones identifies token burning as **the** primary reason AI bills spiral out of control — not the base price of the models themselves. As he puts it in [[quote-habits-cost-more]]: *"the models are not expensive, it's your habits that cost a lot."* With next-gen models like [[entity-claude-mythos-d45]] poised to be even more expensive (see [[claim-next-gen-expensive]]), unaddressed token burn becomes financially unsustainable.

## The Three Anti-Patterns
Token burning shows up through three recurring habits:

1. **Raw document ingestion** — dragging and dropping PDFs/Word/PPT files into chat. The model is forced to tokenize hidden metadata (headers, footers, embedded fonts, layout coordinates) instead of just the semantic text. The fix is [[concept-markdown-conversion]].
2. **Context sprawl** — keeping a single chat alive for 20–40+ turns. Because LLMs are stateless (see [[prereq-stateless-architecture]]), the entire history is re-submitted on every turn, so input tokens grow roughly quadratically with conversation length (see the sketch after this list). Detail in [[concept-context-sprawl]].
3. **The silent tax of plugin/tool bloat** — loading every available tool and a giant system prompt into context for every call. Detail in [[concept-silent-tax]].
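
The second anti-pattern is the easiest to quantify. Here is a back-of-the-envelope sketch, assuming a stateless chat API that re-sends the full history with every request; the 500-tokens-per-turn figure is an invented placeholder, not a number from the talk:

```python
def tokens_resubmitted(turns: int, tokens_per_turn: int = 500) -> int:
    """Total input tokens over a conversation where turn n re-sends
    all earlier messages plus the new one (stateless API, no caching)."""
    # Turn n carries roughly n * tokens_per_turn of input: the new message
    # plus every earlier turn replayed into the context window.
    return sum(n * tokens_per_turn for n in range(1, turns + 1))

for turns in (5, 20, 40):
    print(f"{turns:>2} turns -> {tokens_resubmitted(turns):>9,} input tokens")
#  5 turns ->     7,500 input tokens
# 20 turns ->   105,000 input tokens
# 40 turns ->   410,000 input tokens
```

The exact numbers do not matter; the shape does. Doubling the length of a thread roughly quadruples the tokens billed, which is why starting a fresh chat is cheaper than continuing an old one.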

## The Payoff
Jones argues that stopping token burn is the highest-leverage skill in modern AI engineering. By cleaning up context, users can:
- Reduce costs **8–10x** (see [[claim-clean-context-cost-reduction]] and the cost sketch below)
- *Improve* model reasoning, because the attention mechanism is no longer diluted by cruft (consistent with the 'lost in the middle' literature noted in [[contrarian-more-context-is-worse]])
- Redirect the saved budget into [[concept-smart-tokens]] — paying for actual reasoning rather than formatting noise.
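
Because API pricing is linear in tokens, dollar savings track the token reduction directly. A minimal illustration in which the per-token price and monthly token counts are invented placeholders; only the roughly 8x ratio echoes the claim above:

```python
PRICE_PER_M = 3.00        # hypothetical USD per 1M input tokens, not from the talk
raw_tokens = 50_000_000   # assumed monthly input tokens with PDF dumps + long threads
clean_tokens = 6_000_000  # assumed same workload after markdown conversion + fresh chats

raw_cost = raw_tokens / 1e6 * PRICE_PER_M
clean_cost = clean_tokens / 1e6 * PRICE_PER_M
print(f"${raw_cost:.2f} -> ${clean_cost:.2f} per month ({raw_cost / clean_cost:.1f}x cheaper)")
# $150.00 -> $18.00 per month (8.3x cheaper)
```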

## Diagnostic
Before blaming the model, run [[framework-stupid-button-audit]] (the [[concept-the-stupid-button]] checklist) on your current workflow. See also the foundational [[prereq-token-economics]].

## Core Quote
[[quote-stop-burning-tokens]]: *"If you want to use cutting edge models, you have got to stop burning tokens and blaming the model."*


## Related across days
- [[concept-token-economics]]
- [[concept-tokenizer-tax]]
- [[concept-cloud-ai-economics]]
- [[claim-cost-increase]]
- [[concept-context-sprawl]]
