---
id: "prereq-token-economics"
type: "prereq"
source_timestamps: ["00:00:28"]
tags: ["foundational-knowledge"]
related: ["concept-token-burning"]
reason: "Required to understand why large documents and long chats cost money."
sources: ["s45-claude-limit-chatgpt-habit"]
sourceVaultSlug: "s45-claude-limit-chatgpt-habit"
originDay: 45
---
# Understanding LLM Token Economics

## What You Need to Know
The entire thesis relies on the reader understanding that:
- LLM APIs and usage limits are billed based on **tokens** (sub-word fragments — usually ~3–4 characters)
- **Input tokens** (what you send) and **output tokens** (what the model generates) are priced separately, with output typically 3–5x more expensive (see the cost sketch after this list)
- Cached input tokens (when supported) are often discounted steeply, on the order of ~90% on some providers (see [[concept-prompt-caching]])
- Token counts don't track raw bytes closely: formatted PDFs, tool schemas, and image content can tokenize at very different rates than plain text

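A minimal sketch of the arithmetic behind these bullets, assuming illustrative prices of $3 per million input tokens, $15 per million output tokens, and a ~90% cache-read discount. The prices, the discount, and the `estimate_cost` helper are hypothetical stand-ins, not any provider's real pricing or SDK:

```python
# Hypothetical per-million-token prices for illustration only.
# Check your provider's current pricing before relying on these numbers.
INPUT_PRICE_PER_MTOK = 3.00     # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_MTOK = 15.00   # USD per 1M output tokens (assumed)
CACHED_INPUT_DISCOUNT = 0.90    # assumed ~90% discount on cached input reads

def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Rough USD cost of a single API call under the assumed prices above."""
    uncached = input_tokens - cached_input_tokens
    return (
        uncached * INPUT_PRICE_PER_MTOK / 1_000_000
        + cached_input_tokens * INPUT_PRICE_PER_MTOK
          * (1 - CACHED_INPUT_DISCOUNT) / 1_000_000
        + output_tokens * OUTPUT_PRICE_PER_MTOK / 1_000_000
    )

# A 100K-token PDF in context vs. a 5K-token markdown conversion of it,
# each with a 1K-token reply:
print(f"raw PDF:  ${estimate_cost(100_000, 1_000):.4f} per call")
print(f"markdown: ${estimate_cost(5_000, 1_000):.4f} per call")
```

Under these assumed prices the markdown version costs roughly a tenth as much per call, which is the shape of the savings claimed in [[claim-pdf-markdown-savings]].
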
## Why It's a Prerequisite
Without this base, claims like [[claim-pdf-markdown-savings]] ("100K tokens → 5K tokens") and [[claim-clean-context-cost-reduction]] ("8–10x cost reduction") are unintelligible. The mental model of *tokens-as-billable-unit* is what makes [[concept-token-burning]] visible at all.

## Quick Mental Model
Think of the context window as a billed bucket where every sentence, document chunk, tool schema, and prior turn occupies space — and you pay for the bucket's contents on every API call.
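
To make the billed bucket concrete, here is a toy loop (all token counts assumed, no real API involved) showing why long chats get expensive: each call re-sends the system prompt, tool schemas, and every prior turn, so cumulative input tokens grow much faster than the turn count.

```python
# Toy illustration with assumed numbers: re-sending the full history on
# every call makes cumulative billed input grow roughly quadratically.
SYSTEM_AND_TOOLS = 2_000   # assumed tokens for system prompt + tool schemas
TOKENS_PER_TURN = 500      # assumed tokens per user message + model reply

cumulative_input = 0
for turn in range(1, 21):
    # Each call sends the system prompt, tool schemas, and all prior turns.
    context_this_call = SYSTEM_AND_TOOLS + TOKENS_PER_TURN * turn
    cumulative_input += context_this_call
    if turn in (1, 10, 20):
        print(f"turn {turn:2d}: {context_this_call:>6,} tokens sent, "
              f"{cumulative_input:>7,} cumulative input tokens billed")
```

With these assumed sizes, turn 20 alone sends about five times more context than turn 1, and the cumulative billed input is well over a hundred thousand tokens, which is why [[concept-token-burning]] treats stale context as a direct cost.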
