---
id: "prereq-context-window-mechanics"
type: "prereq"
source_timestamps: ["00:14:02", "00:14:25"]
tags: ["llm-mechanics", "context-management"]
related: ["concept-anchored-iterative-summarization"]
reason: "Required to understand why Anchored Iterative Summarization is necessary for long-running agent sessions."
sources: ["s41-nvidia-open-sourced"]
sourceVaultSlug: "s41-nvidia-open-sourced"
originDay: 41
---
# Understanding LLM Context Windows

## What You Need to Know

- **Token limits** — every LLM has a hard maximum context window (e.g., 200k tokens for Claude, varies for GPT models).
- **Cost-of-context** — per-request token cost grows linearly with context length; latency often grows superlinearly (self-attention scales quadratically with sequence length).
- **Position bias** — see *Lost in the Middle* (Liu et al., 2023): information in the middle of long contexts is recalled less reliably than information at the start or end.
- **Truncation behavior** — naive truncation drops state silently; the agent may lose its original intent or task constraints without any error signal.
- **Summarization degradation** — repeated summarization (the "telephone game") loses fidelity over multiple cycles.
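
The truncation point above can be made concrete with a minimal sketch. This is a hypothetical illustration, not any provider's actual behavior: it uses a toy word-count "tokenizer" (real tokenizers differ) and a made-up `naive_truncate` helper that keeps only the most recent messages that fit.

```python
# Hypothetical sketch of silent context truncation.
# Toy token count = word count; real tokenizers (BPE etc.) behave differently.

def naive_truncate(messages, max_tokens):
    """Keep the most recent messages that fit the budget, dropping the oldest.

    Note: nothing signals that older messages were dropped.
    """
    kept, total = [], 0
    for msg in reversed(messages):
        cost = len(msg.split())  # toy stand-in for a real tokenizer
        if total + cost > max_tokens:
            break  # everything older is silently discarded
        kept.append(msg)
        total += cost
    return list(reversed(kept))

history = [
    "User goal: refactor the auth module without changing its public API.",
    "Step 1: listed all call sites of login().",
    "Step 2: extracted token validation into validate_token().",
    "Step 3: running the test suite, 2 failures in test_session.py.",
]

window = naive_truncate(history, max_tokens=26)
# The oldest message -- the one stating the original goal -- is the
# first thing to fall out of the window, with no error raised.
```

After truncation the agent still sees recent steps, but the constraint "without changing its public API" is gone, which is exactly the silent intent loss the bullet describes.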

## Why It Matters Here

Understanding these mechanics is the prerequisite for grasping why [[concept-anchored-iterative-summarization]] is necessary in long-running agent sessions, and why both [[entity-openai-d41]]'s and [[entity-anthropic-d41]]'s native methods fail in the characteristic ways described in [[claim-factory-compression-superiority]].

## Adjacent Reading

- *Lost in the Middle* (Liu et al., 2023)
- RAGAS framework (https://github.com/explodinggradients/ragas) — faithfulness metrics

## See Also

- [[concept-anchored-iterative-summarization]]
- [[action-compress-context-iteratively]]
- [[claim-factory-compression-superiority]]
