---
id: "claim-token-efficiency"
type: "claim"
confidence: "medium"
testable: true
source_timestamps: ["§3.2", "Figure 3"]
tags: ["efficiency", "tokens", "context-window"]
related: ["concept-five-layer-hierarchy", "concept-context-scoping", "concept-icm"]
sources: ["paper"]
sourceVaultSlug: "icm-paper-folder-architecture-2026Jun02"
originDay: 2
---
# Scoped Stages Use ~2–8k Tokens vs ~42k Monolithic

## The claim

Representative token counts from the script-to-animation workspace:

| Stage | Focused tokens |
|---|---|
| `01_research` | ~4.9k |
| `02_script` | ~5.5k |
| `03_production` | ~5.6k |

A **monolithic approach** that loads every stage's instructions, all reference material, and all prior outputs produces a context window of **~42k tokens**. Most of that is irrelevant to the current task: the "unused/irrelevant" band dwarfs the useful payload.
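To make the size gap concrete, here is a minimal arithmetic sketch using only the representative counts quoted above. The helper name `size_ratio` and the variable names are illustrative and do not come from the paper, and the inputs are the paper's approximate figures rather than measurements.

```python
# Representative, approximate token counts quoted in this note (not a benchmark).
scoped_tokens = {
    "01_research": 4_900,
    "02_script": 5_500,
    "03_production": 5_600,
}
monolithic_tokens = 42_000


def size_ratio(scoped: int, monolithic: int) -> float:
    """Return how many times larger the monolithic context is than a scoped one."""
    return monolithic / scoped


# Ratio per stage, rounded to one decimal place.
ratios = {
    stage: round(size_ratio(tokens, monolithic_tokens), 1)
    for stage, tokens in scoped_tokens.items()
}

for stage, ratio in ratios.items():
    print(f"{stage}: monolithic context is ~{ratio}x larger")
```

On these figures the monolithic window is roughly 7.5 to 8.6 times the size of any single scoped stage. This is a statement about context size only, not about output quality.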

## The mechanism

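The efficiency argument is not raw compression. It rests on two effects:

1. **Relevance density**: scoped stages keep almost all tokens task-relevant (see [[concept-context-scoping]]).
2. **Avoiding "lost in the middle" degradation** (Liu et al.): a shorter context leaves less material in the middle of the window, where models tend to use information less reliably (see [[prereq-llm-context-windows]]).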
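Relevance density can be read as the fraction of the context window that is relevant to the current task. The sketch below is illustrative only: `relevance_density` is a hypothetical helper, and it assumes that within the ~42k monolithic window only the current stage's focused tokens (here `02_script`, ~5.5k) count as relevant. The paper does not report this as a measured quantity.

```python
def relevance_density(relevant_tokens: int, total_tokens: int) -> float:
    """Fraction of the context window that is relevant to the current task."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return relevant_tokens / total_tokens


# Assumption: only the current stage's focused tokens are relevant
# inside the monolithic window (representative counts, not measurements).
focused_script_tokens = 5_500
monolithic_tokens = 42_000

monolithic_density = relevance_density(focused_script_tokens, monolithic_tokens)
print(f"Monolithic relevance density: {monolithic_density:.0%}")
```

Under that assumption, about 13% of the monolithic window is task-relevant, while a scoped stage is relevant almost by construction. The comparison illustrates the argument; it does not validate it.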

## Confidence and caveats

The paper is **explicit** that these are *representative counts*, not a measured benchmark, and that **no controlled comparison to monolithic prompting was run**.

The enrichment overlay confirms: the rationale is well-grounded in the literature, but the precise token counts and implied performance gains remain anecdotal and unvalidated. The corresponding open question is [[question-controlled-comparison]].

## Mechanism source

The ability to scope context this tightly comes from the [[concept-five-layer-hierarchy]], which partitions the workspace structure into routing layers (~1.5k tokens) and content layers.


## Related across days
- [[claim-icm-superiority]]
- [[prereq-llm-context-windows]]
- [[prereq-llm-context]]
- [[arc-evidence-base-evolution]]