---
type: "synthesis"
tags: ["arc", "evidence", "epistemics"]
spans: ["video", "paper"]
id: "arc-evidence-base-evolution"
sources: ["cross-day"]
---
# Arc: How the Evidence Base Evolves

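A reader moving from the talk to the paper sees the **same factual base** acquire **scaffolding**: theoretical lineage, quantitative grounding, an empirical signal, external adoption, and explicit limits.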

## Video evidence (largely anecdotal)

- "20–40% token reduction" — no methodology given.
- "Highest-ROI move" — consultant heuristic ([[claim-l2-roi]]).
- "Karpathy does this too" — [[entity-andrej-karpathy-d1]] cited as social proof.
- Voice demo — single live demonstration.

## Paper evidence (anecdotal + lineage + limitations)

The paper retains the anecdotal grounding **and** adds:

- **Theoretical lineage**: Unix pipelines ([[prereq-unix-pipelines]]), multi-pass compilation ([[concept-icm-as-compilation]]), Karpathy context engineering ([[entity-andrej-karpathy-d2]]), Rudin inherent interpretability ([[concept-observability-side-effect]]), Amershi mixed-initiative.
- **Quantitative grounding**: 2–8k vs 42k tokens, justified via Liu et al. "lost in the middle" ([[prereq-llm-context-windows]]). See [[claim-token-efficiency]].
- **Empirical signal**: n=33 practitioners, 30 of whom self-report a U-shaped intervention pattern ([[claim-ushaped-intervention]]).
- **External adoption**: Edinburgh Neuropolitics Lab, ICR Research, Academy of International Affairs Bonn ([[claim-external-adoption]], [[entity-external-adopters]]).
- **Explicit limits**: no controlled comparison ([[question-controlled-comparison]]), single model family ([[question-cross-model]]), observability ≠ traceability ([[question-semantic-debugging]]).
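
The two quantitative bullets above can be turned into proportions with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted in this note. The constant names are assumptions made for illustration, and the note does not say how the resulting ratio relates to the talk's "20–40% token reduction", so the two should not be conflated.

```python
# Figures as quoted in this note; representative, not independently measured.
SMALL_CONTEXT_LOW = 2_000    # lower end of the "2-8k" range (hypothetical name)
SMALL_CONTEXT_HIGH = 8_000   # upper end of the "2-8k" range (hypothetical name)
LARGE_CONTEXT = 42_000       # the "42k" comparison point (hypothetical name)
PRACTITIONERS_TOTAL = 33
PRACTITIONERS_U_SHAPE = 30


def reduction(smaller: int, larger: int) -> float:
    """Fractional reduction when going from `larger` to `smaller`."""
    return 1 - smaller / larger


best_case = reduction(SMALL_CONTEXT_LOW, LARGE_CONTEXT)
worst_case = reduction(SMALL_CONTEXT_HIGH, LARGE_CONTEXT)
u_shape_share = PRACTITIONERS_U_SHAPE / PRACTITIONERS_TOTAL

print(f"context reduction: {worst_case:.0%} to {best_case:.0%}")
print(f"U-shape share: {u_shape_share:.0%} (n={PRACTITIONERS_TOTAL})")
```

Under these inputs the smaller context is roughly 81% to 95% below the 42k figure, and about 91% of the 33 practitioners report the U-shape. Both remain descriptive summaries of the quoted numbers; neither substitutes for the controlled comparison listed under explicit limits.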

## What the upgrade demonstrates

The arc is the **maturation of a consultant heuristic into a research program**. The claims that survive are ICM's architectural shape and its observability story. The claims that still need testing are the token-efficiency numbers and the productivity gains. Real-time voice is explicitly out of scope ([[tension-voice-future-vs-paper-non-support]]).

For downstream questions, **always cite which evidence layer supports the claim** — lineage, representative numbers, self-reported practitioner data, or external adoption — rather than asserting a single confidence level.
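
One way to make this rule mechanical is to attach evidence layers to each claim and render them on every citation. The following is a sketch only: `EvidenceLayer`, `Claim`, and `cite` are hypothetical names, and the layer assigned to each claim below is an illustrative reading of this note, not something the paper states.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceLayer(Enum):
    # The four layers named in this note.
    LINEAGE = "lineage"
    REPRESENTATIVE_NUMBERS = "representative numbers"
    SELF_REPORT = "self-reported practitioner data"
    EXTERNAL_ADOPTION = "external adoption"


@dataclass(frozen=True)
class Claim:
    note_id: str                        # wikilink target, e.g. "claim-token-efficiency"
    layers: tuple[EvidenceLayer, ...]   # every layer that supports the claim


def cite(claim: Claim) -> str:
    """Render a citation that always states its supporting evidence layers."""
    if not claim.layers:
        return f"[[{claim.note_id}]] (no evidence layer recorded)"
    names = ", ".join(layer.value for layer in claim.layers)
    return f"[[{claim.note_id}]] (supported by: {names})"


# Illustrative assignments (assumed, based on how this note groups the claims).
token_efficiency = Claim("claim-token-efficiency", (EvidenceLayer.REPRESENTATIVE_NUMBERS,))
u_shape = Claim("claim-ushaped-intervention", (EvidenceLayer.SELF_REPORT,))

print(cite(token_efficiency))
print(cite(u_shape))
```

Keeping `layers` as a tuple rather than a single confidence score mirrors the rule above: a claim can rest on several layers at once, and the reader sees which ones.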

See [[arc-talk-vs-paper-altitude]], [[open-arc-what-remains]].