# Full Vault — Agent Primer — Interpretable Context Methodology (Unified Corpus)

> **Single-fetch comprehensive vault.** Contains the agent primer + map-of-content + glossary + speakers + every note inline. Use this file for agents that cannot follow embedded links (e.g., URL-provenance-restricted fetchers). For agents that can follow links, prefer `_AGENT_PRIMER.md` for progressive disclosure with on-demand drill-down.

> *All wikilinks resolve to within-document anchors (e.g. `[concept-foo](#concept-foo)`). The vault contains 77 notes total.*

---

## Agent Primer

> **Read this in full before consulting individual notes.** This vault unifies two sources on the same methodology: a 26-minute YouTube talk by Jake Van Clief (with K. Kumar and David McDermott) and the formal arXiv paper by Van Clief & McDermott. They argue the same thesis at different altitudes. Your job as a downstream agent is to hold both faithfully and know which to lean on for which question.

## 1. What this corpus is

Two artifacts. One methodology. Different registers.

- **The talk** — "Interpretible Context Methodology & The Future of AI Dialogue" — practitioner conviction mode. Sharp rhetoric ("absurdities"), bold ROI claims, a forward-looking voice-driven AI finale. Use it for **conviction, vision, and on-stage examples**. See [entity-jake-van-clief](#entity-jake-van-clief), [entity-k-kumar](#entity-k-kumar), [entity-david-mcdermott](#entity-david-mcdermott).
- **The paper** — *Interpretable Context Methodology: Folder Structure as Agent Architecture* (arXiv:2603.16021v2, Eduba / University of Edinburgh). Researcher-honesty mode. Bounded claims, theoretical lineage, explicit limitations. Use it for **architecture, mechanism, evidence, and scope**. See [entity-icm-paper](#entity-icm-paper) and [entity-icm-paper-arxiv](#entity-icm-paper-arxiv).

The cross-day arc [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude) is the single most important orientation: the talk sells, the paper survives review. When the two diverge in altitude, default to the paper's altitude in your answers; cite the talk when conviction or vivid framing is needed.

## 2. The unified thesis in one paragraph

Interpretable Context Methodology (ICM) is a contrarian architecture for AI agents that **replaces framework-level orchestration with filesystem structure**. For **sequential workflows where a human reviews output at each step**, multi-agent frameworks ([entity-langchain](#entity-langchain), [entity-autogen](#entity-autogen), [entity-semantic-kernel](#entity-semantic-kernel)) impose engineering overhead the problem does not require. A single orchestrator ([entity-claude](#entity-claude) / [entity-claude-code](#entity-claude-code)) given access to a well-structured folder hierarchy of markdown files can navigate context, understand constraints, and execute complex tasks **deterministically** — without orchestration glue. The folder structure alone controls what context the model sees at each stage; human review gates sit between stages. The system is **observable by construction** because every artifact is a plain text file. See [concept-icm-d1](#concept-icm-d1) (video framing), [concept-icm-d2](#concept-icm-d2) (paper framing), and the canonical quote [quote-folder-controls-context](#quote-folder-controls-context).

## 3. The thesis evolution between sources

The talk and paper present the same idea with progressive precision.

| Talk frame | Paper frame |
|---|---|
| "folders and markdown" ([concept-icm-d1](#concept-icm-d1)) | five-layer context hierarchy ([concept-five-layer-hierarchy](#concept-five-layer-hierarchy)) |
| "skill" (markdown file) | stage contract — Layer 2 `CONTEXT.md` ([concept-stage-contracts](#concept-stage-contracts)) |
| "single agent" ([claim-icm-superiority](#claim-icm-superiority)) | single orchestrator + Sonnet subagents ([entity-claude-code](#entity-claude-code), [synthesis-single-agent-clarified](#synthesis-single-agent-clarified)) |
| 20–40% token reduction (anecdotal) | ~2–8k vs ~42k representative ([claim-token-efficiency](#claim-token-efficiency)) |
| "absurdities" ([contrarian-frameworks](#contrarian-frameworks)) | "overhead for *this class*" ([contrarian-frameworks-overkill](#contrarian-frameworks-overkill)) |
| dialogue is the core theme ([concept-dialogue-structure](#concept-dialogue-structure)) | context engineering / scoping ([concept-context-scoping](#concept-context-scoping)) |
| Karpathy's LLM Wiki = social proof ([entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1)) | Karpathy's context engineering = theoretical scaffolding ([entity-andrej-karpathy-d2](#entity-andrej-karpathy-d2)) |

The arc [arc-evidence-base-evolution](#arc-evidence-base-evolution) traces the same factual base acquiring lineage (Unix, multi-pass compilation, Rudin interpretability, Amershi mixed-initiative), quantitative grounding (Liu et al. "lost in the middle"), empirical signal (n=33 practitioners), external adoption (Edinburgh, ICR, Bonn), and most importantly **explicit limitations**.

## 4. The five most important concepts

### 4.1 The Five-Layer Context Hierarchy — [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)

The paper's central technical primitive, which the talk only gestures at.

- **L0** `CLAUDE.md` — global identity (~800 tokens). *Where am I?*
- **L1** `CONTEXT.md` (root) — workspace routing (~300 tokens). *Where do I go?*
- **L2** `CONTEXT.md` (stage) — stage contract (200–500 tokens). *What do I do?*
- **L3** `references/` — stable rules / recipe (500–2k tokens). *What rules apply?*
- **L4** `output/` — per-run working artifacts. *What am I working with?*

The structural layers (L0–L2) sum to ~1.5k tokens. The architectural distinction the talk lacks is **L3 (recipe / constraints to internalize) vs L4 (ingredients / input to transform)**. Misclassifying these is the most common failure mode — see [action-separate-l3-l4](#action-separate-l3-l4) and the synthesis [synthesis-five-layer-fills-the-gap](#synthesis-five-layer-fills-the-gap).
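
The "~1.5k tokens" figure can be checked from the per-layer estimates in the list above. The sketch below is a minimal illustration and not code from the paper: the `Layer` dataclass and `structural_budget` helper are hypothetical names, and only the token estimates and guiding questions come from the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    name: str
    path: str
    question: str
    min_tokens: Optional[int]  # None means the size varies per run
    max_tokens: Optional[int]


# Estimates copied from the list above; L4 has no fixed size.
LAYERS = [
    Layer("L0", "CLAUDE.md", "Where am I?", 800, 800),
    Layer("L1", "CONTEXT.md (root)", "Where do I go?", 300, 300),
    Layer("L2", "CONTEXT.md (stage)", "What do I do?", 200, 500),
    Layer("L3", "references/", "What rules apply?", 500, 2000),
    Layer("L4", "output/", "What am I working with?", None, None),
]


def structural_budget(layers):
    """Return the (low, high) token estimate for the structural layers L0-L2."""
    structural = [layer for layer in layers if layer.name in {"L0", "L1", "L2"}]
    low = sum(layer.min_tokens for layer in structural)
    high = sum(layer.max_tokens for layer in structural)
    return low, high


low, high = structural_budget(LAYERS)
print(f"Structural layers: {low}-{high} tokens")  # Structural layers: 1300-1600 tokens
```

The 1,300 to 1,600 range brackets the ~1.5k figure. L3 and L4 are excluded because they hold stage-specific rules and per-run material rather than structure.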

### 4.2 ICM Workspace Architecture — [framework-icm-architecture](#framework-icm-architecture)

The canonical on-disk layout: a root `CLAUDE.md` and `CONTEXT.md`, a `_config/` and `shared/` for cross-stage references, and `stages/` containing numbered subfolders (`01_research/`, `02_script/`, `03_production/`), each carrying the **same triad**: `CONTEXT.md` + `references/` + `output/`. The layout serves a dual role: it is both the **human control surface** and the **agent's orchestration specification**. "Add or remove a stage" is a filesystem operation, not a code change ([action-numbered-stage-folders](#action-numbered-stage-folders)).
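
As a rough illustration of that layout, the sketch below scaffolds an empty workspace with `pathlib`. It rests on assumptions: placing `_config/` and `shared/` directly under the root is a guess, and `scaffold_workspace` and `list_stages` are hypothetical helpers, not part of any ICM tooling. Only the file and folder names come from the text.

```python
import tempfile
from pathlib import Path

# Stage names taken from the example in the text.
STAGES = ["01_research", "02_script", "03_production"]


def scaffold_workspace(root: Path, stages=STAGES) -> None:
    """Create an empty workspace skeleton (assumed layout, see lead-in)."""
    root.mkdir(parents=True, exist_ok=True)
    for name in ("CLAUDE.md", "CONTEXT.md"):
        (root / name).touch()
    for folder in ("_config", "shared"):
        (root / folder).mkdir(exist_ok=True)
    for stage in stages:
        stage_dir = root / "stages" / stage
        # Every stage carries the same triad: CONTEXT.md, references/, output/
        (stage_dir / "references").mkdir(parents=True, exist_ok=True)
        (stage_dir / "output").mkdir(exist_ok=True)
        (stage_dir / "CONTEXT.md").touch()


def list_stages(root: Path):
    """Pipeline order is the sorted order of the stage folder names."""
    return sorted(p.name for p in (root / "stages").iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "demo-workspace"
    scaffold_workspace(workspace)
    # Adding a stage is a filesystem operation, not a code change.
    (workspace / "stages" / "04_publish").mkdir()
    print(list_stages(workspace))
    # ['01_research', '02_script', '03_production', '04_publish']
```

Sorting by name only gives the intended order while the numeric prefixes are zero-padded to the same width.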

### 4.3 Stage Contracts and Review Gates — [concept-stage-contracts](#concept-stage-contracts)

Each stage's L2 `CONTEXT.md` declares: inputs (what to read from the previous stage's `output/`), processing (the role to play), and outputs (what to write to its own `output/`). Between stages sits a **human review gate** ([action-review-gates](#action-review-gates)) where the output is editable before the next stage reads it. The same model runs every stage; only the folder differs. The multi-agent behaviour is an **illusion produced entirely by folder scoping plus human gates** — see [quote-folder-controls-context](#quote-folder-controls-context).
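
One way to picture "only the folder differs" is a helper that assembles the text a given stage is allowed to see. This is a sketch under stated assumptions, not the authors' implementation: `build_stage_context`, `read_folder`, and `make_demo_workspace` are hypothetical, and concatenating the layers in L0 to L4 order is an illustrative choice. The review gate is simply the pause between two calls, during which a human may edit the previous stage's `output/`.

```python
import tempfile
from pathlib import Path


def read_folder(folder: Path) -> str:
    """Concatenate every file in a folder, in filename order."""
    if not folder.is_dir():
        return ""
    files = sorted(p for p in folder.iterdir() if p.is_file())
    return "\n".join(p.read_text(encoding="utf-8") for p in files)


def build_stage_context(root: Path, stage: str) -> str:
    """Assemble the context for one stage (hypothetical helper)."""
    stages = sorted(p.name for p in (root / "stages").iterdir() if p.is_dir())
    index = stages.index(stage)  # raises ValueError for an unknown stage
    stage_dir = root / "stages" / stage
    parts = [
        (root / "CLAUDE.md").read_text(encoding="utf-8"),        # L0 identity
        (root / "CONTEXT.md").read_text(encoding="utf-8"),       # L1 routing
        (stage_dir / "CONTEXT.md").read_text(encoding="utf-8"),  # L2 contract
        read_folder(stage_dir / "references"),                   # L3 rules
    ]
    if index > 0:
        # Input is whatever the reviewer left in the previous stage's output/
        previous = root / "stages" / stages[index - 1] / "output"
        parts.append(read_folder(previous))
    return "\n\n".join(part for part in parts if part)


def make_demo_workspace(root: Path) -> None:
    """Write a tiny two-stage workspace with placeholder content."""
    (root / "CLAUDE.md").write_text("identity", encoding="utf-8")
    (root / "CONTEXT.md").write_text("routing", encoding="utf-8")
    for stage in ("01_research", "02_script"):
        stage_dir = root / "stages" / stage
        for sub in ("references", "output"):
            (stage_dir / sub).mkdir(parents=True)
        (stage_dir / "CONTEXT.md").write_text(f"contract for {stage}", encoding="utf-8")
        (stage_dir / "references" / "rules.md").write_text(f"rules for {stage}", encoding="utf-8")
    notes = root / "stages" / "01_research" / "output" / "notes.md"
    notes.write_text("reviewed research notes", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    make_demo_workspace(Path(tmp))
    print(build_stage_context(Path(tmp), "02_script"))
```

In the demo, the second stage sees its own contract and rules plus the first stage's reviewed output, but none of the first stage's references.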

### 4.4 Dialogue as Workflow Structure — [concept-dialogue-structure](#concept-dialogue-structure)

The talk's philosophical centre. The claim is that all effective AI workflows can be reverse-engineered from successful human–AI conversations: even a trivial-looking request hides a multi-step decision tree (Goal → Constraints → Assumptions → Sub-goals → Execution). [entity-k-kumar](#entity-k-kumar) built a visual mapping tool that surfaces this latent structure. [quote-dialogue-theme](#quote-dialogue-theme): *"All of these skills, all of these folders and markdown files, all have one core theme: discussion and dialogue."*

In the paper this becomes [concept-context-scoping](#concept-context-scoping): same model, different available info → different task. The bridge is [entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1) / [entity-andrej-karpathy-d2](#entity-andrej-karpathy-d2) — see [synthesis-dialogue-to-context-engineering](#synthesis-dialogue-to-context-engineering).

### 4.5 Observability as a Side Effect (Glass-Box) — [concept-observability-side-effect](#concept-observability-side-effect)

The paper's most under-rated contribution. Every intermediate output is a plain text file, so the system is observable **without any added tooling** — no logging layer, no dashboard. Open a folder, read the files. [quote-glass-box](#quote-glass-box): *"It did not become transparent through the addition of an explanation layer. It was never opaque in the first place, because every artifact is a plain-text file that a human can read."* This is Rudin's inherent-interpretability position elevated from models to **workflows**. See the contrarian framing [contrarian-observability-free](#contrarian-observability-free) and the synthesis [synthesis-glass-box-meets-dialogue](#synthesis-glass-box-meets-dialogue).

**Important caveat**: by colloquial glass-box usage, ICM qualifies. By strict regulatory definitions (provenance, fine-grained traceability, audit trails), it provides strong observability and **partial** traceability. The gap is [question-semantic-debugging](#question-semantic-debugging).

## 5. The supporting frameworks

### Skill Creation via Dialogue Extraction — [framework-skill-creation](#framework-skill-creation)
The talk's five-step process for converting ephemeral chats into permanent skills: Goal → Constraints → Assumptions → Sub-goals → Markdown. In paper vocabulary, this produces a stage contract ([synthesis-skill-equals-stage-contract](#synthesis-skill-equals-stage-contract)).
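
The last step, writing the extracted structure out as markdown, might look like the sketch below. It is illustrative only: the `DialogueExtract` dataclass, the `to_skill_markdown` function, and the section headings in the generated file are assumptions, since the talk does not prescribe a file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DialogueExtract:
    """What was recovered from one successful chat (hypothetical structure)."""
    goal: str
    constraints: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    sub_goals: List[str] = field(default_factory=list)


def _bullets(title: str, items: List[str]) -> str:
    """Render one section as a heading followed by a bullet list."""
    body = "\n".join(f"- {item}" for item in items) or "- (none recorded)"
    return f"## {title}\n{body}"


def to_skill_markdown(name: str, extract: DialogueExtract) -> str:
    """Render the extract as a skill file, one section per extraction step."""
    sections = [
        f"# {name}",
        f"## Goal\n{extract.goal}",
        _bullets("Constraints", extract.constraints),
        _bullets("Assumptions", extract.assumptions),
        _bullets("Sub-goals", extract.sub_goals),
    ]
    return "\n\n".join(sections) + "\n"


extract = DialogueExtract(
    goal="Draft a weekly status email.",
    constraints=["Under 200 words", "Plain language"],
    assumptions=["Reader already knows the project"],
    sub_goals=["Summarize progress", "List blockers", "State next steps"],
)
print(to_skill_markdown("weekly-status", extract))
```

Keeping the sections in the same order as the extraction steps lets a reviewer see which part of the original conversation each rule came from.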

### The Workspace-Builder — [framework-workspace-builder](#framework-workspace-builder)
ICM's self-hosting meta-workspace: a five-stage workspace whose output is a new workspace. Stages: Discovery → Stage mapping → Scaffolding → Questionnaire design → Validation. This is the **adoption mechanism** — see [synthesis-workspace-builder-is-the-meta-dialogue](#synthesis-workspace-builder-is-the-meta-dialogue) and [claim-external-adoption](#claim-external-adoption).

### Three Levels of AI Use — [concept-three-levels-ai](#concept-three-levels-ai)
The talk's organizational maturity model: Level 1 (copy/paste) → Level 2 (structured prompts) → Level 3 (integrated workflow). The corpus abbreviates these as L1, L2, and L3, but they are maturity levels, not the L0–L4 context layers of section 4.1. The talk's signature consulting claim ([claim-l2-roi](#claim-l2-roi), [quote-l2-roi](#quote-l2-roi)) is that the Level 1 → Level 2 move has the highest ROI. The cross-day synthesis [synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline) shows how maturity Level 2 corresponds to populating an ICM `references/` layer (context layer L3) and maturity Level 3 corresponds to running a full staged pipeline.

### The Edit-Source Principle — [concept-edit-source-principle](#concept-edit-source-principle)
The paper's maintenance discipline. [quote-edit-source](#quote-edit-source): *"Editing the output fixes this run. Editing the source fixes every future run."* The cross-day synthesis [synthesis-edit-source-as-dialogue-evolution](#synthesis-edit-source-as-dialogue-evolution) shows how this closes the dialogue loop started in the talk.

## 6. The contrarian moves

- **[contrarian-frameworks](#contrarian-frameworks)** (talk) — multi-agent frameworks are absurdities. Rhetorical.
- **[contrarian-frameworks-overkill](#contrarian-frameworks-overkill)** (paper) — multi-agent frameworks are overhead **for sequential, human-reviewed workflows**, but **recommended** for real-time multi-agent, high-concurrency, or heavy branching. Analytic.
- **[contrarian-observability-free](#contrarian-observability-free)** (paper) — observability shouldn't require a logging or dashboard layer; if every artifact is a plain file, it comes for free.

The tension between the talk's rhetoric and the paper's bounded scope is the corpus's sharpest internal contradiction — see [tension-absurdities-vs-bounded-scope](#tension-absurdities-vs-bounded-scope) and the recurring foil [recurring-foil-frameworks](#recurring-foil-frameworks). The defensible position is the paper's: *for this class* of problem, frameworks are overhead; not *frameworks are absurd*.

## 7. The headline claims, with confidence

1. **For sequential, human-reviewed workflows, filesystem orchestration substitutes for framework orchestration without loss of capability.** Medium confidence. Architecturally sound (Unix lineage); no published head-to-head comparison ([question-controlled-comparison](#question-controlled-comparison)).
2. **The folder structure alone controls per-stage context.** High confidence, with one important nuance: [entity-claude-code](#entity-claude-code) runs an Opus orchestrator that delegates to Sonnet subagents, so "single agent" strictly means single orchestrator. See [synthesis-single-agent-clarified](#synthesis-single-agent-clarified).
3. **Scoped stages use ~2–8k focused tokens vs ~42k monolithic.** Medium confidence. Representative, not benchmarked. Rationale (relevance density + "lost in the middle" — [prereq-llm-context-windows](#prereq-llm-context-windows)) is well-grounded; numbers are anecdotal ([claim-token-efficiency](#claim-token-efficiency)).
4. **Context scoping changes the task the same model performs.** High confidence. Supported by broader prompt-engineering and context-engineering literature ([concept-context-scoping](#concept-context-scoping)).
5. **ICM is observable by default.** High confidence in colloquial sense; medium against strict governance ([concept-observability-side-effect](#concept-observability-side-effect), [quote-glass-box](#quote-glass-box)).
6. **Editing the source fixes every future run.** High confidence as principle; aspirational without semantic debugging tooling ([concept-edit-source-principle](#concept-edit-source-principle), [question-semantic-debugging](#question-semantic-debugging)).
7. **A workspace is a folder; portability is infrastructure-as-code applied to AI workflows.** High for small-team/local; medium for enterprise ([concept-portability](#concept-portability)).
8. **Human editing follows a U-shape: heavy at first and last stages.** Low confidence. n=33 self-reported ([claim-ushaped-intervention](#claim-ushaped-intervention)).
9. **ICM has been adopted outside its author's group** (Edinburgh, ICR, Bonn). Medium confidence. Adoption is real; "works" is unquantified ([claim-external-adoption](#claim-external-adoption), [entity-external-adopters](#entity-external-adopters)).
10. **The L1→L2 jump is the highest-ROI move.** High source confidence, well-aligned with practitioner consensus; quantitative ROI data scarce. Treat as strong consultant heuristic ([claim-l2-roi](#claim-l2-roi)).
11. **Real-time voice-driven AI collaboration is the future of workflows.** Medium confidence. Technically plausible; **explicitly out of scope** per the paper ([claim-voice-future](#claim-voice-future), [tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support)).
12. **ICM does NOT address real-time multi-agent collaboration or high-concurrency systems.** High confidence — explicit paper boundary.
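
The representative figures in claim 3 imply the ratios below. This is plain arithmetic on the ~2–8k and ~42k numbers quoted above, which the corpus itself labels representative rather than benchmarked. The constant and function names are illustrative.

```python
MONOLITHIC_TOKENS = 42_000      # representative monolithic prompt size
SCOPED_RANGE = (2_000, 8_000)   # representative per-stage range


def reduction(scoped: int, monolithic: int = MONOLITHIC_TOKENS) -> float:
    """Fraction of tokens saved by loading only the scoped stage context."""
    return 1 - scoped / monolithic


for scoped in SCOPED_RANGE:
    ratio = MONOLITHIC_TOKENS / scoped
    print(f"{scoped} tokens: {ratio:.2f}x smaller, {reduction(scoped):.0%} fewer tokens")
# 2000 tokens: 21.00x smaller, 95% fewer tokens
# 8000 tokens: 5.25x smaller, 81% fewer tokens
```

These ratios describe prompt size only. Whether output quality improves is the open question tracked in [question-controlled-comparison](#question-controlled-comparison).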

## 8. The big internal tensions

- **[tension-absurdities-vs-bounded-scope](#tension-absurdities-vs-bounded-scope)** — talk rhetoric vs paper precision.
- **[tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support)** — the voice finale lives in exactly the use case the paper excludes. Read it as a forward extension, not a contradiction.

## 9. The role of each speaker

- **[entity-jake-van-clief](#entity-jake-van-clief)** — primary author of both sources. AI consultant and originator of ICM. Talk: conviction; paper: rigour. Almost every claim and quote in the corpus is his unless otherwise marked.
- **[entity-david-mcdermott](#entity-david-mcdermott)** — co-author of the paper; named participant in the talk. Carries the academic-credibility weight on the paper side.
- **[entity-k-kumar](#entity-k-kumar)** — University of Edinburgh student; built the **visual decision-tree mapping tool** that operationalizes the dialogue thesis. Talk only. His work prefigures the workspace-builder (see [synthesis-workspace-builder-is-the-meta-dialogue](#synthesis-workspace-builder-is-the-meta-dialogue)).
- **[entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1) / [entity-andrej-karpathy-d2](#entity-andrej-karpathy-d2)** — appears in both sources, in different roles. Talk: independent validation via his "LLM Wiki" markdown approach. Paper: cited as source of "context engineering" (2025) framing.
- **[entity-anthropic](#entity-anthropic)** — maker of [entity-claude](#entity-claude) / [entity-claude-code](#entity-claude-code). The sole agent runtime used; all testing was Opus 4.6 + Sonnet 4.6 subagents.

## 10. The supporting cast (entities)

- **[entity-claude](#entity-claude)** (talk) / **[entity-claude-code](#entity-claude-code)** (paper) — the orchestrator. Distinguish single-agent (loose talk usage) from single-orchestrator + subagent delegation (precise paper usage).
- **[entity-langchain](#entity-langchain)**, **[entity-semantic-kernel](#entity-semantic-kernel)** (both sources), **[entity-autogen](#entity-autogen)** (paper only) — the framework foils. Treat them per [recurring-foil-frameworks](#recurring-foil-frameworks).
- **[entity-11labs](#entity-11labs)** (ElevenLabs) — voice cloning provider used in the talk's voice demo.
- **[entity-remotion](#entity-remotion)** — React video framework used in the paper's script-to-animation production stage.
- **[entity-external-adopters](#entity-external-adopters)** — Edinburgh Neuropolitics Lab, ICR Research, Academy of International Affairs Bonn. NDA-limited; existence credible, performance unquantified.

## 11. Prerequisites

- **[prereq-llm-context](#prereq-llm-context)** (talk) / **[prereq-llm-context-windows](#prereq-llm-context-windows)** (paper) — token economics and Liu et al.'s "lost in the middle." Required to understand the efficiency argument.
- **[prereq-markdown](#prereq-markdown)** — basic markdown literacy.
- **[prereq-unix-pipelines](#prereq-unix-pipelines)** — Unix philosophy and pipe-and-filter. Required to understand ICM's lineage.

## 12. Action items — the unified on-ramp

A reader can adopt ICM in this order, mixing talk-level and paper-level steps:

1. **[action-codify-voice](#action-codify-voice)** — write a `voice-and-tone.md` (a Layer 3 reference).
2. **[action-move-to-l2](#action-move-to-l2)** — audit team usage; build the prompt library (more L3 references).
3. **[action-implement-folders](#action-implement-folders)** — restructure into an ICM workspace ([framework-icm-architecture](#framework-icm-architecture)).
4. **[action-separate-l3-l4](#action-separate-l3-l4)** — discipline the L3 (recipe) vs L4 (ingredients) split.
5. **[action-numbered-stage-folders](#action-numbered-stage-folders)** — formalize the pipeline as numbered stage folders.
6. **[action-review-gates](#action-review-gates)** — install human review at every stage boundary.

The progression matches the maturity levels in [synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline): Level 1 ad-hoc → Level 2 references populated → Level 3 staged pipeline running.

## 13. The signature quotes

- **[quote-absurdities](#quote-absurdities)** — "folders and markdown files… huge results from it." Talk. The contrarian banner.
- **[quote-l2-roi](#quote-l2-roi)** — "The jump from L1 to L2 is the highest-ROI move." Talk. The consulting heuristic.
- **[quote-dialogue-theme](#quote-dialogue-theme)** — "one core theme: discussion and dialogue." Talk. The philosophical centre.
- **[quote-voice-control](#quote-voice-control)** — "sit inside of a group call and control someone else's Claude code…" Talk. The forward-looking vision.
- **[quote-folder-controls-context](#quote-folder-controls-context)** — "The same model executes every stage; the folder structure controls what context it receives." Paper. The precise technical claim.
- **[quote-glass-box](#quote-glass-box)** — "It did not become transparent through the addition of an explanation layer. It was never opaque in the first place…" Paper. The interpretability claim.
- **[quote-edit-source](#quote-edit-source)** — "Editing the output fixes this run. Editing the source fixes every future run." Paper. The maintenance principle.

## 14. The cross-day synthesis layer

Notes you should know exist:

- **[arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude)** — orientation: register and altitude differences.
- **[arc-evidence-base-evolution](#arc-evidence-base-evolution)** — from anecdote to anecdote-plus-lineage.
- **[tension-absurdities-vs-bounded-scope](#tension-absurdities-vs-bounded-scope)** — talk rhetoric vs paper precision.
- **[tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support)** — voice finale lives outside paper's scope.
- **[synthesis-five-layer-fills-the-gap](#synthesis-five-layer-fills-the-gap)** — the paper supplies the architecture the talk omits.
- **[synthesis-skill-equals-stage-contract](#synthesis-skill-equals-stage-contract)** — vocabulary unification.
- **[synthesis-dialogue-to-context-engineering](#synthesis-dialogue-to-context-engineering)** — Karpathy as the conceptual bridge.
- **[synthesis-single-agent-clarified](#synthesis-single-agent-clarified)** — single agent → single orchestrator + Sonnet subagents.
- **[synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline)** — maturity model meets staged pipeline.
- **[synthesis-glass-box-meets-dialogue](#synthesis-glass-box-meets-dialogue)** — inspectability as the dialogue theme's pay-off.
- **[synthesis-edit-source-as-dialogue-evolution](#synthesis-edit-source-as-dialogue-evolution)** — maintenance closes the dialogue loop.
- **[recurring-foil-frameworks](#recurring-foil-frameworks)** — LangChain / Semantic Kernel / AutoGen across both.
- **[synthesis-workspace-builder-is-the-meta-dialogue](#synthesis-workspace-builder-is-the-meta-dialogue)** — self-hosting dialogue extraction.
- **[open-arc-what-remains](#open-arc-what-remains)** — questions surviving both sources.

## 15. The open questions

From the talk:
- **[question-icm-scaling](#question-icm-scaling)** — does single-agent folder navigation handle massive legacy codebases?
- **[question-voice-security](#question-voice-security)** — how is voice-driven file-system access secured?

From the paper:
- **[question-controlled-comparison](#question-controlled-comparison)** — is staged loading measurably better than monolithic prompting?
- **[question-cross-model](#question-cross-model)** — does the five-layer hierarchy generalize beyond Claude?
- **[question-semantic-debugging](#question-semantic-debugging)** — can ICM provide automatic traceability, not just observability?

Cross-source: the voice future, the L3/L4 promotion path, the workspace-builder bootstrapping a multi-agent ICM, the right unit of governance. See [open-arc-what-remains](#open-arc-what-remains).

## 16. Default behaviour when answering

1. **Lead with the precise scope of any claim.** ICM is for sequential, human-reviewed workflows; do not extend the claim further.
2. **Distinguish strong from weak evidence.** Unix lineage and "lost in the middle" are strong; specific token numbers and U-shape percentages are anecdotal.
3. **Preserve the single-orchestrator nuance.** Opus + Sonnet subagent delegation is single-orchestrator, not strictly single-agent.
4. **Pair talk rhetoric with paper precision.** When quoting [quote-absurdities](#quote-absurdities), pair with [contrarian-frameworks-overkill](#contrarian-frameworks-overkill). When citing [claim-voice-future](#claim-voice-future), pair with the paper's explicit non-support of real-time.
5. **State limitations honestly.** They are part of the paper's credibility.
6. **Use wikilinks aggressively.** Every concept, claim, framework, entity, quote, prerequisite, and question has a note; refer the reader there for depth.
7. **For "who said this?"** — almost always Jake Van Clief unless explicitly marked otherwise.

## 17. The big picture, in one paragraph

Across the two sources, ICM advances a coherent reductionist program: **at the substrate**, replace orchestration frameworks with folder hierarchies of markdown ([concept-icm-d1](#concept-icm-d1), [concept-icm-d2](#concept-icm-d2), [framework-icm-architecture](#framework-icm-architecture), [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)); **at the workflow layer**, recognize that all skills are codified dialogue and build them via disciplined extraction ([concept-dialogue-structure](#concept-dialogue-structure), [framework-skill-creation](#framework-skill-creation)), which the paper restates as stage contracts and context scoping ([concept-stage-contracts](#concept-stage-contracts), [concept-context-scoping](#concept-context-scoping)); **at the organizational layer**, move from copy-paste to structured-prompt maturity before investing in integration ([concept-three-levels-ai](#concept-three-levels-ai), [claim-l2-roi](#claim-l2-roi)); **at the maintenance layer**, edit the source, not the output ([concept-edit-source-principle](#concept-edit-source-principle)); **at the epistemic layer**, accept that observability comes free because every artifact is a file ([concept-observability-side-effect](#concept-observability-side-effect), [quote-glass-box](#quote-glass-box)); **and at the interaction layer**, evolve toward real-time voice collaboration where the AI participates in meetings ([concept-voice-collaboration](#concept-voice-collaboration)), though the paper carefully marks that last move as future work outside its current scope. The unifying intuition is that **simple, inspectable text + a single capable orchestrator beats elaborate orchestration for sequential, human-reviewed workflows**. The broader literature endorses this as a starting posture, partially validates it for many use cases, and pushes back where the scope extends past the bounded claim. Give downstream users that nuanced answer.


---

## Map of Content

# Map of Content — Interpretable Context Methodology Corpus

> Unified vault spanning two sources on the same methodology:
> - **Video** — *Interpretible Context Methodology & The Future of AI Dialogue* (YouTube, 26m38s)
> - **Paper** — *Interpretable Context Methodology: Folder Structure as Agent Architecture* (arXiv:2603.16021v2)

Start with the [Agent Primer](#agent-primer). Then [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude) for orientation.

---

## Cross-Day Synthesis Layer (`cross-day/`)

The view from above — only visible when both sources are read together.

### Arcs
- [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude) — two altitudes of the same thesis
- [arc-evidence-base-evolution](#arc-evidence-base-evolution) — from anecdote to anecdote-plus-lineage
- [open-arc-what-remains](#open-arc-what-remains) — open questions surviving both sources

### Tensions
- [tension-absurdities-vs-bounded-scope](#tension-absurdities-vs-bounded-scope) — talk rhetoric vs paper precision
- [tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support) — voice finale lives outside paper scope

### Syntheses
- [synthesis-five-layer-fills-the-gap](#synthesis-five-layer-fills-the-gap) — paper supplies the architecture talk omits
- [synthesis-skill-equals-stage-contract](#synthesis-skill-equals-stage-contract) — vocabulary unification
- [synthesis-dialogue-to-context-engineering](#synthesis-dialogue-to-context-engineering) — Karpathy as conceptual bridge
- [synthesis-single-agent-clarified](#synthesis-single-agent-clarified) — single agent → single orchestrator + subagents
- [synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline) — maturity model meets staged pipeline
- [synthesis-glass-box-meets-dialogue](#synthesis-glass-box-meets-dialogue) — inspectability as dialogue's pay-off
- [synthesis-edit-source-as-dialogue-evolution](#synthesis-edit-source-as-dialogue-evolution) — maintenance closes dialogue loop
- [synthesis-workspace-builder-is-the-meta-dialogue](#synthesis-workspace-builder-is-the-meta-dialogue) — self-hosting dialogue extraction

### Recurring patterns
- [recurring-foil-frameworks](#recurring-foil-frameworks) — LangChain / Semantic Kernel / AutoGen across both

---

## Pillar — Video Source (per-day notes from talk)

### Concepts
- [concept-icm-d1](#concept-icm-d1) — Interpretible Context Methodology (talk framing)
- [concept-dialogue-structure](#concept-dialogue-structure) — Dialogue as workflow structure
- [concept-three-levels-ai](#concept-three-levels-ai) — Three levels of AI use
- [concept-voice-collaboration](#concept-voice-collaboration) — Real-time voice-driven AI collaboration

### Claims
- [claim-icm-superiority](#claim-icm-superiority) — ICM outperforms multi-agent frameworks
- [claim-l2-roi](#claim-l2-roi) — L1→L2 jump is highest-ROI move
- [claim-voice-future](#claim-voice-future) — Voice-driven AI is the future

### Frameworks
- [framework-skill-creation](#framework-skill-creation) — Skill creation via dialogue extraction

### Quotes
- [quote-absurdities](#quote-absurdities) — "folders and markdown files… huge results"
- [quote-l2-roi](#quote-l2-roi) — "highest-ROI move"
- [quote-dialogue-theme](#quote-dialogue-theme) — "one core theme: discussion and dialogue"
- [quote-voice-control](#quote-voice-control) — "sit inside of a group call and control… Claude code"

### Contrarian
- [contrarian-frameworks](#contrarian-frameworks) — Multi-agent frameworks are over-engineered

### Action items
- [action-implement-folders](#action-implement-folders) — Replace orchestration code with folders
- [action-move-to-l2](#action-move-to-l2) — Standardize prompts to reach L2
- [action-codify-voice](#action-codify-voice) — Write voice-and-tone.md

### Prerequisites
- [prereq-llm-context](#prereq-llm-context) — LLM context windows
- [prereq-markdown](#prereq-markdown) — Markdown basics

### Open questions
- [question-icm-scaling](#question-icm-scaling) — Does ICM scale to enterprise codebases?
- [question-voice-security](#question-voice-security) — How is voice-driven access secured?

---

## Pillar — Paper Source (per-day notes from arXiv)

### Concepts
- [concept-icm-d2](#concept-icm-d2) — ICM (paper framing)
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy) — Five-layer context hierarchy
- [concept-stage-contracts](#concept-stage-contracts) — Stage contracts and review gates
- [concept-context-scoping](#concept-context-scoping) — Context scoping changes the task
- [concept-icm-as-compilation](#concept-icm-as-compilation) — ICM as multi-pass compilation
- [concept-observability-side-effect](#concept-observability-side-effect) — Glass-box AI by construction
- [concept-edit-source-principle](#concept-edit-source-principle) — Edit source, not output
- [concept-portability](#concept-portability) — Workspace is a folder

### Claims
- [claim-token-efficiency](#claim-token-efficiency) — ~2–8k vs ~42k tokens
- [claim-ushaped-intervention](#claim-ushaped-intervention) — U-shape human editing pattern
- [claim-external-adoption](#claim-external-adoption) — ICM adopted outside author's group

### Frameworks
- [framework-icm-architecture](#framework-icm-architecture) — Canonical workspace layout
- [framework-workspace-builder](#framework-workspace-builder) — Self-hosting meta-workspace

### Quotes
- [quote-folder-controls-context](#quote-folder-controls-context) — folder structure controls context
- [quote-glass-box](#quote-glass-box) — never opaque in the first place
- [quote-edit-source](#quote-edit-source) — editing source fixes every future run

### Contrarian
- [contrarian-frameworks-overkill](#contrarian-frameworks-overkill) — Frameworks are overhead for this class
- [contrarian-observability-free](#contrarian-observability-free) — Observability shouldn't require dashboards

### Action items
- [action-numbered-stage-folders](#action-numbered-stage-folders) — Numbered stage folders
- [action-review-gates](#action-review-gates) — Human review at every stage boundary
- [action-separate-l3-l4](#action-separate-l3-l4) — Separate reference from working artifacts

### Prerequisites
- [prereq-llm-context-windows](#prereq-llm-context-windows) — Lost in the middle
- [prereq-unix-pipelines](#prereq-unix-pipelines) — Unix philosophy and pipe-and-filter

### Open questions
- [question-controlled-comparison](#question-controlled-comparison) — Is staging better than monolithic?
- [question-cross-model](#question-cross-model) — Does ICM generalize across model families?
- [question-semantic-debugging](#question-semantic-debugging) — Traceability, not just observability?

### Exhibits
- [exhibit-icm-paper-figures](#exhibit-icm-paper-figures) — Paper figures and tables

---

## Entities

See [speakers](#speakers) for the full speaker manifest.

### People
- [entity-jake-van-clief](#entity-jake-van-clief) — primary author, both sources
- [entity-david-mcdermott](#entity-david-mcdermott) — co-author, paper; participant, talk
- [entity-k-kumar](#entity-k-kumar) — built visual decision-tree mapping tool (talk only)
- [entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1) — LLM Wiki social proof (talk)
- [entity-andrej-karpathy-d2](#entity-andrej-karpathy-d2) — context engineering scaffold (paper)

### Source artifacts
- [entity-icm-paper](#entity-icm-paper) — the paper (paper-side identity)
- [entity-icm-paper-arxiv](#entity-icm-paper-arxiv) — the paper (talk-side companion reference)

### Tools and platforms
- [entity-anthropic](#entity-anthropic) — maker of Claude
- [entity-claude](#entity-claude) — Claude (talk framing)
- [entity-claude-code](#entity-claude-code) — Claude Code: Opus 4.6 orchestrator + Sonnet 4.6 subagents (paper)
- [entity-11labs](#entity-11labs) — ElevenLabs voice cloning
- [entity-remotion](#entity-remotion) — React video framework

### Framework foils
- [entity-langchain](#entity-langchain) — both sources
- [entity-semantic-kernel](#entity-semantic-kernel) — both sources
- [entity-autogen](#entity-autogen) — paper only

### Adopters
- [entity-external-adopters](#entity-external-adopters) — Edinburgh, ICR, Bonn

---

## Indexes
- [agent-primer](#agent-primer) — full unified primer
- [glossary](#glossary) — unified vocabulary
- [speakers](#speakers) — speaker manifest


---

## Glossary

# Glossary — Interpretable Context Methodology Corpus

> Unified, deduplicated vocabulary across the talk and the paper. Alphabetical.

- **11Labs / ElevenLabs** — commercial AI speech synthesis and voice cloning provider; used in the talk's voice-driven demo. See [entity-11labs](#entity-11labs).
- **AutoGen** — Microsoft's open-source multi-agent conversation framework; framework foil cited only in the paper. See [entity-autogen](#entity-autogen).
- **CLAUDE.md** — the Layer 0 file at workspace root encoding global agent identity (~800 tokens). See [concept-five-layer-hierarchy](#concept-five-layer-hierarchy).
- **Claude** — Anthropic's family of LLMs; the orchestrator throughout the corpus. See [entity-claude](#entity-claude).
- **Claude Code** — Anthropic's developer agent runtime using Opus 4.6 as orchestrator and Sonnet 4.6 as subagent workers; the sole runtime tested by the paper. See [entity-claude-code](#entity-claude-code).
- **Context engineering** — Karpathy's (2025) framing: system behaviour depends on what context is delivered, in what structure, at what moment. See [concept-context-scoping](#concept-context-scoping).
- **Context scoping** — ICM's performance mechanism: the model is unchanged; the *available information* per stage is what changes. See [concept-context-scoping](#concept-context-scoping).
- **CONTEXT.md** — file used at Layer 1 (workspace routing) and Layer 2 (stage contract). See [concept-five-layer-hierarchy](#concept-five-layer-hierarchy).
- **Dialogue tree** — the latent decision structure inside a successful human–AI conversation (Goal → Constraints → Assumptions → Sub-goals → Execution). See [concept-dialogue-structure](#concept-dialogue-structure).
- **Edit-source principle** — fix the cause (source files) not the symptom (output). See [concept-edit-source-principle](#concept-edit-source-principle) and [quote-edit-source](#quote-edit-source).
- **Glass-box AI** — system interpretable by construction because every artifact is a plain text file. Colloquial usage; strict regulatory usage requires more. See [concept-observability-side-effect](#concept-observability-side-effect) and [quote-glass-box](#quote-glass-box).
- **ICM (Interpretable / Interpretible Context Methodology)** — folder + markdown substrate for AI agent context, replacing framework-level orchestration. Spelled "Interpretible" in the talk, "Interpretable" in the paper. See [concept-icm-d1](#concept-icm-d1) and [concept-icm-d2](#concept-icm-d2).
- **Inherent interpretability (Rudin)** — preference for systems interpretable by construction over post-hoc explanations of opaque ones. Cited as ICM's interpretability lineage.
- **L0 / L1 / L2 / L3 / L4** — the five context layers: global identity / workspace routing / stage contract / stable references / per-run working artifacts. See [concept-five-layer-hierarchy](#concept-five-layer-hierarchy).
- **LangChain** — popular open-source LLM framework; canonical framework foil. See [entity-langchain](#entity-langchain).
- **Level 1 / Level 2 / Level 3 (of AI use)** — talk's maturity model: copy-paste / structured prompts / integrated workflow. See [concept-three-levels-ai](#concept-three-levels-ai).
- **LLM Wiki** — Karpathy's markdown-based personal knowledge approach, cited as ICM-adjacent. See [entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1).
- **Lost in the middle (Liu et al.)** — finding that LLMs degrade on information buried in long contexts; supports ICM's scoping argument. See [prereq-llm-context-windows](#prereq-llm-context-windows).
- **Mixed-initiative (Amershi et al.)** — design guidelines for human–AI interaction; ICM's review gates map onto efficient correction and efficient dismissal.
- **Multi-pass compilation** — compiler analogy for ICM stages as content-compilation passes. See [concept-icm-as-compilation](#concept-icm-as-compilation).
- **Observability** — readable intermediate state. ICM provides this by construction; observability ≠ traceability. See [concept-observability-side-effect](#concept-observability-side-effect) and [question-semantic-debugging](#question-semantic-debugging).
- **Output (folder)** — Layer 4 per-run working artifacts inside a stage. See [concept-five-layer-hierarchy](#concept-five-layer-hierarchy).
- **Pipe-and-filter** — software architecture pattern (Shaw & Garlan); ICM is its Unix-pipes-for-LLMs descendant. See [prereq-unix-pipelines](#prereq-unix-pipelines).
- **Portability** — the workspace is a folder; copy, zip, commit, or sync to deploy. See [concept-portability](#concept-portability).
- **References (folder)** — Layer 3 stable rules and constraints inside a stage. See [concept-five-layer-hierarchy](#concept-five-layer-hierarchy) and [action-separate-l3-l4](#action-separate-l3-l4).
- **Remotion** — React-based framework for programmatic video; used in the paper's script-to-animation pipeline. See [entity-remotion](#entity-remotion).
- **Review gate** — the editable pause between two stages where a human inspects and edits the output before the next stage reads it. See [action-review-gates](#action-review-gates).
- **Semantic Kernel** — Microsoft's open-source orchestration framework using "skills" / "planners" / "connectors"; framework foil. See [entity-semantic-kernel](#entity-semantic-kernel).
- **Single orchestrator** — the precise paper framing replacing the talk's looser "single agent": one orchestrator (Opus 4.6) with folder-driven subagent (Sonnet 4.6) delegation. See [synthesis-single-agent-clarified](#synthesis-single-agent-clarified).
- **Skill** — talk's term for a single markdown file encoding Goal, Constraints, Assumptions, Sub-goals. In paper vocabulary this is a stage contract. See [framework-skill-creation](#framework-skill-creation) and [synthesis-skill-equals-stage-contract](#synthesis-skill-equals-stage-contract).
- **Stage** — a numbered folder representing one pipeline step in an ICM workspace. See [action-numbered-stage-folders](#action-numbered-stage-folders).
- **Stage contract** — paper's term for a stage's Layer 2 `CONTEXT.md`; declares inputs, processing, outputs. See [concept-stage-contracts](#concept-stage-contracts).
- **Traceability** — automatic mapping of output spans back to causing source snippets. ICM does not yet provide this. See [question-semantic-debugging](#question-semantic-debugging).
- **Unix philosophy** — McIlroy's "do one thing well" + plain text as universal interface; ICM's architectural ancestor. See [prereq-unix-pipelines](#prereq-unix-pipelines).
- **U-shape (intervention pattern)** — empirical claim that humans edit heavily at the first and last stages and lightly in the middle. See [claim-ushaped-intervention](#claim-ushaped-intervention).
- **Voice-driven AI collaboration** — talk's forward-looking vision of voice + LLM + local file-system loop during live meetings. Explicitly out of paper's scope. See [concept-voice-collaboration](#concept-voice-collaboration) and [tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support).
- **Workspace** — an ICM project: a folder containing the canonical layout (`CLAUDE.md`, `CONTEXT.md`, `stages/`, etc.). See [framework-icm-architecture](#framework-icm-architecture).
- **Workspace-builder** — the self-hosting meta-workspace whose output is a new workspace. See [framework-workspace-builder](#framework-workspace-builder).


---

## Speakers

# Speakers — Interpretable Context Methodology Corpus

> Alphabetical. Each section: where the speaker appears, key contributions, and links to their most important attributed concepts, claims, and quotes.

---

## Andrej Karpathy

**Appears in**: video (talk), paper (cited)

Karpathy is not a speaker on the talk or a co-author of the paper, but he is cited in **both** sources in different conceptual roles. The video uses him as *social proof* (he too writes markdown rather than building frameworks); the paper uses him as *theoretical scaffolding* for context engineering.

- Entity notes: [entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1) (talk framing), [entity-andrej-karpathy-d2](#entity-andrej-karpathy-d2) (paper framing)
- Key bridge note: [synthesis-dialogue-to-context-engineering](#synthesis-dialogue-to-context-engineering)
- Attributed concepts:
  - **"LLM Wiki"** markdown-based personal knowledge approach (talk)
  - **"Context engineering"** (2025): system performance depends on what context is delivered, in what structure, at what moment (paper) — see [concept-context-scoping](#concept-context-scoping)

---

## David McDermott

**Appears in**: video (talk participant), paper (co-author)

McDermott is the *quieter* of the two named authors. In the talk he is a named participant with limited individually attributed content. In the paper he is the co-author, sharing formal authorship with Van Clief.

- Entity note: [entity-david-mcdermott](#entity-david-mcdermott)
- Source affiliations: [entity-icm-paper](#entity-icm-paper) (co-author), talk participant
- Role across the corpus: lends academic-credibility weight on the paper side; relatively limited individually attributed content. Treat unattributed paper claims as joint Van Clief + McDermott; treat unattributed talk claims as Van Clief unless context says otherwise.

---

## Jake Van Clief

**Appears in**: video (primary speaker), paper (first author)

Van Clief is the originator of ICM and the primary voice across both sources. The talk is his *practitioner-conviction* mode; the paper is his *researcher-honesty* mode. Almost every quote and claim in the corpus traces to him.

- Entity note: [entity-jake-van-clief](#entity-jake-van-clief)
- Source affiliations: talk (primary speaker), [entity-icm-paper](#entity-icm-paper) (first author)
- Key bridge note: [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude)
- Attributed concepts: [concept-icm-d1](#concept-icm-d1), [concept-icm-d2](#concept-icm-d2), [concept-dialogue-structure](#concept-dialogue-structure), [concept-three-levels-ai](#concept-three-levels-ai), [concept-voice-collaboration](#concept-voice-collaboration), [concept-five-layer-hierarchy](#concept-five-layer-hierarchy), [concept-stage-contracts](#concept-stage-contracts), [concept-context-scoping](#concept-context-scoping), [concept-observability-side-effect](#concept-observability-side-effect), [concept-edit-source-principle](#concept-edit-source-principle), [concept-portability](#concept-portability), [concept-icm-as-compilation](#concept-icm-as-compilation)
- Attributed frameworks: [framework-skill-creation](#framework-skill-creation), [framework-icm-architecture](#framework-icm-architecture), [framework-workspace-builder](#framework-workspace-builder)
- Attributed claims: [claim-icm-superiority](#claim-icm-superiority), [claim-l2-roi](#claim-l2-roi), [claim-voice-future](#claim-voice-future), [claim-token-efficiency](#claim-token-efficiency), [claim-ushaped-intervention](#claim-ushaped-intervention), [claim-external-adoption](#claim-external-adoption)
- Signature quotes:
  - [quote-absurdities](#quote-absurdities) — "folders and markdown files… huge results" (talk)
  - [quote-l2-roi](#quote-l2-roi) — "the jump from L1 to L2 is the highest-ROI move" (talk)
  - [quote-dialogue-theme](#quote-dialogue-theme) — "one core theme: discussion and dialogue" (talk)
  - [quote-voice-control](#quote-voice-control) — "sit inside of a group call and control… Claude code" (talk)
  - [quote-folder-controls-context](#quote-folder-controls-context) — "the folder structure controls what context it receives" (paper)
  - [quote-glass-box](#quote-glass-box) — "never opaque in the first place" (paper)
  - [quote-edit-source](#quote-edit-source) — "editing the source fixes every future run" (paper)

When asked "who said this?", default to Van Clief unless the source explicitly attributes otherwise. McDermott's co-authorship makes paper claims formally joint, but the conceptual register and signature framing are Van Clief's.

---

## K. Kumar

**Appears in**: video (talk participant)

Kumar is a University of Edinburgh student and Van Clief's collaborator. He built the **visual decision-tree mapping tool** that surfaces the latent dialogue structure inside chat transcripts — the prototype that operationalizes [concept-dialogue-structure](#concept-dialogue-structure).

- Entity note: [entity-k-kumar](#entity-k-kumar)
- Source affiliation: talk only
- Key contribution: visual mapping tool for dialogue structure extraction
- Conceptual lineage: his tool is the precursor to the paper's [framework-workspace-builder](#framework-workspace-builder) — see [synthesis-workspace-builder-is-the-meta-dialogue](#synthesis-workspace-builder-is-the-meta-dialogue)
- Institutional link: connects the corpus to the University of Edinburgh, one of the named external adopters in [entity-external-adopters](#entity-external-adopters) and [claim-external-adoption](#claim-external-adoption)


---

## All Notes

### Folder: concepts

#### concept-context-scoping

*type: `concept` · sources: paper*

## The mechanism

ICM's performance mechanism is **context scoping**. A model receiving:

- research instructions + source material + topic brief

…behaves differently from the same model receiving:

- a script template + voice guide + research summary.

The model's capabilities do not change between stages. What changes is the *information available* when it generates output.
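
A minimal sketch of the idea, under stated assumptions: the helper `build_stage_context` and every file name below are hypothetical and do not come from the paper. It only illustrates that the thing varying between stages is the bundle of text handed to the model, not the model itself.

```python
# Hypothetical illustration: the "model" is the same at every stage;
# only the scoped bundle of text handed to it differs.
STAGE_INPUTS = {
    # Stage names follow the corpus; the file names are invented for this sketch.
    "01_research": ["research_instructions.md", "source_material.md", "topic_brief.md"],
    "02_script": ["script_template.md", "voice_guide.md", "research_summary.md"],
}


def build_stage_context(stage: str, files: dict[str, str]) -> str:
    """Concatenate only the files scoped to `stage`, in declared order."""
    parts = []
    for name in STAGE_INPUTS[stage]:
        parts.append(f"## {name}\n{files[name]}")
    return "\n\n".join(parts)


# Toy file contents standing in for real workspace files.
files = {
    "research_instructions.md": "Collect sources on the topic.",
    "source_material.md": "Raw notes and links.",
    "topic_brief.md": "Explain context scoping.",
    "script_template.md": "Hook, body, close.",
    "voice_guide.md": "Plain, direct, no jargon.",
    "research_summary.md": "Three key findings.",
}

research_context = build_stage_context("01_research", files)
script_context = build_stage_context("02_script", files)
```

Here `research_context` never contains the voice guide and `script_context` never contains the topic brief, which is the scoping effect in miniature.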

## Lineage

This is context engineering in practice as framed by [entity-andrej-karpathy-d2](#entity-andrej-karpathy-d2): system performance depends on *what* context is delivered, *in what structure*, and *at what moment*. ICM operationalizes the framing by structuring context into separate organizational tiers — the [concept-five-layer-hierarchy](#concept-five-layer-hierarchy) — rather than a monolithic prompt.

The Layer 3 / Layer 4 distinction adds a second dimension: signalling which context is a **constraint** and which is **material to transform**.

## Connection to "lost in the middle"

Scoped context keeps relevant tokens out of the degraded mid-context band identified by Liu et al. — see [prereq-llm-context-windows](#prereq-llm-context-windows). This is the theoretical basis for the efficiency story in [claim-token-efficiency](#claim-token-efficiency).

## Validation status

Strongly supported by broader LLM research: prompting and "role" conditioning effects are well documented, even when the underlying model is unchanged. Tooling ecosystems like [entity-langchain](#entity-langchain), [entity-semantic-kernel](#entity-semantic-kernel), and [entity-autogen](#entity-autogen) all operationalize the same assumption via chains/graphs of prompts; ICM differs in *how* it encodes the graph (filesystem vs. code), not in the underlying premise.


## Related across days
- [concept-dialogue-structure](#concept-dialogue-structure)
- [synthesis-dialogue-to-context-engineering](#synthesis-dialogue-to-context-engineering)
- [prereq-llm-context-windows](#prereq-llm-context-windows)
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)


#### concept-dialogue-structure

*type: `concept` · sources: video*

## Thesis

A central claim of the source: **all structured AI workflows, prompt libraries, and 'skills' are fundamentally derived from human dialogue and conversational decision trees.** See [quote-dialogue-theme](#quote-dialogue-theme).

## The Hidden Decision Tree

A simple chat request like *'tighten this paragraph'* hides a complex chain of decisions:

1. **Goal** — understand the primary intent
2. **Constraints** — reduce wordiness, maintain tone, keep meaning
3. **Assumptions** — target audience, expected register, length budget
4. **Sub-goals** — restructure sentences, eliminate filler, preserve rhythm
5. **Execution** — produce the revision

## The Visual Mapping Tool

[entity-k-kumar](#entity-k-kumar), a co-founder and student at the University of Edinburgh, built a visual tool used in the video to render these implicit decision trees from real chat transcripts. The tool exposes the latent goals, constraints, and assumptions that drove a successful interaction.

## From Ephemeral to Permanent

Once the tree is mapped, the four components (Goal / Constraints / Assumptions / Sub-goals) can be encoded into a markdown skill file. This transforms ephemeral chat history into a **reusable, deterministic AI skill** — exactly the artifact format used by [concept-icm-d1](#concept-icm-d1).

The process is codified as [framework-skill-creation](#framework-skill-creation).
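
The encoding step can be sketched as follows. The `Skill` class, its field names, and the heading layout of the rendered file are assumptions made for illustration; the talk names the four components but this note does not fix a file template. The example values reuse the "tighten this paragraph" breakdown above.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical container for the four mapped components of a dialogue."""
    name: str
    goal: str
    constraints: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    sub_goals: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the skill as a plain markdown document."""
        def bullets(items):
            return "\n".join(f"- {item}" for item in items)

        return (
            f"# Skill: {self.name}\n\n"
            f"## Goal\n{self.goal}\n\n"
            f"## Constraints\n{bullets(self.constraints)}\n\n"
            f"## Assumptions\n{bullets(self.assumptions)}\n\n"
            f"## Sub-goals\n{bullets(self.sub_goals)}\n"
        )


tighten = Skill(
    name="tighten-paragraph",
    goal="Tighten the paragraph the user supplies.",
    constraints=["reduce wordiness", "maintain tone", "keep meaning"],
    assumptions=["target audience", "expected register", "length budget"],
    sub_goals=["restructure sentences", "eliminate filler", "preserve rhythm"],
)

skill_text = tighten.to_markdown()
```

Writing `skill_text` to a `.md` file is all that remains; the artifact is ordinary text, so it can be reviewed and edited like any other note.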

## Where the Encoded Artifact Lives (Companion Paper)

The "permanent artifact" this note describes is, in the formal paper [entity-icm-paper-arxiv](#entity-icm-paper-arxiv), a specific tier of the **Five-Layer Context Hierarchy**: the mapped Goal/Constraints/Assumptions/Sub-goals become the **Stage `CONTEXT.md` (Layer 2)** contract and its **Layer 3 reference material**, while the live chat that seeded it is transient **Layer 4** working content. The paper's lineage for this move is Knuth's *literate programming* (instructions and context co-located in readable text) and Wei et al.'s *chain-of-thought decomposition* — i.e., the decision tree this note extracts from dialogue is the same structure the paper persists as a stage contract. K. Kumar's Edinburgh affiliation also lines up with the paper's Eduba / University of Edinburgh base.

## Counter-Perspective

The descriptive claim ('skills can be reverse-engineered from conversations') is strongly consistent with prompt-engineering and conversational-UX practice. The stronger philosophical claim ('all complex AI workflows originate from human conversational decision trees') is a useful lens but not universal — many production workflows are better modeled as business processes, state machines, or event-driven dataflows. Dialogue is one structural perspective, not the only one.


## Related across days
- [concept-context-scoping](#concept-context-scoping)
- [framework-skill-creation](#framework-skill-creation)
- [synthesis-dialogue-to-context-engineering](#synthesis-dialogue-to-context-engineering)
- [synthesis-glass-box-meets-dialogue](#synthesis-glass-box-meets-dialogue)
- [entity-k-kumar](#entity-k-kumar)


#### concept-edit-source-principle

*type: `concept` · sources: paper*

## The principle

> *"Editing the output fixes this run. Editing the source fixes every future run."* — see [quote-edit-source](#quote-edit-source).

ICM's review gates ([concept-stage-contracts](#concept-stage-contracts)) currently invite editing **stage output**, which works but treats symptoms. The Edit-Source Principle argues for fixing *causes*.

## Worked example

If a script sounds wrong at stage 2:

- **Patch the symptom**: edit the script directly (fixes this run only).
- **Fix the cause**: ask *why* it sounds wrong and trace to:
  - an underspecified voice guide (Layer 3),
  - a stage contract emphasising the wrong quality (Layer 2),
  - a stage-1 framing that misdirected the script (upstream output).

Editing source fixes every future run.
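
As a rough aid to the manual tracing described above, the sketch below enumerates the files a reviewer might open when a stage's output looks wrong. The function `candidate_sources`, the `stages/<name>/` nesting, and all file names are assumptions for illustration. It only lists candidates in the order of the worked example; it does not determine which file is actually at fault.

```python
import tempfile
from pathlib import Path


def candidate_sources(workspace: Path, stage: str) -> list[Path]:
    """List files a human might inspect when a stage's output looks wrong.

    Order follows the worked example: Layer 3 references, then the Layer 2
    stage contract, then the upstream stage's output.
    """
    stages = sorted(p.name for p in (workspace / "stages").iterdir() if p.is_dir())
    stage_dir = workspace / "stages" / stage
    found = sorted((stage_dir / "references").glob("*.md"))  # Layer 3
    found.append(stage_dir / "CONTEXT.md")                   # Layer 2
    index = stages.index(stage)
    if index > 0:  # upstream output, if there is an earlier stage
        upstream = workspace / "stages" / stages[index - 1] / "output"
        found.extend(sorted(upstream.glob("*.md")))
    return found


# Build a throwaway workspace; every file name here is invented.
root = Path(tempfile.mkdtemp())
for stage_name in ("01_research", "02_script"):
    for sub in ("references", "output"):
        (root / "stages" / stage_name / sub).mkdir(parents=True)
    (root / "stages" / stage_name / "CONTEXT.md").write_text("contract")
(root / "stages" / "02_script" / "references" / "voice_guide.md").write_text("voice")
(root / "stages" / "01_research" / "output" / "summary.md").write_text("notes")

to_inspect = [p.relative_to(root).as_posix() for p in candidate_sources(root, "02_script")]
```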

## Why this matters

Drawn from compiler/software practice and tied to [concept-icm-as-compilation](#concept-icm-as-compilation): editing output is patching the binary; editing source is fixing the cause. The principle argues **source files should be what improves over time**, pointing ICM toward source-integrity discipline rather than output-patching.

## Open future direction

Fully realizing this principle requires tooling that doesn't yet exist: when a phrase sounds wrong at stage 3, what source file actually caused it? Today the practitioner does the tracing manually. The corresponding research direction is [question-semantic-debugging](#question-semantic-debugging) — building source-map equivalents for content pipelines.


## Related across days
- [framework-skill-creation](#framework-skill-creation)
- [quote-edit-source](#quote-edit-source)
- [synthesis-edit-source-as-dialogue-evolution](#synthesis-edit-source-as-dialogue-evolution)
- [question-semantic-debugging](#question-semantic-debugging)


#### concept-five-layer-hierarchy

*type: `concept` · sources: paper*

## The five layers

Context is delivered in five tiers, each with a token budget and a diagnostic question.

| Layer | File | Question | Budget |
|---|---|---|---|
| **L0** | `CLAUDE.md` | *Where am I?* | ~800 tokens |
| **L1** | `CONTEXT.md` (root) | *Where do I go?* | ~300 tokens |
| **L2** | `CONTEXT.md` (stage) | *What do I do?* | 200–500 tokens |
| **L3** | `references/` | *What rules apply?* | 500–2k tokens |
| **L4** | `output/` | *What am I working with?* | per-run |

- **Layers 0–2 are structural/routing** (~1.5k tokens total).
- **Layers 3–4 are content**: L3 is the *factory* (the stable rules applied on every run), L4 is the *product* (the material being worked on in this run).
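
The budgets in the table can be turned into a simple check. This is a sketch under assumptions: `LAYER_BUDGETS` takes the upper bound wherever the table gives a range, and `estimate_tokens` uses a crude four-characters-per-token heuristic that is not from the paper and will not match any real tokenizer exactly.

```python
# Token budgets per layer as listed in the table above (upper bounds where a
# range is given). Layer 4 is per-run, so it has no fixed budget here.
LAYER_BUDGETS = {"L0": 800, "L1": 300, "L2": 500, "L3": 2000}


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def over_budget(layers: dict[str, str]) -> list[str]:
    """Return the layers whose estimated token count exceeds the budget."""
    return [
        layer
        for layer, text in layers.items()
        if layer in LAYER_BUDGETS and estimate_tokens(text) > LAYER_BUDGETS[layer]
    ]


# Upper-bound total for the routing layers L0 through L2.
routing_total = LAYER_BUDGETS["L0"] + LAYER_BUDGETS["L1"] + LAYER_BUDGETS["L2"]

sample = {
    "L0": "x" * 2000,   # ~500 tokens, within 800
    "L1": "x" * 2000,   # ~500 tokens, over 300
    "L2": "x" * 1600,   # ~400 tokens, within 500
    "L4": "x" * 90000,  # per-run content, not checked
}
flagged = over_budget(sample)
```

With these upper bounds `routing_total` is 1,600, in line with the "~1.5k tokens total" figure for Layers 0–2, and only `L1` is flagged in the sample.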

## Why the split matters

The distinction (Table 2 in [entity-icm-paper](#entity-icm-paper)) tells the agent how to treat each file:
- **Layer 3** as constraints to internalize.
- **Layer 4** as input to process.

Most of a well-scoped stage's context is task-relevant, which is why a stage lands at 2–8k tokens rather than 40k — see [claim-token-efficiency](#claim-token-efficiency). Separating these is the implementation discipline captured in [action-separate-l3-l4](#action-separate-l3-l4).
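
One way to make the Layer 3 / Layer 4 distinction visible to the agent is to label the two kinds of content when a stage prompt is assembled. The function `assemble_prompt` and its section headings are hypothetical; the paper's actual wording for these labels is not given in this note.

```python
def assemble_prompt(contract: str, references: list[str], working: list[str]) -> str:
    """Build one stage prompt with Layer 3 and Layer 4 clearly separated.

    The section headings are invented for this sketch; the point is only
    that rules and material arrive under different labels.
    """
    sections = [
        "# Stage contract (Layer 2)\n" + contract,
        "# Rules to follow (Layer 3)\n" + "\n\n".join(references),
        "# Material to process (Layer 4)\n" + "\n\n".join(working),
    ]
    return "\n\n".join(sections)


prompt = assemble_prompt(
    contract="Turn the research summary into a script.",
    references=["Voice guide: plain, direct, no jargon."],
    working=["Research summary: three key findings."],
)
```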

## Validation status

The partitioning into *global info + task contract + references + current artifacts* aligns with current prompt-architecture best practices and with Liu et al.'s "Lost in the Middle" — see [prereq-llm-context-windows](#prereq-llm-context-windows). However, the specific token counts are *representative*, not from a controlled A/B test, as the authors acknowledge. Whether the hierarchy holds across non-Claude models is the subject of [question-cross-model](#question-cross-model).

## On-disk realization

The layers are mapped onto folders by [framework-icm-architecture](#framework-icm-architecture); the resulting routing-vs-content split is what enables [concept-context-scoping](#concept-context-scoping).


## Related across days
- [concept-icm-d1](#concept-icm-d1)
- [framework-icm-architecture](#framework-icm-architecture)
- [synthesis-five-layer-fills-the-gap](#synthesis-five-layer-fills-the-gap)
- [action-separate-l3-l4](#action-separate-l3-l4)


#### concept-icm-as-compilation

*type: `concept` · sources: paper*

## The analogy

Beyond Unix pipelines and Make (see [prereq-unix-pipelines](#prereq-unix-pipelines)), the closest theoretical analogy is **multi-pass compilation** (Aho et al.).

A multi-pass compiler transforms source through discrete passes:

1. lexer → tokens,
2. parser → syntax tree,
3. semantic analysis,
4. optimization,
5. code generation,

each reading the previous pass's output and writing a **well-defined, inspectable intermediate representation**.

## ICM as content compilation

ICM does the same with *content*:

- **research** transforms a brief into structured output,
- **script** transforms research into a script,
- **production** transforms the script into animation specs and code.

The intermediate artifacts are plain files that can be opened, read, and edited — the same property compilers exploit for debugging builds.
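
The pass structure can be sketched with ordinary functions and files. Everything here is illustrative: the three transform functions are stand-ins for work the agent would do, and the `stages/<name>/output/result.md` path is an assumed layout, not one quoted from the paper.

```python
import tempfile
from pathlib import Path
from typing import Callable


# Each "pass" is a plain function from text to text. In a real workspace the
# transformation would be done by the agent; these stand-ins just tag the text.
def research(brief: str) -> str:
    return f"RESEARCH NOTES for: {brief}"


def script(notes: str) -> str:
    return f"SCRIPT based on: {notes}"


def production(script_text: str) -> str:
    return f"ANIMATION SPEC from: {script_text}"


PASSES: list[tuple[str, Callable[[str], str]]] = [
    ("01_research", research),
    ("02_script", script),
    ("03_production", production),
]


def run_pipeline(workspace: Path, brief: str) -> list[Path]:
    """Run each pass in order, writing its result to <stage>/output/result.md."""
    current = brief
    written = []
    for stage, transform in PASSES:
        out_dir = workspace / "stages" / stage / "output"
        out_dir.mkdir(parents=True, exist_ok=True)
        out_file = out_dir / "result.md"
        out_file.write_text(transform(current))
        written.append(out_file)
        # Read the artifact back from disk so the next pass consumes the
        # file itself rather than an in-memory value.
        current = out_file.read_text()
    return written


workspace = Path(tempfile.mkdtemp())
artifacts = run_pipeline(workspace, "context scoping")
final_text = artifacts[-1].read_text()
```

Because each pass reads its input from a file, an edit made to an intermediate artifact between passes would be picked up by the next one, which is what makes the intermediate representation inspectable in practice.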

## Incremental compilation

The analogy extends to **incremental compilation**: re-running only the stages downstream of a change rather than the whole pipeline. This is a natural fit for ICM's filesystem layout — a downstream stage simply re-reads the (now-updated) upstream output.
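
A minimal sketch of how staleness could be detected, assuming content digests are recorded per stage. The `stale_stages` function and the `seen` bookkeeping are hypothetical; this note does not describe any such tooling in ICM itself, where the re-run decision is made by the practitioner.

```python
import hashlib

STAGES = ["01_research", "02_script", "03_production"]


def digest(text: str) -> str:
    """Content fingerprint used to detect that an upstream output changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_stages(outputs: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Return stages that need a re-run.

    `outputs` maps each stage to its current output text. `seen` maps each
    stage to the digest of the upstream output it consumed on its last run.
    A stage is stale if its upstream output changed since then, or if the
    stage directly upstream is itself stale.
    """
    stale = []
    for upstream, stage in zip(STAGES, STAGES[1:]):
        changed = digest(outputs[upstream]) != seen.get(stage)
        if changed or upstream in stale:
            stale.append(stage)
    return stale


outputs = {"01_research": "notes v1", "02_script": "script v1", "03_production": "spec v1"}
seen = {"02_script": digest("notes v1"), "03_production": digest("script v1")}
clean = stale_stages(outputs, seen)

outputs["02_script"] = "script v2 (edited at the review gate)"
after_script_edit = stale_stages(outputs, seen)

outputs["01_research"] = "notes v2"
after_research_edit = stale_stages(outputs, seen)
```

Editing the script marks only `03_production` as stale, while editing the research output marks both downstream stages.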

## Connection to source integrity

The compiler analogy is also the foundation of the [concept-edit-source-principle](#concept-edit-source-principle): editing *output* is patching the binary; editing *source* fixes every future build. Quoted directly in [quote-edit-source](#quote-edit-source).


## Related across days
- [prereq-unix-pipelines](#prereq-unix-pipelines)
- [concept-icm-d2](#concept-icm-d2)
- [concept-stage-contracts](#concept-stage-contracts)
- [arc-evidence-base-evolution](#arc-evidence-base-evolution)


#### concept-icm-d1

*type: `concept` · sources: video*

## Definition

The **Interpretible Context Methodology (ICM)** is a contrarian approach to building AI agent architectures. Instead of relying on complex multi-agent orchestration frameworks such as [entity-langchain](#entity-langchain) or [entity-semantic-kernel](#entity-semantic-kernel), ICM advocates for using plain text, markdown files, and standard folder hierarchies to manage context and workflows.

## Core Philosophy

An AI agent (typically [entity-claude](#entity-claude)) can navigate a well-structured file system to:

- Gather necessary context on demand
- Understand constraints encoded as markdown
- Execute tasks deterministically without orchestration glue code

By breaking down workflows, prompt libraries, and specific 'skills' into discrete markdown files, users create a **highly transparent, easily modifiable, and human-readable architecture**.

## Claimed Benefits

- **Token reduction of 20–40%** versus framework-driven approaches (see [claim-icm-superiority](#claim-icm-superiority) — note that this figure is anecdotal/case-study based, not a peer-reviewed benchmark).
- Faster execution and lower latency.
- Significantly lower barrier to entry for non-technical teams who only need to manage folders and text files instead of Python code or API integrations.
- Greater determinism and inspectability of agent behaviour.

## Formal Grounding (Companion Paper)

The talk's practitioner framing is formalized in the companion paper [entity-icm-paper-arxiv](#entity-icm-paper-arxiv) (*"Interpretable Context Methodology: Folder Structure as Agent Architecture,"* Van Clief & McDermott, arXiv:2603.16021, Edinburgh). The paper makes explicit a structure the video only implies, the **Five-Layer Context Hierarchy**:

- **Layer 0** — `CLAUDE.md` (global identity)
- **Layer 1** — `CONTEXT.md` (workspace routing)
- **Layer 2** — Stage `CONTEXT.md` (per-stage contracts)
- **Layer 3** — Reference material (stable across runs)
- **Layer 4** — Working artifacts (per-run content)

Numbered stage folders (`01_research`, `02_script`, `03_production`) carry explicit Inputs / Process / Outputs contracts, with **human review gates** between stages. The paper grounds the efficiency claim quantitatively: **2,000–8,000 focused tokens per stage** vs. a monolithic prompt **exceeding 40,000 tokens, most of it irrelevant** — invoking Liu et al.'s *"lost in the middle"* degradation as the mechanism. It also reframes ICM as **"interpretable" in Rudin's sense** (inherently inspectable, not post-hoc explained) and as Karpathy-style *"context engineering."* The full set of paper diagrams (five-layer hierarchy with token budgets, the folder tree, the token-composition chart, the review-gate pipeline) is captured and synthesized in [exhibit-icm-paper-figures](#exhibit-icm-paper-figures).

## Cultural Validation

[entity-anthropic](#entity-anthropic) and researchers such as [entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1) independently arrived at similar ideas — Karpathy's 'LLM Wiki' approach mirrors ICM's emphasis on structured markdown as the substrate of agent context.

## Related Building Blocks

- The structural origin of ICM skills is explored in [concept-dialogue-structure](#concept-dialogue-structure).
- ICM is operationalized in the [framework-skill-creation](#framework-skill-creation) process.
- The maturity ladder for adopting ICM is described in [concept-three-levels-ai](#concept-three-levels-ai).
- Its ultimate expression is [concept-voice-collaboration](#concept-voice-collaboration).

## Prerequisites for Understanding

- [prereq-llm-context](#prereq-llm-context)
- [prereq-markdown](#prereq-markdown)

## Open Questions

- [question-icm-scaling](#question-icm-scaling) — how does this scale to massive enterprise codebases?

## Counter-Perspective

See [contrarian-frameworks](#contrarian-frameworks). Note that Microsoft's Cloud Adoption Framework and other enterprise sources agree with the *starting* posture (single agent first) but argue that multi-agent frameworks remain valuable across security boundaries, multi-team environments, and at large scale.


## Related across days
- [concept-icm-d2](#concept-icm-d2)
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)
- [framework-icm-architecture](#framework-icm-architecture)
- [concept-stage-contracts](#concept-stage-contracts)
- [synthesis-five-layer-fills-the-gap](#synthesis-five-layer-fills-the-gap)
- [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude)


#### concept-icm-d2

*type: `concept` · sources: paper*

## Definition

ICM is the paper's central contribution: for sequential, human-reviewed, repeatable workflows, the **folder structure replaces the orchestration framework**. A single AI agent reads different context at each stage rather than multiple agents coordinating through code.

## How it works

- **Numbered folders** (e.g. `01_research`, `02_script`, `03_production`) represent pipeline stages.
- **Plain markdown files** carry the prompts and context that tell the agent what role to play at each step.
- **Local scripts** handle mechanical work that needs no AI.
- **Human review gates** sit at each output boundary — see [concept-stage-contracts](#concept-stage-contracts).

The result is full pipeline capability with no framework code, no server, and no developer needed for day-to-day operation. The on-disk shape is documented in [framework-icm-architecture](#framework-icm-architecture) and the context model in [concept-five-layer-hierarchy](#concept-five-layer-hierarchy).
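
To make the on-disk shape concrete, the sketch below scaffolds an empty workspace. The stage names, `CLAUDE.md`, `CONTEXT.md`, `references/`, and `output/` come from this corpus; the exact nesting under `stages/` and the placeholder file contents are assumptions, and `scaffold_workspace` is a hypothetical helper rather than part of any ICM tooling.

```python
import tempfile
from pathlib import Path

STAGES = ["01_research", "02_script", "03_production"]


def scaffold_workspace(root: Path) -> None:
    """Create an empty workspace skeleton; file contents are placeholders."""
    (root / "CLAUDE.md").write_text("# Global identity (Layer 0)\n")
    (root / "CONTEXT.md").write_text("# Workspace routing (Layer 1)\n")
    for stage in STAGES:
        stage_dir = root / "stages" / stage
        (stage_dir / "references").mkdir(parents=True)  # Layer 3
        (stage_dir / "output").mkdir()                   # Layer 4
        (stage_dir / "CONTEXT.md").write_text(
            f"# {stage} (Layer 2)\n\n## Inputs\n\n## Process\n\n## Outputs\n"
        )


root = Path(tempfile.mkdtemp())
scaffold_workspace(root)
layout = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

The result is nothing more than directories and markdown files, so it can be copied, zipped, or committed like any other folder.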

## Scope of the claim

ICM is positioned **not against frameworks in general** but against their use for a specific class of problem where they are overhead — see [contrarian-frameworks-overkill](#contrarian-frameworks-overkill). It explicitly does not address real-time multi-agent collaboration or high-concurrency systems; for those, tools like [entity-autogen](#entity-autogen) remain appropriate.

## Performance argument

The efficiency story (see [claim-token-efficiency](#claim-token-efficiency)) is that scoped stages land at ~2–8k focused tokens versus a ~42k monolithic prompt that is mostly irrelevant. The mechanism is explained in [concept-context-scoping](#concept-context-scoping).

## Reference implementation

All workspaces were built and run with [entity-claude-code](#entity-claude-code). The methodology is formally articulated in [entity-icm-paper](#entity-icm-paper) by [entity-jake-van-clief](#entity-jake-van-clief) and [entity-david-mcdermott](#entity-david-mcdermott).


## Related across days
- [concept-icm-d1](#concept-icm-d1)
- [concept-dialogue-structure](#concept-dialogue-structure)
- [concept-three-levels-ai](#concept-three-levels-ai)
- [framework-skill-creation](#framework-skill-creation)
- [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude)


#### concept-observability-side-effect

*type: `concept` · sources: paper*

## The accidental feature

The most useful property of ICM may be one not designed as a feature. Every intermediate output is a plain-text file, so the system is **observable without any added tooling**.

Rudin argued inherently interpretable systems should be preferred over post-hoc explanations of opaque ones. ICM is a glass-box workflow that was **never opaque in the first place**, because every artifact is human-readable — see [quote-glass-box](#quote-glass-box).

## Mapping to human-AI guidelines

ICM's plain-file observability maps onto three of Amershi et al.'s human-AI interaction guidelines:

- **Make clear what the system can do** → stage contracts ([concept-stage-contracts](#concept-stage-contracts)) make capabilities explicit.
- **Support efficient correction** → markdown files: open, edit, save.
- **Support efficient dismissal** → review gates support deciding not to proceed.

## Caveats on "glass-box"

The enrichment overlay notes that "glass-box AI" has stricter definitions in regulated settings (Nokia's "Glass Box Imperative"; Ncontracts) that include:

- **Provenance** — which model version, which configuration.
- **Fine-grained traceability** — output spans → source snippets.
- **Formal safety controls and audit trails.**

ICM provides strong observability and partial auditability (via Git + human review), but **not yet automated traceability** — that gap is [question-semantic-debugging](#question-semantic-debugging). By colloquial standards ICM is a glass box; by strict governance standards it is a partial glass box.

## Contrarian framing

The broader contrarian move — that observability shouldn't require a logging or dashboard layer — is captured in [contrarian-observability-free](#contrarian-observability-free).


## Related across days
- [quote-glass-box](#quote-glass-box)
- [contrarian-observability-free](#contrarian-observability-free)
- [synthesis-glass-box-meets-dialogue](#synthesis-glass-box-meets-dialogue)
- [question-semantic-debugging](#question-semantic-debugging)


#### concept-portability

*type: `concept` · sources: paper*

## The proposition

A workspace is a folder. It can be:

- copied to another machine,
- committed to Git,
- emailed as a zip,
- synced through cloud storage.

It carries its own prompts, context structure, and stage definitions. **There is no server to configure or deployment artifact to build**: the workspace definition *is* the system. In effect, this is infrastructure-as-code applied to AI workflows.
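
To illustrate "emailed as a zip", here is a small sketch using only the Python standard library. The helper names `pack_workspace` and `unpack_workspace` are hypothetical; the source does not prescribe any tooling for this.

```python
import shutil
from pathlib import Path


def pack_workspace(workspace: Path, archive_base: Path) -> Path:
    """Zip the contents of the workspace folder; returns the path of the .zip file."""
    # make_archive appends the ".zip" suffix to archive_base itself.
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=workspace))


def unpack_workspace(archive: Path, destination: Path) -> Path:
    """Extract the archive into destination, which becomes the restored workspace."""
    destination.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(destination))
    return destination
```

Keep `archive_base` outside the workspace folder, otherwise the archive may end up including itself.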

## Versioning benefits

- Every prompt change and stage edit is **diffable and reversible** via Git.
- Stage outputs can be committed after each run, creating a **version history of the pipeline's behaviour** over time.

## Handoff economics

Handing a workflow to a client means **copying a folder**. They can run and edit it without a developer — versus a framework solution that requires:

- documentation,
- environment setup,
- dependency management,
- ongoing support.

This is the mechanism behind the external adoption pattern in [claim-external-adoption](#claim-external-adoption) — and the structural reason a non-coder workspace builder ([framework-workspace-builder](#framework-workspace-builder)) is even feasible.

## Counter-perspective

The enrichment overlay flags a real limitation: in enterprise contexts, portability requires more than file copying. Three concerns sit on top of "workspace as a folder": environment capture (library/model versions, API keys), access control and data governance, and reproducible runtimes (containers, IaC). Git-versioning files is necessary but not sufficient for reproducibility and compliance. ICM's portability story is strong for small teams and local workflows; larger orgs need to integrate it with environment management and governance tooling.


## Related across days
- [framework-icm-architecture](#framework-icm-architecture)
- [concept-icm-d2](#concept-icm-d2)
- [action-implement-folders](#action-implement-folders)


#### concept-stage-contracts

*type: `concept` · sources: paper*

## The contract

Each stage folder carries a Layer 2 `CONTEXT.md` that declares:

- **Inputs** — what files this stage reads.
- **Process** — the role the agent plays here.
- **Outputs** — what files it writes.

Each stage reads from the previous stage's `output/` folder, processes that input per its own contract, and writes to its own `output/` folder. The full on-disk shape is in [framework-icm-architecture](#framework-icm-architecture).
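
The source does not fix a file format for the contract. As a sketch, assume a hypothetical `CONTEXT.md` that uses `## Inputs`, `## Process`, and `## Outputs` headings; the parser below collects the lines under each. `StageContract` and `parse_contract` are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class StageContract:
    inputs: list[str] = field(default_factory=list)
    process: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def parse_contract(markdown: str) -> StageContract:
    """Collect the lines under the assumed '## Inputs/Process/Outputs' headings."""
    contract = StageContract()
    current = None  # the list currently being filled, or None outside known sections
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            name = line[3:].strip().lower()
            known = name in ("inputs", "process", "outputs")
            current = getattr(contract, name) if known else None
        elif current is not None and line:
            # Drop leading bullet markers so list items and plain prose both work.
            current.append(line.lstrip("-* ").strip())
    return contract
```

Because the contract is plain markdown, a reviewer can read the same declaration the parser reads, with no separate schema to keep in sync.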

## Review gates

At each boundary, a human can **inspect and edit** the output before the next stage runs:

- A research document that misses an angle gets edited before the script stage.
- A script that runs too long gets trimmed before production.

This editing rhythm follows a U-shape — heavy at stage 1, light in the middle, heavy at the final stage — see [claim-ushaped-intervention](#claim-ushaped-intervention). The implementation discipline is [action-review-gates](#action-review-gates).
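
One way to make a review gate mechanical is a marker file that a human creates after inspecting a stage's output. This is a sketch under that assumption: the `.approved` file and both function names are hypothetical conventions, not part of the paper.

```python
from pathlib import Path
from typing import Optional

# Hypothetical convention: a reviewer creates this empty file to release a stage.
APPROVAL_MARKER = ".approved"


def approve(stage_dir: Path) -> None:
    """Record that a human has reviewed (and possibly edited) this stage's output."""
    (stage_dir / "output" / APPROVAL_MARKER).touch()


def gate_open(previous_stage_dir: Optional[Path]) -> bool:
    """The first stage may always run; later stages wait for the previous approval."""
    if previous_stage_dir is None:
        return True
    return (previous_stage_dir / "output" / APPROVAL_MARKER).exists()
```

The gate only records that a review happened; the judgment itself (editing the research, trimming the script) stays with the human.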

## Why this is not "multi-agent"

The same model executes every stage; only the folder structure differs. The "multi-agent" behaviour is an **illusion produced entirely by folder scoping plus human gates** — there is no router model and no orchestration code, per the thesis captured in [quote-folder-controls-context](#quote-folder-controls-context). Why the same model behaves differently per stage is explained by [concept-context-scoping](#concept-context-scoping).
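
A rough sketch of folder scoping: the function below lists the files a single stage would see, following the layer annotations in [framework-icm-architecture](#framework-icm-architecture). Exactly which files get loaded is an assumption here, and `context_files` and `build_prompt` are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def context_files(workspace: Path, stage: Path, previous: Optional[Path]) -> list[Path]:
    """List the files one stage would load; everything else stays out of context."""
    files = [workspace / "CLAUDE.md", workspace / "CONTEXT.md", stage / "CONTEXT.md"]
    files += sorted((stage / "references").glob("*.md"))
    if previous is not None:
        # Only the previous stage's output is visible, not its references.
        files += sorted((previous / "output").glob("*.md"))
    return [path for path in files if path.is_file()]


def build_prompt(files: list[Path]) -> str:
    """Concatenate the scoped files into a single prompt, labelled by path."""
    return "\n\n".join(
        f"<!-- {path} -->\n{path.read_text(encoding='utf-8')}" for path in files
    )
```

The model is identical across stages; only the list returned by `context_files` changes, which is what produces the apparent role switch.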

## Future direction

Review gates today invite editing *output*, which works but treats symptoms. A more principled approach — editing the *source* that produced the output — is articulated in [concept-edit-source-principle](#concept-edit-source-principle) and pointed at by [question-semantic-debugging](#question-semantic-debugging).


## Related across days
- [framework-skill-creation](#framework-skill-creation)
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)
- [synthesis-skill-equals-stage-contract](#synthesis-skill-equals-stage-contract)
- [action-numbered-stage-folders](#action-numbered-stage-folders)
- [action-review-gates](#action-review-gates)
- [claim-ushaped-intervention](#claim-ushaped-intervention)


#### concept-three-levels-ai

*type: `concept` · sources: video*

## Definition

A maturity model for AI adoption inside organizations, proposed by [entity-jake-van-clief](#entity-jake-van-clief) from his enterprise consulting work.

## Level 1 — Copy & Paste

Baseline ad-hoc use: users paste prompts into chatbots (ChatGPT, [entity-claude](#entity-claude)), iterate manually, copy outputs back into their work.

- Low effort, low and inconsistent impact
- No reuse, no shared assets
- Every interaction is ephemeral

## Level 2 — Structured Use

Users operate from refined, saved prompts, brand-tone guides, and verified outputs. Teams employ 'skills' and prompt libraries to standardize interactions.

- Jump from L1→L2 is the **highest-ROI move** (see [claim-l2-roi](#claim-l2-roi) and [quote-l2-roi](#quote-l2-roi))
- Flattens the effort curve while raising output quality and consistency
- Operationalized via [action-move-to-l2](#action-move-to-l2) and [action-codify-voice](#action-codify-voice)

## Level 3 — Integrated Workflow

Automated pipelines where multiple skills, prompts, and deterministic scripts (e.g., Python) are chained. The AI navigates the folder structure established under [concept-icm-d1](#concept-icm-d1) to execute multi-step processes with minimal human intervention.

- Highest absolute impact
- High engineering cost: requires distributed-systems thinking (orchestration, observability, error handling)
- Replaces ad-hoc assistance with systemic automation

## Enrichment Notes

Industry guidance broadly agrees that codifying prompts before building automation is the right *order* of operations. The specific framing 'highest ROI step' is grounded in practitioner experience rather than rigorous controlled studies — read it as a strong consultant heuristic, not a formal law.

For some organizations with large repetitive workloads and strong engineering teams, jumping directly into a narrow L3 deployment can also deliver outsized ROI.


## Related across days
- [concept-icm-d2](#concept-icm-d2)
- [framework-icm-architecture](#framework-icm-architecture)
- [synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline)
- [claim-l2-roi](#claim-l2-roi)
- [action-move-to-l2](#action-move-to-l2)


#### concept-voice-collaboration

*type: `concept` · sources: video*

## The Vision

The video culminates in a live demonstration of what [entity-jake-van-clief](#entity-jake-van-clief) considers the future of AI workflows: **an AI agent actively participating in a live group call**, not as a passive transcriber but as a real-time collaborator.

See [quote-voice-control](#quote-voice-control) for the speaker's framing.

## The Demo Stack

- **Voice cloning**: a custom voice model of the speaker, trained via [entity-11labs](#entity-11labs)
- **LLM runtime**: a local instance of [entity-claude](#entity-claude)
- **Codebase under control**: the speaker's 'Ethics Engine' project
- **Substrate**: a folder structure following [concept-icm-d1](#concept-icm-d1), where psychometric scales and other context live as markdown
- **Interaction loop**: voice → STT → Claude → file system read/write → response back via TTS during the live call

## What It Replaces

The traditional paradigm of:

1. Record a meeting
2. Transcribe afterwards
3. Feed the transcript to an LLM
4. Manually pick up generated tasks

…is collapsed into a single real-time loop where the AI executes during the conversation. There is no post-meeting processing and no separate orchestration layer.

## Why ICM Matters Here

The folder structure of [concept-icm-d1](#concept-icm-d1) gives the voice-driven agent its situational awareness: when asked to *'open the openness scale'*, the agent navigates a predictable directory and finds a markdown file describing it. Without ICM the same demo would require bespoke tool-calling glue.
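
The talk does not show how the agent resolves a spoken name to a file. As a purely illustrative sketch, a predictable directory makes even naive filename matching workable; `find_note` and its matching rule are assumptions, not the demo's implementation.

```python
from pathlib import Path
from typing import Optional


def find_note(root: Path, spoken_name: str) -> Optional[Path]:
    """Return the first markdown file whose name contains every spoken word."""
    words = spoken_name.lower().split()
    for path in sorted(root.rglob("*.md")):
        # Treat underscores and hyphens in file names as word separators.
        stem = path.stem.lower().replace("_", " ").replace("-", " ")
        if all(word in stem for word in words):
            return path
    return None
```

For example, `find_note(root, "openness scale")` would return a file such as `scales/openness_scale.md` if one exists under `root`, and `None` otherwise.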

## Open Issues

See [question-voice-security](#question-voice-security). Voice control of local code raises authentication, permission-scoping, and voice-spoofing concerns that current consumer voice stacks do not adequately address. Likely production future: multimodal (text + GUI + voice) rather than voice-only control.

See also the prediction in [claim-voice-future](#claim-voice-future).


## Related across days
- [claim-voice-future](#claim-voice-future)
- [tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support)
- [question-voice-security](#question-voice-security)
- [entity-11labs](#entity-11labs)


---

### Folder: frameworks

#### framework-icm-architecture

*type: `framework` · sources: paper*

## Canonical on-disk layout

The canonical structure, every node annotated by layer from [concept-five-layer-hierarchy](#concept-five-layer-hierarchy):

```
workspace/
├── CLAUDE.md                    # Layer 0: global identity
├── CONTEXT.md                   # Layer 1: workspace routing
├── _config/                     # Layer 3: cross-stage reference
├── shared/                      # Layer 3: voice, design system, conventions
├── setup/
│   └── questionnaire.md         # un-layered: runs once at creation
└── stages/
    ├── 01_research/
    │   ├── CONTEXT.md           # Layer 2: stage contract
    │   ├── references/          # Layer 3: stage-local references
    │   └── output/              # Layer 4: working artifacts
    ├── 02_script/
    │   ├── CONTEXT.md
    │   ├── references/
    │   └── output/
    └── 03_production/
        ├── CONTEXT.md
        ├── references/
        └── output/
```

## Construction steps

1. **workspace/** root holds `CLAUDE.md` (L0 identity) and `CONTEXT.md` (L1 routing).
2. **stages/** contains numbered stage folders (`01_research`, `02_script`, `03_production`).
3. **Each stage folder** holds `CONTEXT.md` (L2 contract), `references/` (L3), `output/` (L4).
4. **_config/** and **shared/** hold cross-stage L3 reference (voice, design system, conventions).
5. **setup/** holds a questionnaire run once at workspace creation.
6. **The same agent** runs each stage; folder scoping controls its context; humans review at each `output/` boundary — see [concept-stage-contracts](#concept-stage-contracts).
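
Steps 1 to 5 can be sketched as a short scaffolding script. The folder and file names come from the layout above; the function name `scaffold_workspace` is hypothetical, and the files are created empty as placeholders.

```python
from pathlib import Path


def scaffold_workspace(root: Path, stage_names: list[str]) -> None:
    """Create the canonical folders and empty placeholder files."""
    for folder in ("_config", "shared", "setup", "stages"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    # Layer 0 identity, Layer 1 routing, and the one-off setup questionnaire.
    for name in ("CLAUDE.md", "CONTEXT.md", "setup/questionnaire.md"):
        (root / name).touch()
    for index, stage in enumerate(stage_names, start=1):
        stage_dir = root / "stages" / f"{index:02d}_{stage}"
        for sub in ("references", "output"):  # Layer 3 and Layer 4
            (stage_dir / sub).mkdir(parents=True, exist_ok=True)
        (stage_dir / "CONTEXT.md").touch()  # Layer 2 stage contract
```

Calling `scaffold_workspace(Path("workspace"), ["research", "script", "production"])` reproduces the tree shown in the canonical layout, minus any file contents.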

## Why the repeating triad matters

Every stage folder is the same triad — `CONTEXT.md` (L2) + `references/` (L3) + `output/` (L4) — which is what makes **"add or remove a stage" a filesystem operation**, not a code change. Implemented as [action-numbered-stage-folders](#action-numbered-stage-folders) and [action-separate-l3-l4](#action-separate-l3-l4).

Top-level `_config/` and `shared/` hold L3 material shared across stages without duplication. `setup/questionnaire.md` is un-layered: it runs once at creation and is the seam where non-coder onboarding happens — exploited by [framework-workspace-builder](#framework-workspace-builder).
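
To make "a filesystem operation, not a code change" concrete, here is a sketch that appends a stage and checks a folder for the triad. Both helpers (`append_stage`, `missing_parts`) are hypothetical names.

```python
import re
from pathlib import Path

TRIAD = ("CONTEXT.md", "references", "output")


def append_stage(stages_dir: Path, name: str) -> Path:
    """Create the next numbered stage folder with the standard triad."""
    numbers = []
    for child in stages_dir.iterdir():
        match = re.match(r"^(\d+)_", child.name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    stage_dir = stages_dir / f"{next_number:02d}_{name}"
    (stage_dir / "references").mkdir(parents=True)
    (stage_dir / "output").mkdir()
    (stage_dir / "CONTEXT.md").touch()
    return stage_dir


def missing_parts(stage_dir: Path) -> list[str]:
    """Report which triad members are absent from a stage folder."""
    return [part for part in TRIAD if not (stage_dir / part).exists()]
```

Removing a stage is the mirror image: delete the folder, then adjust the neighbouring contract so it reads from the correct `output/`.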

## Dual role

The repeating, inspectable structure is **both** the human control surface and the agent's orchestration specification. This is the thesis captured in [quote-folder-controls-context](#quote-folder-controls-context).


## Related across days
- [action-implement-folders](#action-implement-folders)
- [concept-icm-d1](#concept-icm-d1)
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)
- [synthesis-five-layer-fills-the-gap](#synthesis-five-layer-fills-the-gap)
- [concept-portability](#concept-portability)


#### framework-skill-creation

*type: `framework` · sources: video*

## Purpose

A repeatable framework for converting ephemeral chatbot conversations into structured, reusable AI **skills** (markdown files) suitable for use under [concept-icm-d1](#concept-icm-d1). It operationalizes the thesis of [concept-dialogue-structure](#concept-dialogue-structure).

The framework grew out of the visual decision-tree mapping tool built by [entity-k-kumar](#entity-k-kumar).

## The Five Steps

### 1. Identify the Goal / Intent

What is the user actually trying to achieve? Example: *'Tighten this paragraph.'* This becomes the top of the decision tree.

### 2. Extract Constraints

What boundaries shaped the response? Examples:

- Reduce wordiness
- Maintain the original rhythm and voice
- Preserve all factual content
- Honour a length budget

### 3. Identify Assumptions

What did the user and the model implicitly assume? Examples:

- Target audience is a general blog readership
- The paragraph is final-draft quality
- No new sources or facts may be introduced

### 4. Map Sub-Goals

What intermediate steps were required? Examples:

- Identify filler phrases
- Collapse redundant clauses
- Re-balance sentence lengths
- Verify meaning preservation

### 5. Encode into a Markdown Skill

Write Goal / Constraints / Assumptions / Sub-goals into a structured markdown file in the skills folder. This artifact becomes a permanent, version-controlled skill consumable by any agent operating inside the [concept-icm-d1](#concept-icm-d1) vault.
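
As a minimal sketch of step 5, the four parts can be held in a small record and rendered to markdown. The heading layout is an assumption (the talk does not specify a template), and `Skill` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    goal: str
    constraints: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    sub_goals: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the four parts as headed bullet lists."""
        sections = [
            ("Constraints", self.constraints),
            ("Assumptions", self.assumptions),
            ("Sub-goals", self.sub_goals),
        ]
        lines = ["## Goal", "", self.goal, ""]
        for title, items in sections:
            lines += [f"## {title}", ""] + [f"- {item}" for item in items] + [""]
        return "\n".join(lines)


# Example values taken from the five steps above.
tighten = Skill(
    goal="Tighten this paragraph.",
    constraints=["Reduce wordiness", "Preserve all factual content"],
    assumptions=["No new sources or facts may be introduced"],
    sub_goals=["Identify filler phrases", "Verify meaning preservation"],
)
```

Writing `tighten.to_markdown()` to a file in the skills folder yields an artifact that can be diffed and versioned like any other note.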

## Why This Works

The framework is consistent with mainstream prompt-engineering practice (deriving system prompts and tools by abstracting successful interactions) and conversational-UX practice (modelling chatbot flows as decision trees). Multi-agent research likewise decomposes tasks into conversational roles with explicit protocols.

## Related Action

[action-codify-voice](#action-codify-voice) is a concrete instance of this framework applied to writing voice/tone.


## Related across days
- [concept-stage-contracts](#concept-stage-contracts)
- [framework-workspace-builder](#framework-workspace-builder)
- [synthesis-skill-equals-stage-contract](#synthesis-skill-equals-stage-contract)
- [synthesis-workspace-builder-is-the-meta-dialogue](#synthesis-workspace-builder-is-the-meta-dialogue)
- [concept-edit-source-principle](#concept-edit-source-principle)


#### framework-workspace-builder

*type: `framework` · sources: paper*

## What it is

ICM includes a **workspace-builder**: a five-stage workspace whose **output is a new workspace**. It is itself built with ICM conventions ([framework-icm-architecture](#framework-icm-architecture)), so it enforces on its outputs the same structural rules it was built with, which keeps the workspaces it produces consistent.

## The five stages

1. **Discovery** — identify the domain and the workflow.
2. **Stage mapping** — find the natural breakpoints between stages.
3. **Scaffolding** — create the folder structure.
4. **Questionnaire design** — decide what setup questions the workspace should ask.
5. **Validation** — confirm the pipeline runs end to end.

## Why self-hosting matters

Practitioners can create workspaces for their own domains **without understanding the conventions in detail** — the builder encodes them in its process. Examples:

- A marketing team builds a campaign-production workspace.
- A research group builds a literature-review workspace.
- A consultancy builds a client-deliverable pipeline.

## Adoption mechanism

The self-hosting property (a workspace that emits workspaces) is the **mechanism behind ICM's adoption beyond its author** — see [claim-external-adoption](#claim-external-adoption) and [entity-external-adopters](#entity-external-adopters). Combined with [concept-portability](#concept-portability) (a workspace is a folder you can hand off), it enables non-developer onboarding.


## Related across days
- [framework-skill-creation](#framework-skill-creation)
- [entity-external-adopters](#entity-external-adopters)
- [claim-external-adoption](#claim-external-adoption)
- [synthesis-workspace-builder-is-the-meta-dialogue](#synthesis-workspace-builder-is-the-meta-dialogue)


---

### Folder: claims

#### claim-external-adoption

*type: `claim` · sources: paper*

## The claim

ICM workspaces have been adopted by groups outside the author's organization — listed in [entity-external-adopters](#entity-external-adopters):

- **University of Edinburgh's Neuropolitics Lab** — academic research workspaces.
- **ICR Research** — research/analytics workflows.
- **Academy of International Affairs (Bonn)** — policy analysis.

Details are limited by NDAs.

## Why this matters

The existence of these adoptions answers the natural reviewer question: **does ICM work when someone other than its designer builds and operates the workspace?** Preliminary answer: yes, across academic research, policy analysis, and content production.

The mechanism enabling this is the [framework-workspace-builder](#framework-workspace-builder) (a workspace that emits workspaces) plus [concept-portability](#concept-portability) (a workspace is a folder).

## Confidence: MEDIUM

The enrichment overlay sharpens the read:

- The organizations named exist and have plausible needs for document-heavy, human-reviewed workflows.
- There are **no public case studies, independent evaluations, or codebases** from these adopters.
- The evidence is essentially "reported by the authors, constrained by NDAs."

This is weak-but-non-zero evidence: it demonstrates *interest and some practical use*, but **not robustness, scalability, or comparative performance**. A structured study of these external deployments is named as future work.

Related future-validation needs are [question-controlled-comparison](#question-controlled-comparison) and [question-cross-model](#question-cross-model).


## Related across days
- [entity-external-adopters](#entity-external-adopters)
- [framework-workspace-builder](#framework-workspace-builder)
- [synthesis-workspace-builder-is-the-meta-dialogue](#synthesis-workspace-builder-is-the-meta-dialogue)
- [arc-evidence-base-evolution](#arc-evidence-base-evolution)


#### claim-icm-superiority

*type: `claim` · sources: video*

## Claim

[entity-jake-van-clief](#entity-jake-van-clief) asserts that [concept-icm-d1](#concept-icm-d1), meaning simple folder structures plus markdown files navigated by a single agent, is superior to complex multi-agent frameworks like [entity-langchain](#entity-langchain) or [entity-semantic-kernel](#entity-semantic-kernel).

## Specific Sub-Claims

1. **Token usage reduction of 20–40%** versus framework-driven approaches.
2. **Faster outcomes** (less orchestration overhead, no agent-to-agent message loops).
3. **Easier adoption and maintenance** by non-technical teams.
4. **Greater determinism** in execution.
5. **'Multi-agentic harnesses' are absurdities** — a single well-contextualized agent is sufficient.

## Evidence in the Source

- Live demos of ICM-based skills outperforming framework approaches on the same tasks.
- Reports from enterprise clients adopting the methodology.
- The headline [quote-absurdities](#quote-absurdities) crystallizes the rhetorical position.
- Reinforced by [contrarian-frameworks](#contrarian-frameworks).

## Companion-Paper Grounding (sharpens, but does not benchmark)

The formal paper [entity-icm-paper-arxiv](#entity-icm-paper-arxiv) supplies the figures behind the "20–40%" headline:

- **Per-stage context budget:** 2,000–8,000 *focused* tokens per stage vs. monolithic prompts **exceeding 40,000 tokens, most of it irrelevant**. The win is framed as *relevance density*, not raw compression.
- **Mechanism, not just outcome:** the reduction is justified theoretically via Liu et al.'s *"lost in the middle"* — staged loading keeps load-bearing content out of the degraded mid-context band.
- **Adoption evidence (N=33, informal self-report):** **30 of 33** practitioners report a **U-shaped human-intervention pattern** — heavy edits at stage 1 (**92%**), light at stage 2 (**30%**), heavy at stage 3 (**78%**); three non-coders shipped working workspaces.
- ⚠️ **Still no controlled head-to-head.** The paper *explicitly states* there is "no controlled comparison between ICM's staged loading and monolithic prompting," and all testing used a single model family (Claude Opus/Sonnet 4.6). So the figures corroborate the efficiency story but **do not** establish superiority over LangChain/Semantic Kernel on a benchmark.

## Confidence: **high** (per source) — but validation says:

- **Single-agent-first guidance is mainstream**, not fringe. Microsoft's Cloud Adoption Framework explicitly recommends starting with a single-agent system and only escalating to multi-agent when crossing security boundaries, team boundaries, or scaling needs.
- **The 20–40% token-reduction figure is anecdotal** — no peer-reviewed benchmark exists comparing ICM-style navigation vs LangChain/Semantic Kernel.
- **The 'absurdities' framing overshoots** — multi-agent research shows role decomposition (retrieval, reasoning, validation, monitoring) improves modularity and robustness in genuinely complex environments. Enterprise multi-agent literature also documents necessary patterns (sagas, circuit breakers, governance) that are not 'absurd' but earned.

## Testability

**Yes** — benchmark a representative workflow implemented via ICM vs LangChain/Semantic Kernel on (a) tokens consumed, (b) wall-clock latency, (c) maintenance effort, and (d) determinism of outputs across runs.
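
A sketch of how metrics (a), (b), and (d) might be recorded per run and summarized. Everything here is hypothetical: the `Run` record, the `summarize` helper, and the choice to measure determinism as the share of runs that produce the most common output verbatim. Maintenance effort (c) is not something a script like this can capture.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    tokens: int     # (a) tokens consumed
    seconds: float  # (b) wall-clock latency
    output: str     # compared verbatim for (d) determinism


def summarize(runs: list[Run]) -> dict[str, float]:
    """Mean tokens, mean latency, and share of runs producing the modal output."""
    modal_count = Counter(run.output for run in runs).most_common(1)[0][1]
    return {
        "mean_tokens": mean(run.tokens for run in runs),
        "mean_seconds": mean(run.seconds for run in runs),
        "determinism": modal_count / len(runs),
    }
```

Running the same workflow N times under each approach and comparing the two summaries would give a first, rough head-to-head; exact string equality is a strict proxy and a real study would likely need a softer similarity measure.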

## Related Action

[action-implement-folders](#action-implement-folders)


## Related across days
- [claim-token-efficiency](#claim-token-efficiency)
- [claim-external-adoption](#claim-external-adoption)
- [synthesis-single-agent-clarified](#synthesis-single-agent-clarified)
- [arc-evidence-base-evolution](#arc-evidence-base-evolution)


#### claim-l2-roi

*type: `claim` · sources: video*

## Claim

Based on consulting work with enterprise companies, [entity-jake-van-clief](#entity-jake-van-clief) claims that moving an organization from Level 1 (ad-hoc copy/paste into chatbots) to Level 2 (structured prompts and verified outputs) of [concept-three-levels-ai](#concept-three-levels-ai) delivers the **highest ROI** of any AI adoption step.

See [quote-l2-roi](#quote-l2-roi) for the punchy framing.

## Reasoning

- **L3 has higher absolute impact** but requires significant engineering investment (distributed systems, observability, orchestration).
- **L1→L2 is comparatively cheap**: build shared prompt libraries, brand-tone guides, and basic markdown 'skills'.
- The ratio of (gain in quality + consistency) to (effort) is maximized at this transition.

## How To Act On It

See [action-move-to-l2](#action-move-to-l2) and [action-codify-voice](#action-codify-voice).

## Confidence: **high** (per source) — validation says:

- **Well-aligned with practitioner consensus**. Standardizing prompts/patterns before building automation is widely recommended (Microsoft's adoption guidance, prompt-library/playbook literature, vendor playbooks).
- **Empirical ROI quantification is scarce**. The 'highest ROI' framing is consultant insight, not a controlled finding.
- **Counter-case**: organizations with large repetitive workloads and strong engineering teams may extract outsized ROI by jumping directly into a narrow L3 deployment.

## Testability

**Yes, with caveats**. ROI must be operationalized (output quality scores, cycle time, error rate) and measured pre/post intervention across a comparable cohort. This is hard but feasible.


## Related across days
- [concept-three-levels-ai](#concept-three-levels-ai)
- [synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline)
- [quote-l2-roi](#quote-l2-roi)


#### claim-token-efficiency

*type: `claim` · sources: paper*

## The claim

Representative token counts from the script-to-animation workspace:

| Stage | Focused tokens |
|---|---|
| `01_research` | ~4.9k |
| `02_script` | ~5.5k |
| `03_production` | ~5.6k |

A **monolithic approach** loading all stages' instructions, all reference material, and all prior outputs produces a context window of **~42k tokens**, most of it irrelevant to the current task — the "unused/irrelevant" band dwarfs the useful payload.
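
The arithmetic behind the comparison, as a small sketch using only the approximate figures from the table and the ~42k monolithic total (the function name is hypothetical):

```python
# Figures copied from the table above (approximate, in tokens).
STAGE_TOKENS = {"01_research": 4_900, "02_script": 5_500, "03_production": 5_600}
MONOLITHIC_TOKENS = 42_000


def share_of_monolithic(stage_tokens: int, monolithic: int = MONOLITHIC_TOKENS) -> float:
    """Fraction of the monolithic context that a scoped stage actually loads."""
    return stage_tokens / monolithic


for stage, tokens in STAGE_TOKENS.items():
    print(f"{stage}: {share_of_monolithic(tokens):.1%} of the monolithic context")
```

On these figures each stage loads roughly 12 to 13 percent of the monolithic context. The counts are representative rather than benchmarked, so the ratios are indicative only.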

## The mechanism

The efficiency argument is:

1. **Relevance density**: scoped stages keep almost all tokens task-relevant — see [concept-context-scoping](#concept-context-scoping).
2. **Avoiding "lost in the middle"** (Liu et al.) degradation — see [prereq-llm-context-windows](#prereq-llm-context-windows).

The claimed gain is relevance density, not raw compression.

## Confidence and caveats

The paper is **explicit** that these are *representative counts*, not a measured benchmark, and that **no controlled comparison to monolithic prompting was run**.

The enrichment overlay confirms: the rationale is well-grounded in the literature, but the precise token counts and implied performance gains remain anecdotal and unvalidated. The corresponding open question is [question-controlled-comparison](#question-controlled-comparison).

## Mechanism source

The ability to scope context this tightly comes from the [concept-five-layer-hierarchy](#concept-five-layer-hierarchy), which separates routing layers (~1.5k tokens) from content layers.


## Related across days
- [claim-icm-superiority](#claim-icm-superiority)
- [prereq-llm-context-windows](#prereq-llm-context-windows)
- [prereq-llm-context](#prereq-llm-context)
- [arc-evidence-base-evolution](#arc-evidence-base-evolution)


#### claim-ushaped-intervention

*type: `claim` · sources: paper*

## The claim

Across **33 community members** using the script-to-animation or structurally similar multi-stage workspaces, **30 report a U-shaped intervention pattern**:

| Position | Editing intensity | Reported share |
|---|---|---|
| Stage 1 (direction-setting) | Heavy | ~92% |
| Middle stages (constrained execution) | Light | ~30% |
| Final stage (alignment) | Heavy | ~78% |

The remaining three report roughly equal editing across stages.
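
For readers who want a precise reading of "U-shaped", one possible definition is that both end stages exceed every middle stage. The sketch below applies that assumed definition to the reported shares; `is_u_shaped` is a hypothetical helper, and the inputs are the self-reported figures from the table, not measurements.

```python
def is_u_shaped(shares: list[float]) -> bool:
    """True when both end stages exceed every middle stage."""
    if len(shares) < 3:
        return False
    middle_peak = max(shares[1:-1])
    return shares[0] > middle_peak and shares[-1] > middle_peak


REPORTED_SHARES = [0.92, 0.30, 0.78]  # stage 1, middle, final (from the table)
RESPONDENTS, U_SHAPED = 33, 30

print(is_u_shaped(REPORTED_SHARES))
print(f"{U_SHAPED / RESPONDENTS:.1%} of respondents report the pattern")
```

The 30 of 33 figure works out to about 91 percent of respondents, with the remaining three being the roughly-equal-editing group.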

## What the two peaks mean

The two peaks reflect **different work**:

- **Stage-1 editing** is *creative judgment* — narrowing from broad possibilities to a specific angle.
- **Final-stage editing** is *alignment work*, closer to debugging.

This aligns with the implementation in [action-review-gates](#action-review-gates) and the stage-contract logic in [concept-stage-contracts](#concept-stage-contracts).

## Confidence: LOW

The numbers come from **practitioner conversations, not instrumented measurement**, and should be read as directional. The community is small and self-selected.

The enrichment overlay agrees: the *shape* is plausible and compatible with human-in-the-loop literature (e.g., Amershi et al., "Guidelines for Human–AI Interaction," which encourages early and late control points but does not report a U-curve). The *numbers* are weak evidence; replication via instrumented studies is the corresponding open question — [question-controlled-comparison](#question-controlled-comparison). External evidence on whether ICM works beyond its author is in [claim-external-adoption](#claim-external-adoption).


## Related across days
- [action-review-gates](#action-review-gates)
- [concept-stage-contracts](#concept-stage-contracts)
- [synthesis-edit-source-as-dialogue-evolution](#synthesis-edit-source-as-dialogue-evolution)
- [arc-evidence-base-evolution](#arc-evidence-base-evolution)


#### claim-voice-future

*type: `claim` · sources: video*

## Claim

[entity-jake-van-clief](#entity-jake-van-clief) claims that the future of software engineering and workflow automation lies in **real-time, voice-driven AI collaboration** — see [concept-voice-collaboration](#concept-voice-collaboration). He predicts that the ability to verbally command an agent to read, analyze, and write to a local file system during a live meeting will replace the current paradigm of post-meeting transcript analysis and manual task execution.

See [quote-voice-control](#quote-voice-control) for the framing.

## What's Already Possible

The demoed stack (voice cloning via [entity-11labs](#entity-11labs) + local [entity-claude](#entity-claude) + [concept-icm-d1](#concept-icm-d1) folders) is technically feasible today. Real-time transcription, live IDE editing by voice, and 'AI teammate' patterns already exist in commercial tools.

## Confidence: **medium**, **not testable** (it's a prediction).

Validation perspective:

- **Technically plausible and partially realized.** This is a forward-looking but reasonable prediction.
- **Broad consensus on voice as *the* dominant modality does not exist.** Many engineers prefer keyboard/editor workflows for precision, speed, and privacy.
- **Substantial barriers**: see [question-voice-security](#question-voice-security). Voice spoofing, replay attacks, bystander exposure, open-office acoustics, and corporate IT policy all impede mainstream adoption.
- **Likely future**: multimodal control (text + GUI + voice) where voice is one option, not a universal replacement.


## Related across days
- [concept-voice-collaboration](#concept-voice-collaboration)
- [tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support)
- [question-voice-security](#question-voice-security)


---

### Folder: entities

#### entity-11labs

*type: `entity` · sources: video · entity: tool*

## Profile

ElevenLabs (referred to in the video as '11Labs') is a commercial provider of AI speech synthesis and voice cloning, widely used for custom voices, dubbing, and interactive applications.

## Role in the Source

- Used to create a **custom voice model of [entity-jake-van-clief](#entity-jake-van-clief) himself** for the real-time voice-driven AI collaboration demo
- Forms the TTS/voice-cloning side of the loop demonstrated in [concept-voice-collaboration](#concept-voice-collaboration)

## Security Footnote

The ease of voice cloning here is exactly the substrate of the concerns raised in [question-voice-security](#question-voice-security): if voices are cheap to clone, voice-as-authentication becomes risky for high-trust filesystem control.


## Related across days
- [concept-voice-collaboration](#concept-voice-collaboration)
- [claim-voice-future](#claim-voice-future)
- [question-voice-security](#question-voice-security)


#### entity-andrej-karpathy-d1

*type: `entity` · sources: video · entity: person*

## Profile

Andrej Karpathy is an influential AI researcher: former Director of AI at Tesla, founding member of OpenAI, and an educator widely followed for material on deep learning and LLM usage. The talk notes that he recently joined [entity-anthropic](#entity-anthropic) to teach.

## Relevance to This Vault

- He popularized an 'LLM Wiki' / markdown-based personal knowledge workflow that closely mirrors the philosophy of [concept-icm-d1](#concept-icm-d1).
- Cited by [entity-jake-van-clief](#entity-jake-van-clief) as independent validation that prominent AI practitioners are converging on folder-based, markdown-first context management.
- Symbolic of the broader cultural shift away from heavy orchestration frameworks toward simple, inspectable substrates.


## Related across days
- [entity-andrej-karpathy-d2](#entity-andrej-karpathy-d2)
- [synthesis-dialogue-to-context-engineering](#synthesis-dialogue-to-context-engineering)
- [concept-context-scoping](#concept-context-scoping)


#### entity-andrej-karpathy-d2

*type: `entity` · sources: paper · entity: person*

## Role in this source

Cited as the source of **"context engineering"** (2025) — the framing that:

> *system performance depends on what context is delivered, in what structure, and at what moment.*

## Why ICM cites him

ICM **operationalizes Karpathy's framing** by structuring context into separate organizational tiers ([concept-five-layer-hierarchy](#concept-five-layer-hierarchy)) rather than a monolithic prompt — see [concept-context-scoping](#concept-context-scoping).

Karpathy is not an author of ICM; he is the **lineage anchor** for the context-engineering posture the methodology adopts. Other lineage anchors are in [prereq-unix-pipelines](#prereq-unix-pipelines) (Unix philosophy), [prereq-llm-context-windows](#prereq-llm-context-windows) (Liu et al.'s "lost in the middle"), and [concept-icm-as-compilation](#concept-icm-as-compilation) (Aho et al.'s compiler architecture).


## Related across days
- [entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1)
- [concept-dialogue-structure](#concept-dialogue-structure)
- [synthesis-dialogue-to-context-engineering](#synthesis-dialogue-to-context-engineering)


#### entity-anthropic

*type: `entity` · sources: video · entity: organization*

## Profile

Anthropic is an AI safety and research company that develops the [entity-claude](#entity-claude) family of large language models and promotes 'constitutional AI' approaches.

## Relevance to This Vault

- Cited as an organization that heavily uses the concept of 'skills' and structured context, aligning closely with [concept-icm-d1](#concept-icm-d1).
- [entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1) is noted to have recently joined Anthropic to teach, reinforcing the cultural overlap between Karpathy's 'LLM Wiki' approach and ICM.
- [entity-claude](#entity-claude) is the model used in the live demos including [concept-voice-collaboration](#concept-voice-collaboration).


## Related across days
- [entity-claude](#entity-claude)
- [entity-claude-code](#entity-claude-code)


#### entity-autogen

*type: `entity` · sources: paper · entity: tool*

## What it is

**Microsoft's open-source multi-agent conversation framework** for building LLM applications with multiple cooperating agents.

## Why it appears in the paper

AutoGen is cited as the **appropriate tool where ICM is not**.

- ICM's sequential, file-based handoffs are too slow for real-time multi-agent collaboration.
- AutoGen provides the message-passing infrastructure required where agents communicate dynamically and respond to each other in tight loops.

## What this signals

The contrast marks **the precise boundary of ICM's claim** ([contrarian-frameworks-overkill](#contrarian-frameworks-overkill)): ICM is for sequential, human-reviewed, repeatable workflows; AutoGen (and similar frameworks such as LangChain and Semantic Kernel) is for the cases outside that class, such as real-time, concurrent, or dynamically branching multi-agent systems.

The bounded contrarian move — "frameworks are overhead **for this class**, not in general" — is what makes ICM's positioning defensible rather than absolutist.


## Related across days
- [entity-langchain](#entity-langchain)
- [entity-semantic-kernel](#entity-semantic-kernel)
- [contrarian-frameworks](#contrarian-frameworks)
- [contrarian-frameworks-overkill](#contrarian-frameworks-overkill)
- [recurring-foil-frameworks](#recurring-foil-frameworks)


#### entity-claude-code

*type: `entity` · sources: paper · entity: tool*

## What it is

Anthropic's agentic coding tool, and the **sole agent runtime** used to develop and run all ICM workspaces.

- Primary orchestrating agent: **Claude Opus 4.6**.
- Subagent workers: **Claude Sonnet 4.6**.

## How it interacts with ICM

Within a stage, the orchestrating Opus agent **delegates sub-tasks to faster Sonnet 4.6 subagents** — delegation itself driven by the folder structure (the agent reads the stage's `CONTEXT.md` to decide what to delegate and what context to provide). See [framework-icm-architecture](#framework-icm-architecture).

## Important nuance: single-orchestrator, not single-agent in execution

This means ICM is **single-ORCHESTRATOR with folder-driven subagent delegation**, not strictly single-agent in execution.

The paper's claim is precisely **"no orchestration framework,"** not **"no second model."** The extraction notes this explicitly, and it should not be over-simplified when summarising. The mechanism is unchanged: only files differ between stages — see [quote-folder-controls-context](#quote-folder-controls-context).

## Limitation

All testing used this single Claude model family — whether ICM generalizes to other families (GPT-4 class, Gemini, Llama) is the open question [question-cross-model](#question-cross-model).


## Related across days
- [entity-claude](#entity-claude)
- [entity-anthropic](#entity-anthropic)
- [synthesis-single-agent-clarified](#synthesis-single-agent-clarified)


#### entity-claude

*type: `entity` · sources: video · entity: product*

## Profile

Claude is [entity-anthropic](#entity-anthropic)'s family of large language models, designed for helpfulness and safety. Claude models are used extensively in filesystem-navigation and coding-assistant scenarios; the paper's workspaces run on Claude Opus 4.6 and Claude Sonnet 4.6 (see [entity-claude-code](#entity-claude-code)).

## Role in the Source

- The primary LLM used in [entity-jake-van-clief](#entity-jake-van-clief)'s demonstrations
- Drives the live demo of [concept-voice-collaboration](#concept-voice-collaboration) — a local instance executes voice-issued commands against a folder structure built per [concept-icm-d1](#concept-icm-d1)
- Used as the agent that 'navigates folders' in ICM examples

## Cultural Note

Anthropic's emphasis on **skills** as a first-class abstraction is repeatedly cited as cultural validation of ICM.


## Related across days
- [entity-claude-code](#entity-claude-code)
- [synthesis-single-agent-clarified](#synthesis-single-agent-clarified)


#### entity-david-mcdermott

*type: `entity` · sources: video, paper · entity: person*

## Day 1 — video

# David McDermott

## Profile

David McDermott appears in the source's speaker list as a participant in the conversation alongside [entity-jake-van-clief](#entity-jake-van-clief) and [entity-k-kumar](#entity-k-kumar). He is listed as present but did not have substantive content attributed to him in the *video* extraction.

**Companion-source upgrade:** the supporting paper [entity-icm-paper-arxiv](#entity-icm-paper-arxiv) identifies McDermott as the **co-author** of *"Interpretable Context Methodology: Folder Structure as Agent Architecture"* (arXiv:2603.16021) with Van Clief, affiliated with **Eduba / University of Edinburgh** — the same institution as [entity-k-kumar](#entity-k-kumar). So he is not merely a passive participant: he is the formal academic co-originator of the methodology the talk presents.

## Role in the Source

In the video: co-host / interlocutor, not individually quoted. In the broader work: **research co-author** responsible (with Van Clief) for the formal articulation of the Five-Layer Context Hierarchy, the staged-folder architecture, and the documented limitations. Where the talk supplies practitioner conviction, the paper (and thus McDermott's contribution) supplies the structure, lineage, and stated threats to validity.

## Note

This entity note is emitted per the speaker-completeness rule so that cross-vault tooling resolves every named speaker consistently. Enriched from companion source [entity-icm-paper-arxiv](#entity-icm-paper-arxiv).

## Day 2 — paper

# David McDermott

## Role

**Co-author** of [entity-icm-paper](#entity-icm-paper) ("Interpretable Context Methodology: Folder Structure as Agent Architecture"). Affiliated with **Eduba** and the **University of Edinburgh**.

## In this source

Responsible with [entity-jake-van-clief](#entity-jake-van-clief) for the **formal articulation** of:

- the [concept-five-layer-hierarchy](#concept-five-layer-hierarchy),
- the staged-folder architecture ([framework-icm-architecture](#framework-icm-architecture)),
- the stated threats to validity ([question-cross-model](#question-cross-model), [question-controlled-comparison](#question-controlled-comparison), [question-semantic-debugging](#question-semantic-debugging)).

No individual quotes are attributed to McDermott in the extraction; his contributions are co-authorial on the formal writeup.

## Related across days
- [entity-jake-van-clief](#entity-jake-van-clief)
- [entity-icm-paper](#entity-icm-paper)
- [entity-icm-paper-arxiv](#entity-icm-paper-arxiv)


#### entity-external-adopters

*type: `entity` · sources: paper · entity: organization*

## Three named organizations using ICM outside the author's group

### University of Edinburgh — Neuropolitics Lab
- **URL**: https://www.ed.ac.uk (School of Social and Political Science)
- **Use**: academic research workspaces.
- **Domain**: political cognition and related topics.

### ICR Research
- **URL**: https://www.icrresearch.com
- **Use**: research/analytics workflows.
- **Domain**: private-sector research and advisory.

### Academy of International Affairs (Bonn / NRW)
- **URL**: https://www.aia-nrw.org
- **Use**: policy analysis workflows.
- **Domain**: policy research and fellowships in international affairs.

## Why they matter

Their implementations are limited by NDAs, but their existence is the paper's **strongest evidence of generalization beyond the designer** — spanning academic research, policy analysis, and content production. This is the evidentiary backbone of [claim-external-adoption](#claim-external-adoption).

## What is *not* claimed

There are no public case studies, independent evaluations, or codebases from these adopters. The evidence is "reported by the authors, constrained by NDAs" — strong on existence, weak on robustness/scalability/comparison. A structured study is named as future work.


## Related across days
- [claim-external-adoption](#claim-external-adoption)
- [framework-workspace-builder](#framework-workspace-builder)
- [synthesis-workspace-builder-is-the-meta-dialogue](#synthesis-workspace-builder-is-the-meta-dialogue)
- [entity-k-kumar](#entity-k-kumar)


#### entity-icm-paper-arxiv

*type: `entity` · sources: video · entity: publication*

> **Provenance note:** This note is a **supplementary companion source** added alongside the YouTube extraction. The video ([entity-jake-van-clief](#entity-jake-van-clief)'s talk) is the *primary* source of this vault; this is the formal academic paper by the **same lead author** that grounds the video's claims. The `yt-extract-agent` pipeline is single-source, so this note was folded in manually to let downstream agents inherit both the practitioner talk and its companion paper.

## Bibliographic

- **Title:** Interpretable Context Methodology: Folder Structure as Agent Architecture
- **Authors:** [entity-jake-van-clief](#entity-jake-van-clief), [entity-david-mcdermott](#entity-david-mcdermott)
- **arXiv:** [2603.16021v2](https://arxiv.org/html/2603.16021v2) (18 Mar 2026)
- **Affiliation:** Eduba, University of Edinburgh

## Abstract (verbatim)

> Current approaches to AI agent orchestration typically involve building multi-agent frameworks that manage context passing, memory, error handling, and step coordination through code. These frameworks work well for complex, concurrent systems. But for sequential workflows where a human reviews output at each step, they introduce engineering overhead that the problem does not require. This paper presents Interpretable Context Methodology (ICM), a method that replaces framework-level orchestration with filesystem structure. Numbered folders represent stages. Plain markdown files carry the prompts and context that tell a single AI agent what role to play at each step. Local scripts handle the mechanical work that does not need AI at all. The result is a system where one agent, reading the right files at the right moment, does the work that would otherwise require a multi-agent framework.

## Visual Exhibits

The paper's 5 figures + 2 tables are extracted, rendered, and synthesized in **[exhibit-icm-paper-figures](#exhibit-icm-paper-figures)** — including the five-layer hierarchy with per-layer token budgets (Fig 1), the layer-annotated folder tree (Fig 2), the stacked token-composition chart showing the monolithic ~42k context as mostly irrelevant waste (Fig 3), the human-review-gate pipeline (Fig 4), the U-shaped intervention chart (Fig 5), and the framework-vs-ICM control-surface table (Table 1). These exhibits are the richest layer the companion source adds over the video.

## Formal Components (grounds the video's [concept-icm-d1](#concept-icm-d1))

**Five-Layer Context Hierarchy** — the paper's central artifact, not stated explicitly in the talk:

- **Layer 0** — `CLAUDE.md` (global identity)
- **Layer 1** — `CONTEXT.md` (workspace routing)
- **Layer 2** — Stage `CONTEXT.md` (stage contracts)
- **Layer 3** — Reference material (stable across runs)
- **Layer 4** — Working artifacts (per-run content)

**Stage structure** — numbered folders (`01_research`, `02_script`, `03_production`) with explicit Inputs / Process / Outputs contracts. **Review gates** sit between stages as human intervention points where outputs become editable. The workspace is a self-contained folder using plain markdown + JSON as the universal interface. See [concept-dialogue-structure](#concept-dialogue-structure) and [framework-skill-creation](#framework-skill-creation).
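The layer list and stage structure above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's reference implementation: the file names `CLAUDE.md` and `CONTEXT.md` and the numbered stage folders come from the note, while the `references/` and `output/` directory names, the helper names, and the choice to treat the previous stage's output as the Layer 4 working artifacts are all assumptions made for the example.

```python
from pathlib import Path
import tempfile


def build_demo_workspace(root: Path) -> None:
    """Create a tiny ICM-style workspace with three numbered stages.

    "references" and "output" are hypothetical directory names.
    """
    (root / "CLAUDE.md").write_text("Layer 0: global identity\n")
    (root / "CONTEXT.md").write_text("Layer 1: workspace routing\n")
    for stage in ("01_research", "02_script", "03_production"):
        stage_dir = root / stage
        (stage_dir / "references").mkdir(parents=True)
        (stage_dir / "output").mkdir()
        (stage_dir / "CONTEXT.md").write_text(f"Layer 2: contract for {stage}\n")
        (stage_dir / "references" / "style.md").write_text("Layer 3: reference\n")


def load_stage_context(root: Path, stage: str) -> list[str]:
    """Return the files one stage would read, ordered Layer 0 to Layer 4."""
    stage_dir = root / stage
    files = [root / "CLAUDE.md", root / "CONTEXT.md", stage_dir / "CONTEXT.md"]
    files += sorted((stage_dir / "references").glob("*.md"))  # Layer 3
    # Layer 4 (assumption): working artifacts handed over by the previous stage.
    stages = sorted(p.name for p in root.iterdir() if p.is_dir())
    index = stages.index(stage)
    if index > 0:
        files += sorted((root / stages[index - 1] / "output").glob("*.md"))
    return [f.relative_to(root).as_posix() for f in files]


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    build_demo_workspace(workspace)
    (workspace / "01_research" / "output" / "notes.md").write_text("research notes\n")
    for path in load_stage_context(workspace, "02_script"):
        print(path)
```

The point of the sketch is that nothing but a directory listing decides what the stage sees: the script stage gets the two global files, its own contract and references, and whatever the research stage left behind.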

## Quantitative Grounding (sharpens [claim-icm-superiority](#claim-icm-superiority))

The video's "20–40% token reduction" is anecdotal; the paper supplies the underlying figures:

- **Per-stage context:** 2,000–8,000 *focused* tokens per stage vs. a monolithic prompt **exceeding 40,000 tokens, most of it irrelevant**.
- **Theoretical basis:** Liu et al.'s *"lost in the middle"* context-degradation effect — staged loading keeps relevant content out of the degraded middle band.
- **Practitioner observation (N=33, informal self-report):** **30 of 33** report a **U-shaped intervention pattern** — heavy editing at stage 1 (**92%**), light at stage 2 (**30%**), heavy at stage 3 (**78%**). Three non-coders successfully built video workspaces.
- ⚠️ **No controlled quantitative comparison** between ICM and monolithic prompting is reported. The numbers are efficiency/usage figures, not a benchmark win.
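As a quick worked check of the figures above, the arithmetic can be written out in Python. The constants are the numbers quoted in this note; the function and variable names are illustrative. Because the monolithic prompt is described as *exceeding* 40,000 tokens, the shares computed here are upper bounds, and they describe context size only, not output quality.

```python
# Figures quoted in the note above.
STAGE_TOKENS_LOW = 2_000
STAGE_TOKENS_HIGH = 8_000
MONOLITHIC_TOKENS_MIN = 40_000  # lower bound: the prompt "exceeds" this size


def share_of_monolithic(stage_tokens: int,
                        monolithic_tokens: int = MONOLITHIC_TOKENS_MIN) -> float:
    """Fraction of the monolithic prompt that one stage's context represents."""
    return stage_tokens / monolithic_tokens


low = share_of_monolithic(STAGE_TOKENS_LOW)    # 0.05
high = share_of_monolithic(STAGE_TOKENS_HIGH)  # 0.20
print(f"per-stage context: {low:.0%} to {high:.0%} of a 40,000-token prompt")

# 30 of 33 practitioners reported the U-shaped intervention pattern.
u_shape_share = 30 / 33
print(f"U-shaped pattern reported by {u_shape_share:.1%} of respondents")
```

So a single stage loads at most roughly 5% to 20% of what the monolithic prompt would carry, and about 91% of the informal sample reported the U-shape. Neither number substitutes for the controlled comparison the paper says it lacks.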

## Intellectual Lineage

The paper situates ICM against: McIlroy's Unix "do one thing well" + plain-text-as-interface; Shaw & Garlan's pipe-and-filter pattern; Aho et al.'s multi-pass compilation / intermediate representation; Wei et al.'s chain-of-thought decomposition; Horvitz's mixed-initiative systems; Liu et al.'s lost-in-the-middle; Fails & Olsen's interactive ML; Knuth's literate programming; Rudin's interpretability framework; and Karpathy's "context engineering" (2025) — the same lineage [entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1) is cited for in the talk.

## Stated Limitations (extends [question-icm-scaling](#question-icm-scaling))

- Data is **self-reported through conversation, not instrumented** measurement.
- Practitioner community is **invite-only, self-selected** (52 members); active use **concentrated in content production**.
- **All testing on a single model family** (Claude Opus/Sonnet 4.6).
- **No controlled comparison** of staged vs. monolithic loading.
- **Non-support (explicit):** cannot handle real-time multi-agent collaboration, high-concurrency systems, or complex automated branching — consistent with the talk's [contrarian-frameworks](#contrarian-frameworks) caveat that frameworks retain value across security boundaries and at scale.

## Open Questions Raised

- Does the five-layer hierarchy **generalize across model families**?
- As context windows grow, does **selective loading stay important**?
- How **sensitive** is output quality to context ordering/formatting within layers?
- Needs: formal cross-model evaluation + structured user studies with systematic data collection.


## Related across days
- [entity-icm-paper](#entity-icm-paper)
- [exhibit-icm-paper-figures](#exhibit-icm-paper-figures)
- [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude)
- [arc-evidence-base-evolution](#arc-evidence-base-evolution)


#### entity-icm-paper

*type: `entity` · sources: paper · entity: other*

## Citation

**Van Clief, J. & McDermott, D.** "Interpretable Context Methodology: Folder Structure as Agent Architecture." arXiv:2603.16021v2 (18 Mar 2026). Eduba / University of Edinburgh.

- **HTML**: https://arxiv.org/html/2603.16021v2
- **Abstract**: https://arxiv.org/abs/2603.16021
- **Reference implementation**: github.com/RinDig/Interpretable-Context-Methodology-ICM-

## Structure

Seven sections, 54 references, 5 figures, 2 tables.

## What it contains

The formal academic backing for the methodology presented in the companion video. Articulates:

- The methodology — [concept-icm-d2](#concept-icm-d2).
- The context model — [concept-five-layer-hierarchy](#concept-five-layer-hierarchy).
- The on-disk architecture — [framework-icm-architecture](#framework-icm-architecture).
- Stage contracts and review gates — [concept-stage-contracts](#concept-stage-contracts).
- The token-efficiency argument — [claim-token-efficiency](#claim-token-efficiency).
- Practitioner data on intervention patterns — [claim-ushaped-intervention](#claim-ushaped-intervention).
- External adoption — [claim-external-adoption](#claim-external-adoption).
- Theoretical lineage — [prereq-unix-pipelines](#prereq-unix-pipelines), [prereq-llm-context-windows](#prereq-llm-context-windows).
- Limitations and threats to validity — [question-cross-model](#question-cross-model), [question-controlled-comparison](#question-controlled-comparison), [question-semantic-debugging](#question-semantic-debugging).


## Related across days
- [entity-icm-paper-arxiv](#entity-icm-paper-arxiv)
- [exhibit-icm-paper-figures](#exhibit-icm-paper-figures)
- [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude)


#### entity-jake-van-clief

*type: `entity` · sources: video, paper · entity: person*

## Day 1 — video

# Jake Van Clief

## Profile

Jake Van Clief is the **primary speaker** and originator of the [concept-icm-d1](#concept-icm-d1) thesis presented in this source. He works as an AI consultant for enterprise organizations and frames his observations *in the talk* as practitioner experience rather than research.

**That practitioner/research split is bridged by the companion source:** Van Clief is also the **lead author** of the formal paper [entity-icm-paper-arxiv](#entity-icm-paper-arxiv) (*"Interpretable Context Methodology: Folder Structure as Agent Architecture,"* arXiv:2603.16021, with co-author [entity-david-mcdermott](#entity-david-mcdermott), Eduba / University of Edinburgh). The video is his conviction-driven practitioner pitch; the paper is the same idea rendered as method, lineage, and acknowledged limitations. Read together, the talk supplies the *why-it-matters* and the paper supplies the *what-it-actually-is* and *where-it-breaks*.

## Role in the Source

- Delivers the talk on Interpretable Context Methodology and the future of AI dialogue
- Demonstrates ICM in live coding sessions
- Performs the real-time voice-driven AI collaboration demo using a custom voice model trained via [entity-11labs](#entity-11labs) and a local instance of [entity-claude](#entity-claude)

## Attributed Contributions in This Vault

Concepts proposed:

- [concept-icm-d1](#concept-icm-d1)
- [concept-three-levels-ai](#concept-three-levels-ai)
- [concept-dialogue-structure](#concept-dialogue-structure)
- [concept-voice-collaboration](#concept-voice-collaboration)

Claims advanced:

- [claim-icm-superiority](#claim-icm-superiority)
- [claim-l2-roi](#claim-l2-roi)
- [claim-voice-future](#claim-voice-future)

Quotes attributed:

- [quote-absurdities](#quote-absurdities)
- [quote-l2-roi](#quote-l2-roi)
- [quote-dialogue-theme](#quote-dialogue-theme)
- [quote-voice-control](#quote-voice-control)

Contrarian positioning:

- [contrarian-frameworks](#contrarian-frameworks)

Recommended actions he advocates:

- [action-implement-folders](#action-implement-folders)
- [action-move-to-l2](#action-move-to-l2)
- [action-codify-voice](#action-codify-voice)

## Day 2 — paper

# Jake Van Clief

## Role

**Lead author** of the ICM paper ([entity-icm-paper](#entity-icm-paper)) and **originator of the methodology**. Affiliated with **Eduba** and the **University of Edinburgh**.

## In this source

In the companion video Jake presents ICM as practitioner conviction; in the paper he renders it as method, lineage, and acknowledged limitations.

## Attributed contributions in this vault

- Core concept: [concept-icm-d2](#concept-icm-d2)
- Quote: [quote-folder-controls-context](#quote-folder-controls-context) — the thesis in one sentence.
- Quote: [quote-edit-source](#quote-edit-source) — the source-integrity principle.
- Quote: [quote-glass-box](#quote-glass-box) — ICM as inherently interpretable.

Co-author: [entity-david-mcdermott](#entity-david-mcdermott).

## Related across days
- [entity-david-mcdermott](#entity-david-mcdermott)
- [entity-icm-paper](#entity-icm-paper)
- [entity-icm-paper-arxiv](#entity-icm-paper-arxiv)
- [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude)


#### entity-k-kumar

*type: `entity` · sources: video · entity: person*

## Day 1 — video

# K. Kumar

## Profile

K. Kumar is described in the source as a **co-founder and student at the University of Edinburgh**. He created the visual mapping tool used in the video to extract decision trees, goals, and constraints from human–AI dialogue.

## Role in the Source

- Builder of the dialogue-extraction visualization tool central to the live demos
- Co-participant in the conversation (named in the speaker list)
- Intellectual collaborator on the thesis that conversation is the substrate of AI skills

## Attributed Contributions in This Vault

- The visual decision-tree mapping tool underpinning [concept-dialogue-structure](#concept-dialogue-structure)
- Methodological inspiration for [framework-skill-creation](#framework-skill-creation)

## Disambiguation Note

Multiple individuals match 'Kumar' at the University of Edinburgh. No definitive canonical URL is asserted without additional identifiers — downstream tools should treat the entity as project-specific.

## Related across days
- [concept-dialogue-structure](#concept-dialogue-structure)
- [entity-external-adopters](#entity-external-adopters)
- [synthesis-workspace-builder-is-the-meta-dialogue](#synthesis-workspace-builder-is-the-meta-dialogue)


#### entity-langchain

*type: `entity` · sources: video · entity: tool*

## Profile

LangChain is a popular open-source framework for building LLM applications, offering chains, agents, tools, retrievers, and integrations for both single- and multi-agent patterns.

## Role in the Source

Cited as the canonical example of a 'complex multi-agent framework' that [entity-jake-van-clief](#entity-jake-van-clief) argues is over-engineered relative to [concept-icm-d1](#concept-icm-d1). Forms the core of the contrarian position in [claim-icm-superiority](#claim-icm-superiority) and [contrarian-frameworks](#contrarian-frameworks).

## Balanced View

LangChain is genuinely useful for cross-boundary, multi-team, and large-scale workflows where role decomposition, tool routing, and orchestration glue earn their complexity. The video's critique is best read as 'most teams don't need it yet', not 'no one ever needs it.'


## Related across days
- [entity-autogen](#entity-autogen)
- [entity-semantic-kernel](#entity-semantic-kernel)
- [contrarian-frameworks](#contrarian-frameworks)
- [contrarian-frameworks-overkill](#contrarian-frameworks-overkill)
- [recurring-foil-frameworks](#recurring-foil-frameworks)


#### entity-remotion

*type: `entity` · sources: paper · entity: tool*

## What it is

A **React-based framework** for creating videos programmatically.

## Role in ICM

In ICM's first production workspace (script-to-animation), **stage 3 (`03_production`)** reads the finished script and produces:

- animation specifications,
- working Remotion code,

using design guidelines, color palettes, and animation conventions from `setup/` as Layer 3 reference (see [concept-five-layer-hierarchy](#concept-five-layer-hierarchy) and [framework-icm-architecture](#framework-icm-architecture)).

It is the **target runtime** that turns markdown stage outputs into actual rendered video — the concrete example that grounds the abstract methodology in a real production pipeline.


## Related across days
- [framework-icm-architecture](#framework-icm-architecture)
- [concept-stage-contracts](#concept-stage-contracts)


#### entity-semantic-kernel

*type: `entity` · sources: video · entity: tool*

## Profile

Semantic Kernel is Microsoft's open-source orchestration framework for AI agents and LLM-driven applications. It exposes the abstractions of **skills**, **planners**, and **connectors** that bridge LLMs to external services.

## Role in the Source

Mentioned alongside [entity-langchain](#entity-langchain) as an orchestration framework that [entity-jake-van-clief](#entity-jake-van-clief)'s [concept-icm-d1](#concept-icm-d1) aims to replace with plain folder structures. Central to [claim-icm-superiority](#claim-icm-superiority) and [contrarian-frameworks](#contrarian-frameworks).

## Notable Irony

Semantic Kernel's 'skill' abstraction is conceptually close to ICM's 'skill' markdown file — both formalize a reusable unit of LLM capability. The disagreement is over the implementation substrate (code-defined plugins vs. plain markdown navigable by a single agent).


## Related across days
- [entity-langchain](#entity-langchain)
- [entity-autogen](#entity-autogen)
- [contrarian-frameworks](#contrarian-frameworks)
- [contrarian-frameworks-overkill](#contrarian-frameworks-overkill)
- [recurring-foil-frameworks](#recurring-foil-frameworks)
- [synthesis-skill-equals-stage-contract](#synthesis-skill-equals-stage-contract)


---

### Folder: quotes

#### quote-absurdities

*type: `quote` · sources: video*

> "They're not building multi-agentic frameworks and all these absurdities, they're building folders and markdown files on their computer and getting huge results from it."

— [entity-jake-van-clief](#entity-jake-van-clief), 00:00:16

## Why It Matters

The video's thesis in one sentence. It anchors [concept-icm-d1](#concept-icm-d1), [claim-icm-superiority](#claim-icm-superiority), and the contrarian framing in [contrarian-frameworks](#contrarian-frameworks).


## Related across days
- [contrarian-frameworks](#contrarian-frameworks)
- [contrarian-frameworks-overkill](#contrarian-frameworks-overkill)
- [tension-absurdities-vs-bounded-scope](#tension-absurdities-vs-bounded-scope)
- [recurring-foil-frameworks](#recurring-foil-frameworks)


#### quote-dialogue-theme

*type: `quote` · sources: video*

> "All of these skills, all of these folders and markdown files, all have one core theme: discussion and dialogue."

— [entity-jake-van-clief](#entity-jake-van-clief), 00:07:09

## Why It Matters

The philosophical centre of the talk. It connects [concept-icm-d1](#concept-icm-d1) back to [concept-dialogue-structure](#concept-dialogue-structure) and [framework-skill-creation](#framework-skill-creation) — skills are conversational decision trees made permanent.


## Related across days
- [concept-dialogue-structure](#concept-dialogue-structure)
- [synthesis-dialogue-to-context-engineering](#synthesis-dialogue-to-context-engineering)
- [synthesis-glass-box-meets-dialogue](#synthesis-glass-box-meets-dialogue)


#### quote-edit-source

*type: `quote` · sources: paper*

## Quote

> *"Editing the output fixes this run. Editing the source fixes every future run."*
>
> — [entity-jake-van-clief](#entity-jake-van-clief)

## Why it matters

The crux of the [concept-edit-source-principle](#concept-edit-source-principle), framed in compiler terms:

- editing **output** is patching the binary,
- editing the **source** that produced it is fixing the cause.

Connects to the multi-pass compilation analogy in [concept-icm-as-compilation](#concept-icm-as-compilation) and the open future direction [question-semantic-debugging](#question-semantic-debugging) (which would make source-tracing automatic instead of manual).
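The distinction can be made concrete with a toy example. This is a sketch under a strong simplifying assumption: `run_stage` is a hypothetical, deterministic stand-in for a stage run, whereas a real model call would not be deterministic. The prompts and topics are invented for illustration.

```python
def run_stage(source_prompt: str, topic: str) -> str:
    """Stand-in for one stage run: the output is derived from the source file."""
    return f"{source_prompt} Topic: {topic}."


source = "Write in a formal tone."
run1 = run_stage(source, "pipelines")

# Option 1: edit the output. Only this run changes.
patched_run1 = run1.replace("formal", "friendly")
run2 = run_stage(source, "review gates")  # the next run is still "formal"

# Option 2: edit the source. Every later run picks up the fix.
source = source.replace("formal", "friendly")
run3 = run_stage(source, "review gates")

print(patched_run1)
print(run2)
print(run3)
```

The second run reverts to the old behaviour because the patch lived only in the first run's output; the third run keeps the fix because it lives in the file every run reads.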


## Related across days
- [concept-edit-source-principle](#concept-edit-source-principle)
- [synthesis-edit-source-as-dialogue-evolution](#synthesis-edit-source-as-dialogue-evolution)
- [question-semantic-debugging](#question-semantic-debugging)


#### quote-folder-controls-context

*type: `quote` · sources: paper*

## Quote

> *"The same model executes every stage; the folder structure controls what context it receives."*
>
> — [entity-jake-van-clief](#entity-jake-van-clief)

## Why it matters

The **thesis in one sentence**: there is no second model deciding routing and no orchestration code — the only thing that differs between stages is **which files the one agent reads**.

This is the core insight on which [concept-icm-d2](#concept-icm-d2), [framework-icm-architecture](#framework-icm-architecture), and [concept-stage-contracts](#concept-stage-contracts) all rest. The performance mechanism is unpacked in [concept-context-scoping](#concept-context-scoping).
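A small Python sketch of that invariant: one callable runs every stage, and the only per-stage variable is the list of files it is handed. Everything here is hypothetical scaffolding for illustration, including the `STAGE_FILES` table, the `stub_agent` stand-in for a model call, and the `approve` callback, which reduces a human review gate to a yes/no decision.

```python
from typing import Callable

# Hypothetical stage table: stage name -> files the agent is given.
STAGE_FILES: dict[str, list[str]] = {
    "01_research": ["CLAUDE.md", "CONTEXT.md", "01_research/CONTEXT.md"],
    "02_script": ["CLAUDE.md", "CONTEXT.md", "02_script/CONTEXT.md"],
    "03_production": ["CLAUDE.md", "CONTEXT.md", "03_production/CONTEXT.md"],
}


def stub_agent(files: list[str]) -> str:
    """Stand-in for a single model call; it only reports what it was shown."""
    return f"output built from {len(files)} files, last: {files[-1]}"


def run_pipeline(agent: Callable[[list[str]], str],
                 approve: Callable[[str, str], bool]) -> dict[str, str]:
    """Run every stage with the same agent; stop at the first rejected gate."""
    results: dict[str, str] = {}
    for stage, files in STAGE_FILES.items():
        output = agent(files)  # same agent every time, different files
        if not approve(stage, output):
            break  # reviewer rejected this stage; later stages do not run
        results[stage] = output
    return results


results = run_pipeline(stub_agent, approve=lambda stage, output: True)
for stage, output in results.items():
    print(stage, "->", output)
```

There is no routing logic in `run_pipeline` beyond iterating the table in order, which is the sense in which the folder structure, rather than code, does the orchestration.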

## Important nuance

"Same model" should be read as "same model family / same orchestrating agent." The paper does use [entity-claude-code](#entity-claude-code)'s subagent delegation within stages — but the orchestration logic itself is filesystem, not framework.


## Related across days
- [concept-stage-contracts](#concept-stage-contracts)
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)
- [concept-icm-d2](#concept-icm-d2)
- [synthesis-single-agent-clarified](#synthesis-single-agent-clarified)


#### quote-glass-box

*type: `quote` · sources: paper*

## Quote

> *"It did not become transparent through the addition of an explanation layer. It was never opaque in the first place, because every artifact is a plain-text file that a human can read."*
>
> — [entity-jake-van-clief](#entity-jake-van-clief)

## Why it matters

ICM's interpretability is **inherent** (in Rudin's sense), **not post-hoc** — the system is a glass box because its intermediate state is plain text. The full argument is in [concept-observability-side-effect](#concept-observability-side-effect); the contrarian move it implies is [contrarian-observability-free](#contrarian-observability-free).

## Caveat

In formal governance contexts "glass-box AI" requires more than readable artifacts (provenance, traceability, safety controls). By that strict definition ICM is a partial glass box; [question-semantic-debugging](#question-semantic-debugging) would close the traceability gap.


## Related across days
- [concept-observability-side-effect](#concept-observability-side-effect)
- [contrarian-observability-free](#contrarian-observability-free)
- [synthesis-glass-box-meets-dialogue](#synthesis-glass-box-meets-dialogue)


#### quote-l2-roi

*type: `quote` · sources: video*

> "The jump from L1 to L2 is the highest ROI move."

— [entity-jake-van-clief](#entity-jake-van-clief), 00:02:29

## Why It Matters

Compresses [claim-l2-roi](#claim-l2-roi) into a memorable consultant heuristic. References [concept-three-levels-ai](#concept-three-levels-ai).


## Related across days
- [claim-l2-roi](#claim-l2-roi)
- [concept-three-levels-ai](#concept-three-levels-ai)
- [synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline)


#### quote-voice-control

*type: `quote` · sources: video*

> "What if I could sit inside of a group call and control someone else's Claude code or AI through my voice and immediately access all of that data that's locally on their computer?"

— [entity-jake-van-clief](#entity-jake-van-clief), 00:19:05

## Why It Matters

The motivating question for [concept-voice-collaboration](#concept-voice-collaboration) and the forward-looking [claim-voice-future](#claim-voice-future). Also implicitly raises [question-voice-security](#question-voice-security).


## Related across days
- [concept-voice-collaboration](#concept-voice-collaboration)
- [claim-voice-future](#claim-voice-future)
- [tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support)


---

### Folder: action-items

#### action-codify-voice

*type: `action-item` · sources: video*

## Action

Create a dedicated markdown file (e.g., `voice-and-tone.md`) that explicitly defines:

- Writing style and register
- Formatting constraints (headings, lists, emphasis)
- Teaching methodology / explanatory posture
- Prohibited words, clichés, or rhetorical moves
- Examples of 'good' and 'bad' outputs

Reference this file from your primary agent prompt so [entity-claude](#entity-claude) (or any agent operating in your [concept-icm-d1](#concept-icm-d1) vault) consistently matches your style without repetitive manual prompting.

## Expected Outcome

- Consistent AI outputs across all projects
- One place to evolve style — every downstream skill inherits
- Eliminates the 'I'll just tell it the style each time' tax

## Concrete Instance Of

[framework-skill-creation](#framework-skill-creation) — voice/tone files are a special case of a structured skill (Goal: 'write in our voice'; Constraints: the rules; Assumptions: the audience).


## Related across days
- [action-separate-l3-l4](#action-separate-l3-l4)
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)
- [framework-skill-creation](#framework-skill-creation)
- [synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline)


#### action-implement-folders

*type: `action-item` · sources: video*

## Action

Structure your AI agent's context, instructions, prompts, and 'skills' using **standard file system folders and markdown files** rather than adopting an orchestration framework such as [entity-langchain](#entity-langchain) or [entity-semantic-kernel](#entity-semantic-kernel).

## How

1. Create a vault/folder per project or domain
2. Place reusable skills as markdown files in a `skills/` subfolder (use [framework-skill-creation](#framework-skill-creation))
3. Keep voice/tone, constraints, and brand guidelines as named markdown files referenced from the main prompt
4. Give a single agent (e.g., [entity-claude](#entity-claude)) navigational access to the folder
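
The four steps above can be sketched as a small scaffolding script. File names such as `PROMPT.md` and `skills/summarize.md` are hypothetical placeholders, not names prescribed by the talk; the Goal / Constraints / Assumptions headings follow the skill structure this vault attributes to [framework-skill-creation](#framework-skill-creation).

```python
from pathlib import Path


def scaffold_vault(root: Path) -> list[Path]:
    """Create a minimal, hypothetical vault layout and return the files created."""
    files = {
        root / "PROMPT.md": (
            "# Main prompt\n\nFollow `voice-and-tone.md`. Skills live in `skills/`.\n"
        ),
        root / "voice-and-tone.md": "# Voice and tone\n",
        root / "skills" / "summarize.md": (
            "# Skill: summarize\n\n## Goal\n\n## Constraints\n\n## Assumptions\n"
        ),
    }
    created = []
    for path, text in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite hand-edited files
            path.write_text(text, encoding="utf-8")
            created.append(path)
    return created
```

Running it twice is safe: the second call creates nothing, so files a human has since edited are left alone.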

## Expected Outcome

- 20–40% reduction in token usage (anecdotal — verify locally)
- Increased transparency and inspectability
- Easier maintenance because everything is plain text
- Lower barrier for non-engineers to participate

## Underlying Concept

[concept-icm-d1](#concept-icm-d1)

## Supporting Claim

[claim-icm-superiority](#claim-icm-superiority)


## Related across days
- [action-numbered-stage-folders](#action-numbered-stage-folders)
- [framework-icm-architecture](#framework-icm-architecture)
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)
- [synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline)


#### action-move-to-l2

*type: `action-item` · sources: video*

## Action

Audit your team's current AI usage. If they are primarily copy-pasting into chatbots ([concept-three-levels-ai](#concept-three-levels-ai) Level 1), invest effort into creating:

- **Shared prompt libraries** (versioned, owned, reviewed)
- **Brand-tone guides** as markdown (see [action-codify-voice](#action-codify-voice))
- **Structured markdown skills** built using [framework-skill-creation](#framework-skill-creation)

This moves the team to Level 2 — the highest-ROI step per [claim-l2-roi](#claim-l2-roi) and [quote-l2-roi](#quote-l2-roi).

## Expected Outcome

- Flattens the effort curve while raising quality and consistency
- Creates the foundation for an eventual Level 3 transition
- Reduces variance across team members

## Caveats

ROI claims are practitioner-grounded, not formally measured. Establish your own baseline and re-measure post-intervention if you want hard numbers.


## Related across days
- [concept-three-levels-ai](#concept-three-levels-ai)
- [action-codify-voice](#action-codify-voice)
- [action-numbered-stage-folders](#action-numbered-stage-folders)
- [synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline)


#### action-numbered-stage-folders

*type: `action-item` · sources: paper*

## Action

Model each pipeline stage as a **numbered folder** with a `CONTEXT.md` contract and an `output/` folder.

## How

Use numbered folders (`01_`, `02_`, `03_`) so ordering is **explicit and reorderable by rename**. Each stage folder carries its own:

- `CONTEXT.md` (Layer 2 contract — see [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)),
- `references/` (Layer 3),
- `output/` (Layer 4).

The canonical layout is in [framework-icm-architecture](#framework-icm-architecture).
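
A minimal sketch of the numbered-folder convention, assuming a `NN_name` naming scheme; the helper names are hypothetical. It shows that stage order is derived from the folder names alone, so a rename is enough to reorder.

```python
from pathlib import Path


def create_stage(workspace: Path, number: int, name: str) -> Path:
    """Create one stage folder with the CONTEXT.md / references / output layout."""
    stage = workspace / f"{number:02d}_{name}"
    (stage / "references").mkdir(parents=True, exist_ok=True)
    (stage / "output").mkdir(exist_ok=True)
    (stage / "CONTEXT.md").touch()
    return stage


def ordered_stages(workspace: Path) -> list[Path]:
    """Return stage folders sorted by their numeric prefix (e.g. 01_, 02_)."""
    stages = [
        p for p in workspace.iterdir()
        if p.is_dir() and p.name.split("_", 1)[0].isdigit()
    ]
    return sorted(stages, key=lambda p: int(p.name.split("_", 1)[0]))
```

Folders without a numeric prefix are ignored, so notes or scratch directories can sit beside the stages without joining the pipeline.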

## Outcome

Stage order, addition, and removal become **filesystem operations**; one agent runs the pipeline with no orchestration code. The agent's per-stage behaviour is then driven entirely by [concept-stage-contracts](#concept-stage-contracts).


## Related across days
- [action-implement-folders](#action-implement-folders)
- [framework-icm-architecture](#framework-icm-architecture)
- [concept-stage-contracts](#concept-stage-contracts)
- [synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline)


#### action-review-gates

*type: `action-item` · sources: paper*

## Action

**Pause after each stage writes its `output/`** so a human can inspect and edit before the next stage reads it.
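
One way to picture the gate is a loop that writes a stage's output, hands control to a human, and then re-reads the file from disk. This is a sketch under assumptions: the `result.md` file name and both callables are hypothetical, and the paper does not prescribe this code.

```python
from pathlib import Path
from typing import Callable


def run_with_review_gates(
    stages: list[Path],
    run_stage: Callable[[Path, str], str],
    wait_for_review: Callable[[Path], None],
) -> None:
    """Run stages in order, pausing after each one writes its output file."""
    previous = ""
    for stage in stages:
        out_file = stage / "output" / "result.md"  # hypothetical file name
        out_file.parent.mkdir(parents=True, exist_ok=True)
        out_file.write_text(run_stage(stage, previous), encoding="utf-8")
        # Gate: the human may read and edit out_file before this call returns.
        wait_for_review(out_file)
        # Re-read from disk so human edits, not the raw model output, flow onward.
        previous = out_file.read_text(encoding="utf-8")
```

In an interactive session, `wait_for_review` could be as simple as `lambda p: input(f"Review {p}, then press Enter")`.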

## Why

Surfacing intermediate artifacts as **editable markdown** lets humans course-correct at the point where correction is cheapest — e.g. fixing a structural plan **before** any slides are drafted. The contract-level reasoning is in [concept-stage-contracts](#concept-stage-contracts).

## Expected pattern

Editing follows a **U-shape**: heavy at stage 1 (direction-setting), light in the middle, heavy at the final stage (alignment). The data behind this is [claim-ushaped-intervention](#claim-ushaped-intervention).

## Outcome

- Errors are caught where correction is cheapest.
- The human sets direction early and aligns output late.
- Each gate is also an opportunity to **edit source rather than output** when the issue is systemic — see [concept-edit-source-principle](#concept-edit-source-principle).


## Related across days
- [concept-stage-contracts](#concept-stage-contracts)
- [claim-ushaped-intervention](#claim-ushaped-intervention)
- [synthesis-edit-source-as-dialogue-evolution](#synthesis-edit-source-as-dialogue-evolution)
- [tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support)


#### action-separate-l3-l4

*type: `action-item` · sources: paper*

## Action

Put **stable rules** (voice, design system, conventions) in `references/`;
put **per-run content** in `output/`.

## Why

The agent needs to know what to **internalize as constraints** (Layer 3) versus what to **process as input** (Layer 4). See [concept-five-layer-hierarchy](#concept-five-layer-hierarchy) for the full layer model.

## The metaphor

- The **recipe** (Layer 3) stays fixed across runs.
- The **ingredients** (Layer 4) change each run.

Misclassifying the two is the failure mode the layering exists to prevent — if a voice guide ends up in `output/`, the agent treats stable rules as transient material; if this run's brief ends up in `references/`, it persists into future runs.
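
The split can be made concrete when assembling a stage's prompt. The sketch below is illustrative only: the section headings and function names are assumptions, and it treats the previous stage's `output/` as this stage's per-run input.

```python
from pathlib import Path
from typing import Optional


def _read_all(folder: Path) -> str:
    """Join every markdown file in a folder, in name order (empty if missing)."""
    if not folder.is_dir():
        return ""
    files = sorted(folder.glob("*.md"))
    return "\n\n".join(f.read_text(encoding="utf-8") for f in files)


def build_stage_prompt(stage_dir: Path, previous_stage: Optional[Path]) -> str:
    """Label Layer 3 files as constraints and prior Layer 4 files as input."""
    constraints = _read_all(stage_dir / "references")
    material = _read_all(previous_stage / "output") if previous_stage else ""
    return (
        "## Constraints (stable across runs)\n" + constraints
        + "\n\n## Input (this run only)\n" + material
    )
```

Because the two folders are read separately, a misplaced file shows up under the wrong heading, which makes the misclassification visible in the assembled prompt.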

## Outcome

Reference persists across runs; per-run state is contained and reproducible — strengthening the [concept-portability](#concept-portability) story.


## Related across days
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)
- [action-codify-voice](#action-codify-voice)
- [synthesis-five-layer-fills-the-gap](#synthesis-five-layer-fills-the-gap)


---

### Folder: prerequisites

#### prereq-llm-context-windows

*type: `prereq` · sources: paper*

## Why this is a prerequisite

ICM's **efficiency argument** rests on keeping relevant tokens scoped and **out of the degraded middle** of a long context window.

## Key source

**Liu et al., "Lost in the Middle: How Language Models Use Long Contexts."**

Finding: models often perform *worst* on information placed in the middle of long inputs; careful placement and scoping of relevant content can help.

## What this supports

- The token-efficiency story in [claim-token-efficiency](#claim-token-efficiency) (representative ~5k focused vs ~42k monolithic).
- The behavioural mechanism in [concept-context-scoping](#concept-context-scoping) (same model, different available info → different task).
- The layering design in [concept-five-layer-hierarchy](#concept-five-layer-hierarchy) (relevance density via routing/content separation).

## Validation note

Liu et al.'s finding is **strong external support for the qualitative argument** (scoped contexts should outperform sprawling ones). It does **not** demonstrate the specific numerical advantage ICM cites — that requires a controlled comparison, the subject of [question-controlled-comparison](#question-controlled-comparison).


## Related across days
- [prereq-llm-context](#prereq-llm-context)
- [claim-token-efficiency](#claim-token-efficiency)
- [concept-context-scoping](#concept-context-scoping)
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)


#### prereq-llm-context

*type: `prereq` · sources: video*

## Why It's Required

To fully grasp why the folder-based [concept-icm-d1](#concept-icm-d1) is efficient, you must understand:

- **How LLMs process tokens** — text is chunked into tokens before inference
- **Context window limits** — every model has a maximum context size
- **Cost scaling** — most APIs price per input + output token, so context bloat directly costs money
- **Attention degradation** — practical performance often degrades as context grows ('lost in the middle')

## Connection to ICM

ICM's central efficiency argument is that **on-demand folder navigation loads only relevant slices into context**, instead of stuffing everything into the prompt. This is consistent with general LLM guidance (externalize persistent state, load only what is needed).
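
To check the effect on your own vault, a rough estimate is enough. The sketch below assumes about four characters per token, which is a common rule of thumb for English text and not a real tokenizer; the function names are hypothetical.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text; an assumption, not a tokenizer


def estimate_tokens(folder: Path) -> int:
    """Approximate the token count of every markdown file under a folder."""
    chars = sum(len(f.read_text(encoding="utf-8")) for f in folder.rglob("*.md"))
    return chars // CHARS_PER_TOKEN


def scoped_share(vault: Path, stage: Path) -> float:
    """Fraction of the vault's estimated tokens that one stage folder would load."""
    total = estimate_tokens(vault)
    return estimate_tokens(stage) / total if total else 0.0
```

For billing-grade numbers, replace the heuristic with the tokenizer of the model you actually use.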

The claimed 20–40% token reduction in [claim-icm-superiority](#claim-icm-superiority) is attributed to this design choice; the figure itself is anecdotal and has not been benchmarked.


## Related across days
- [prereq-llm-context-windows](#prereq-llm-context-windows)
- [claim-token-efficiency](#claim-token-efficiency)
- [concept-context-scoping](#concept-context-scoping)


#### prereq-markdown

*type: `prereq` · sources: video*

## Why It's Required

The entire [concept-icm-d1](#concept-icm-d1) methodology relies on **Markdown** as the substrate for instructions, context, and skills. You should be comfortable with at least:

- Headings (`#`, `##`, `###`)
- Bullet and numbered lists
- Bold and italic emphasis
- Code fences and inline code
- Links and image syntax
- Optionally: front-matter (YAML at the top of files), tables, and footnotes

## Why Markdown Specifically

- Plain-text → version-controllable and diff-friendly
- Human-readable → low barrier for non-engineers
- Machine-parseable → LLMs handle markdown structure exceptionally well
- Portable → works in Obsidian, GitHub, IDEs, and any editor of your choice

Markdown is the lingua franca that makes [framework-skill-creation](#framework-skill-creation) and [action-codify-voice](#action-codify-voice) possible without code.


## Related across days
- [prereq-unix-pipelines](#prereq-unix-pipelines)
- [concept-icm-d1](#concept-icm-d1)
- [framework-icm-architecture](#framework-icm-architecture)


#### prereq-unix-pipelines

*type: `prereq` · sources: paper*

## Why this is a prerequisite

ICM is the **Unix philosophy applied to AI agents**:

- *Do one thing well* (McIlroy),
- *Plain text as universal interface*,
- *Human-readable intermediate state*.

## Key sources

- **McIlroy** — "Do one thing well."
- **Shaw & Garlan** — the pipe-and-filter architectural pattern.

## What understanding this enables

The principles that made Unix pipelines effective are the *same* principles that make ICM work for sequential agent workflows. Stages are filters; folders are pipes; markdown files are the plain-text universal interface.

This prerequisite also feeds the more elaborated theoretical analogy in [concept-icm-as-compilation](#concept-icm-as-compilation) (multi-pass compilation), which extends pipe-and-filter to incremental re-runs and intermediate-representation inspection.


## Related across days
- [prereq-markdown](#prereq-markdown)
- [concept-icm-as-compilation](#concept-icm-as-compilation)
- [concept-icm-d2](#concept-icm-d2)


---

### Folder: open-questions

#### question-controlled-comparison

*type: `open-question` · sources: paper*

## The question

Is ICM's staged context loading **measurably better** than monolithic prompting on the same tasks?

## Current state

**No controlled comparison has been conducted.** The quality claim rests on:

- "Lost in the middle" theory ([prereq-llm-context-windows](#prereq-llm-context-windows)),
- self-reported practitioner experience from an **invite-only, self-selected community** of 33 ([claim-ushaped-intervention](#claim-ushaped-intervention)),
- representative — not measured — token counts ([claim-token-efficiency](#claim-token-efficiency)).

## Resolution path

A controlled comparison of ICM staged context loading vs monolithic prompting on the **same tasks**, with **instrumented measurement**:

- output quality (human-rated and automated metrics),
- editing burden,
- error rate,
- time-to-completion.
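
As a sketch of what instrumented measurement could look like, the record below carries the four metrics listed above, and the helper computes per-task differences between the two conditions. Field names and condition labels are hypothetical; no such dataset exists in the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    task_id: str
    condition: str            # "icm" or "monolithic" (hypothetical labels)
    quality: float            # e.g. a human rating
    edit_minutes: float       # editing burden
    errors: int               # error count
    minutes_to_complete: float


def paired_differences(results: list[TaskResult], metric: str) -> dict[str, float]:
    """Per-task difference (icm minus monolithic) for one metric."""
    by_task: dict[str, dict[str, float]] = {}
    for r in results:
        by_task.setdefault(r.task_id, {})[r.condition] = getattr(r, metric)
    return {
        task: v["icm"] - v["monolithic"]
        for task, v in by_task.items()
        if "icm" in v and "monolithic" in v  # skip tasks run in only one condition
    }


def mean_difference(results: list[TaskResult], metric: str) -> float:
    """Average paired difference, or 0.0 when no task has both conditions."""
    diffs = list(paired_differences(results, metric).values())
    return mean(diffs) if diffs else 0.0
```

Pairing by task keeps the comparison on the **same tasks**, as the resolution path requires; a real study would add significance testing on top.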

## Why it matters

Moves ICM's efficiency and quality claims from **directional to demonstrated**. The Stanford HAI guidance on validating AI claims explicitly flags this kind of baseline comparison as essential — and the paper, to its credit, acknowledges the gap rather than hiding it.


## Related across days
- [claim-token-efficiency](#claim-token-efficiency)
- [claim-icm-superiority](#claim-icm-superiority)
- [question-icm-scaling](#question-icm-scaling)
- [arc-evidence-base-evolution](#arc-evidence-base-evolution)
- [open-arc-what-remains](#open-arc-what-remains)


#### question-cross-model

*type: `open-question` · sources: paper*

## The question

Does the [concept-five-layer-hierarchy](#concept-five-layer-hierarchy) hold outside the Claude family?

All current testing used a single family — **Claude Opus 4.6 + Sonnet 4.6**, via [entity-claude-code](#entity-claude-code).

## Why it matters

Output quality may vary with other models, particularly those with **different context-handling characteristics**:

- different attention patterns,
- different sensitivity to instruction placement,
- different effective context lengths,
- different behaviour on long markdown blocks.

Whether the five-layer hierarchy is robust — or fitted to Claude's specific characteristics — is **untested**.

## Resolution path

**Formal cross-model evaluation** of the same ICM workspaces on GPT-4 class, Gemini, Llama 3, etc., measuring stage-level output quality and intervention burden.

Named in the paper as the natural next step alongside [question-controlled-comparison](#question-controlled-comparison).


## Related across days
- [entity-claude-code](#entity-claude-code)
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)
- [open-arc-what-remains](#open-arc-what-remains)


#### question-icm-scaling

*type: `open-question` · sources: video*

## The Question

While [entity-jake-van-clief](#entity-jake-van-clief) demonstrates [concept-icm-d1](#concept-icm-d1) working effectively on focused projects and bounded databases, it remains an open question how well a **single agent navigating a folder structure scales** when applied to massive, legacy enterprise codebases with tens of thousands of interconnected files.

## Why It Matters

If ICM degrades at enterprise scale, the contrarian critique in [contrarian-frameworks](#contrarian-frameworks) weakens — because multi-agent orchestration frameworks (with specialized retrieval, planning, and validation agents) were *designed* for exactly that scale.

## Resolution Path

- Case studies of ICM applied to a monolithic enterprise codebase
- Benchmarks comparing single-agent ICM navigation vs framework-based approaches on (a) accuracy of file selection, (b) time-to-answer, (c) refactor correctness
- Hybrid patterns where ICM provides the substrate but a thin orchestration layer handles cross-team boundaries

## Sub-Threads

- At what file count / repository complexity does single-agent navigation break down?
- Do hierarchical 'index' markdown files mitigate the problem?
- Do the growing context windows of Claude models absorb the scaling problem on their own over time?

## Open Questions Stated in the Companion Paper

The paper [entity-icm-paper-arxiv](#entity-icm-paper-arxiv) independently flags adjacent unknowns and explicitly bounds ICM's claims — these are *author-acknowledged* gaps, not external critique:

- **Cross-model generalization** — does the Five-Layer Context Hierarchy hold outside the Claude family? *All* paper testing used a single model family (Claude Opus/Sonnet 4.6).
- **Diminishing returns of selective loading** — as context windows grow, does staged loading stay worthwhile, or does the scaling problem dissolve on its own (directly echoes the sub-thread above)?
- **Sensitivity to ordering/formatting** — how much does output quality depend on context ordering within a layer?
- **Explicit non-support** — the paper states ICM is *not* intended for real-time multi-agent collaboration, high-concurrency systems, or complex automated branching. This narrows the scaling question: ICM's authors concede the high-concurrency enterprise case to frameworks rather than claiming to scale into it.
- **Methodological weakness** — evidence is self-reported (not instrumented), from an invite-only, self-selected community (52 members) concentrated in content production; no controlled comparison exists. Resolving the scaling question therefore requires the *formal cross-model evaluation and structured user studies* the paper itself calls for.


## Related across days
- [question-controlled-comparison](#question-controlled-comparison)
- [question-cross-model](#question-cross-model)
- [open-arc-what-remains](#open-arc-what-remains)


#### question-semantic-debugging

*type: `open-question` · sources: paper*

## The question

ICM today offers **observability** (read any output) but **not traceability**: if a stage-3 phrase sounds wrong, there is **no direct way to trace it to its cause** (which reference file? which contract? which upstream output?). The practitioner does it manually.

## Resolution path

Build **source maps** connecting an output phrase back to:

- the specific instruction in a stage `CONTEXT.md`,
- the reference file (Layer 3) that produced it,
- the prior-stage artifact (Layer 4) it transformed.

In other words: **semantic debugging** — breakpoints, stack-traces, and source-map equivalents for *content* pipelines, paralleling compiler debug builds.
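
The shape of such a source map can be sketched as a plain record with one field per cause listed above. This is a hypothetical data structure, not something the paper specifies, and it only shows the lookup side: how entries get populated is exactly the open problem.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceMapEntry:
    """Hypothetical record tying one output phrase to its possible causes."""
    output_phrase: str
    contract_instruction: Optional[str] = None  # instruction in a stage CONTEXT.md
    reference_file: Optional[str] = None        # Layer 3 file
    prior_artifact: Optional[str] = None        # Layer 4 file from an earlier stage


def trace(text: str, entries: list[SourceMapEntry]) -> list[SourceMapEntry]:
    """Return every entry whose recorded phrase occurs in the queried text."""
    return [e for e in entries if e.output_phrase in text]
```

Given a suspicious sentence from a stage-3 output, `trace` would return the candidate sources to edit, which is what the manual process does by hand today.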

## Why it matters

This is the **machinery the [concept-edit-source-principle](#concept-edit-source-principle) needs** to be practical: editing source instead of output requires knowing which source. It would also close ICM's biggest "glass-box" gap under strict governance definitions; see [concept-observability-side-effect](#concept-observability-side-effect) and [contrarian-observability-free](#contrarian-observability-free).

Named in the paper as future work; would also strengthen any cross-model results from [question-cross-model](#question-cross-model).


## Related across days
- [concept-edit-source-principle](#concept-edit-source-principle)
- [concept-observability-side-effect](#concept-observability-side-effect)
- [quote-edit-source](#quote-edit-source)
- [open-arc-what-remains](#open-arc-what-remains)


#### question-voice-security

*type: `open-question` · sources: video*

## The Question

The [concept-voice-collaboration](#concept-voice-collaboration) demonstration shows an AI agent taking voice commands during a live call and executing **read/write operations on a local file system**. This raises significant security questions:

- **Authentication**: how does the agent know the speaker is authorized?
- **Voice spoofing**: tools like [entity-11labs](#entity-11labs) make voice cloning trivial; replay and synthesized-voice attacks are realistic
- **Permission scoping**: which folders/files can the voice channel touch?
- **Bystander hijacking**: in a compromised or public call, anyone with mic access could issue commands
- **Audit trail**: how are voice-issued operations logged?

## Why It Matters

Without solid answers, the prediction in [claim-voice-future](#claim-voice-future) cannot transition from demo to production deployment in any regulated or sensitive environment.

## Resolution Path

- Robust voice authentication protocols (biometric + secondary factor)
- Sandboxed execution environments for voice-driven agents (capability-scoped, read-only by default)
- Command-confirmation patterns for destructive operations
- Cryptographic provenance for voice commands
- Policy frameworks at the OS / enterprise IT level
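
Two of these items, capability scoping and command confirmation, can be sketched as a single authorization check. The operation names and the `authorize` function are hypothetical; the sketch does not address authentication or voice spoofing, which need separate mechanisms.

```python
from pathlib import Path

DESTRUCTIVE = {"write", "delete"}  # hypothetical operation names


def authorize(
    op: str,
    target: Path,
    allowed_roots: list[Path],
    confirmed: bool = False,
) -> bool:
    """Allow reads inside allowed roots; require confirmation for destructive ops."""
    resolved = target.resolve()  # collapses ".." so paths cannot escape the root
    in_scope = any(resolved.is_relative_to(root.resolve()) for root in allowed_roots)
    if not in_scope:
        return False
    if op in DESTRUCTIVE:
        return confirmed
    return op == "read"  # anything unrecognised is denied by default
```

The default is read-only: a write or delete succeeds only inside an allowed folder and only after an explicit confirmation step sets `confirmed=True`. `Path.is_relative_to` requires Python 3.9 or later.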


## Related across days
- [concept-voice-collaboration](#concept-voice-collaboration)
- [claim-voice-future](#claim-voice-future)
- [tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support)
- [open-arc-what-remains](#open-arc-what-remains)


---

### Folder: contrarian-insights

#### contrarian-frameworks-overkill

*type: `contrarian-insight` · sources: paper*

## What this challenges

The default assumption that agentic workflows *require* a multi-agent orchestration framework — LangChain, [entity-autogen](#entity-autogen), Semantic Kernel, and similar.

## The argument

For sequential workflows where a human reviews each step, frameworks introduce engineering overhead the problem does not require. The token-efficiency story ([claim-token-efficiency](#claim-token-efficiency)) and observability story ([concept-observability-side-effect](#concept-observability-side-effect)) both come for free from the filesystem layout; the message-passing, branching, and coordination infrastructure of a framework adds cost without benefit for this workflow class.

## Bounded, not absolute

ICM **concedes the cases frameworks are built for**:

- real-time multi-agent collaboration,
- high-concurrency,
- programmatic branching,
- automated error recovery.

It claims only the *sequential / reviewable / repeatable* class. "Frameworks are absurd" overshoots; "frameworks are overhead **for this class**" is the defensible version.

## Validation status

The mechanical substitution (folders instead of framework orchestration) is technically sound and consistent with traditional software architecture patterns (Unix pipes, multi-pass compilation) — see [prereq-unix-pipelines](#prereq-unix-pipelines) and [concept-icm-as-compilation](#concept-icm-as-compilation). The judgment that frameworks are *overkill* for this class is **defensible but value-laden and not experimentally demonstrated** — there is no published controlled comparison; that gap is [question-controlled-comparison](#question-controlled-comparison).

## Counter-perspective

A critical reviewer might counter: for nontrivial workflows (branching logic, conditional steps, parallelism), the complexity may just be *shifted* from explicit code into conventions and human discipline. Orchestration tools (Airflow, Temporal, Prefect) earn their keep once flows become nontrivial. ICM is likely best suited to *linear or mildly branching workflows with strong human oversight*; beyond that, conventional orchestration retains advantages in correctness, monitoring, and scalability.


## Related across days
- [contrarian-frameworks](#contrarian-frameworks)
- [tension-absurdities-vs-bounded-scope](#tension-absurdities-vs-bounded-scope)
- [recurring-foil-frameworks](#recurring-foil-frameworks)


#### contrarian-frameworks

*type: `contrarian-insight` · sources: video*

## The Contrarian Position

Contrary to the industry trend of building increasingly complex multi-agent orchestration frameworks (such as [entity-langchain](#entity-langchain), [entity-semantic-kernel](#entity-semantic-kernel), or AutoGen), [entity-jake-van-clief](#entity-jake-van-clief) argues that these are 'absurdities.' He posits that a simpler approach — standard file-system folders and markdown files navigated by a single agent — is more effective, cheaper, and easier to maintain.

See the matching claim [claim-icm-superiority](#claim-icm-superiority) and the headline [quote-absurdities](#quote-absurdities).

## What the Position Challenges

The prevailing belief that complex tasks demand complex orchestration. The contrarian view says: **most tasks don't, and the orchestration tax is paid in tokens, debugging time, and adoption friction.**

## How Far the Contrarian Claim Holds

- **Supported by the literature**: starting with a single agent plus structured context is mainstream advice (e.g., Microsoft's Cloud Adoption Framework). For coding tasks especially, single-agent + tools fits most workloads.
- **Partially supported / anecdotal**: the specific 20–40% token-reduction figure is plausible but not benchmarked.
- **Not supported in absolute form**: multi-agent frameworks are well-motivated when crossing security/compliance boundaries, when multiple teams own subsystems, when you need specialized roles with distinct permissions, or when distributed-systems patterns (sagas, circuit breakers, immutable state) become necessary at scale.

## Balanced Reframe

ICM is the right *starting* architecture and the right *terminal* architecture for many single-team workflows. Multi-agent frameworks are not absurd; they encode known distributed-systems patterns and earn their complexity once a system genuinely crosses boundaries.


## Related across days
- [contrarian-frameworks-overkill](#contrarian-frameworks-overkill)
- [tension-absurdities-vs-bounded-scope](#tension-absurdities-vs-bounded-scope)
- [recurring-foil-frameworks](#recurring-foil-frameworks)
- [quote-absurdities](#quote-absurdities)


#### contrarian-observability-free

*type: `contrarian-insight` · sources: paper*

## What this challenges

The assumption that inspecting an AI pipeline's intermediate state requires building logging, dashboards, or tracing tooling on top of the system.

## The argument

If every intermediate artifact is a plain file, **observability is free**: you open a folder and read it. The system is a glass box **by construction** rather than by the addition of an explanation layer — inverting the usual build-observability-on-top posture. See [concept-observability-side-effect](#concept-observability-side-effect) and the canonical quote [quote-glass-box](#quote-glass-box).

## Validation status

The observability-via-plain-artifacts claim is correct and aligns with modern observability guidance that systems should be *designed* so behaviour data are captured as part of normal operation.

However, the enrichment overlay flags that in regulated or safety-critical settings, "glass-box AI" requires more than readable artifacts:

- **Provenance** (model version, configuration).
- **Fine-grained traceability** (output spans → source snippets).
- **Explicit safety controls and audit trails.**

ICM provides observability strongly; traceability and formal safety controls are flagged as future work — see [question-semantic-debugging](#question-semantic-debugging). By colloquial usage ICM is a glass box; in formal governance contexts it is **incomplete**.
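
The provenance item can be approximated with a sidecar file written next to each output. This is a sketch under assumptions: the `.provenance.json` suffix, the field names, and the function are hypothetical and are not part of ICM as described in the paper.

```python
import hashlib
import json
from pathlib import Path


def write_provenance(
    output_file: Path,
    model: str,
    config: dict,
    sources: list[Path],
) -> Path:
    """Write a JSON sidecar recording model, config, and SHA-256 of each source file."""
    record = {
        "output": output_file.name,
        "model": model,
        "config": config,
        "sources": {
            str(src): hashlib.sha256(src.read_bytes()).hexdigest() for src in sources
        },
    }
    sidecar = output_file.parent / (output_file.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar
```

Hashing the source files records which version of each contract or reference produced an output. It covers provenance only, not span-level traceability or safety controls.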


## Related across days
- [concept-observability-side-effect](#concept-observability-side-effect)
- [synthesis-glass-box-meets-dialogue](#synthesis-glass-box-meets-dialogue)
- [quote-glass-box](#quote-glass-box)


---

### Folder: cross-day

#### arc-evidence-base-evolution

*type: `synthesis` · sources: cross-day*

A reader moving from the talk to the paper sees the **same factual base** acquire **scaffolding**.

## Video evidence (largely anecdotal)

- "20–40% token reduction" — no methodology given.
- "Highest-ROI move" — consultant heuristic ([claim-l2-roi](#claim-l2-roi)).
- "Karpathy does this too" — [entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1) cited as social proof.
- Voice demo — single live demonstration.

## Paper evidence (anecdotal + lineage + limitations)

The paper retains the anecdotal grounding **and** adds:

- **Theoretical lineage**: Unix pipelines ([prereq-unix-pipelines](#prereq-unix-pipelines)), multi-pass compilation ([concept-icm-as-compilation](#concept-icm-as-compilation)), Karpathy context engineering ([entity-andrej-karpathy-d2](#entity-andrej-karpathy-d2)), Rudin inherent interpretability ([concept-observability-side-effect](#concept-observability-side-effect)), Amershi mixed-initiative.
- **Quantitative grounding**: 2–8k vs 42k tokens, justified via Liu et al. "lost in the middle" ([prereq-llm-context-windows](#prereq-llm-context-windows)). See [claim-token-efficiency](#claim-token-efficiency).
- **Empirical signal**: n=33 practitioners, 30 reporting U-shape ([claim-ushaped-intervention](#claim-ushaped-intervention)).
- **External adoption**: Edinburgh Neuropolitics Lab, ICR Research, Academy of International Affairs Bonn ([claim-external-adoption](#claim-external-adoption), [entity-external-adopters](#entity-external-adopters)).
- **Explicit limits**: no controlled comparison ([question-controlled-comparison](#question-controlled-comparison)), single model family ([question-cross-model](#question-cross-model)), observability ≠ traceability ([question-semantic-debugging](#question-semantic-debugging)).

## What the upgrade demonstrates

The arc is the **maturation of a consultant heuristic into a research program**. The claims that survive: ICM's architectural shape and its observability story. The claims that need testing: the token efficiency numbers and the productivity gains. The claims explicitly out of scope: real-time voice ([tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support)).

For downstream questions, **always cite which evidence layer supports the claim** — lineage, representative numbers, self-reported practitioner data, or external adoption — rather than asserting a single confidence level.

See [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude), [open-arc-what-remains](#open-arc-what-remains).


#### arc-talk-vs-paper-altitude

*type: `synthesis` · sources: cross-day*

This corpus contains the same idea told twice by [entity-jake-van-clief](#entity-jake-van-clief) and [entity-david-mcdermott](#entity-david-mcdermott):

- The **video** (sourced from the YouTube transcript; [entity-icm-paper-arxiv](#entity-icm-paper-arxiv) is linked for context only) is **practitioner-conviction** mode. Rhetoric is sharp ("absurdities", see [quote-absurdities](#quote-absurdities)), claims are bold ("highest-ROI move", see [quote-l2-roi](#quote-l2-roi)), and the finale is a forward-looking vision ([concept-voice-collaboration](#concept-voice-collaboration)).
- The **paper** ([entity-icm-paper](#entity-icm-paper)) is **researcher-honesty** mode. The same thesis is expressed as a bounded architectural claim ([concept-icm-d2](#concept-icm-d2)), grounded in lineage ([prereq-unix-pipelines](#prereq-unix-pipelines), [concept-icm-as-compilation](#concept-icm-as-compilation)), and surrounded by **explicit limitations** ([question-controlled-comparison](#question-controlled-comparison), [question-cross-model](#question-cross-model), [question-semantic-debugging](#question-semantic-debugging)).

## The translation table

| Video framing | Paper framing |
|---|---|
| "folders and markdown" ([concept-icm-d1](#concept-icm-d1)) | five-layer context hierarchy ([concept-five-layer-hierarchy](#concept-five-layer-hierarchy)) |
| "skill" (markdown file) | stage contract ([concept-stage-contracts](#concept-stage-contracts)) |
| "single agent" ([claim-icm-superiority](#claim-icm-superiority)) | single orchestrator + subagents ([entity-claude-code](#entity-claude-code)) |
| "20–40% token reduction" (anecdotal) | ~2–8k vs ~42k representative ([claim-token-efficiency](#claim-token-efficiency)) |
| "absurdities" ([contrarian-frameworks](#contrarian-frameworks)) | "overhead for *this class*" ([contrarian-frameworks-overkill](#contrarian-frameworks-overkill)) |
| dialogue is the core theme ([concept-dialogue-structure](#concept-dialogue-structure)) | context engineering / scoping ([concept-context-scoping](#concept-context-scoping)) |

## Why this matters for the reader

The video sells; the paper survives review. When answering questions, **default to the paper's altitude**: the video supplies conviction and vision, while the paper supplies the testable claims and the boundaries.

See also [tension-absurdities-vs-bounded-scope](#tension-absurdities-vs-bounded-scope), [arc-evidence-base-evolution](#arc-evidence-base-evolution).


#### open-arc-what-remains

*type: `synthesis` · sources: cross-day*

Questions that neither the talk nor the paper resolves.

## From the video

- **[question-icm-scaling](#question-icm-scaling)** — does single-agent folder navigation handle massive legacy codebases?
- **[question-voice-security](#question-voice-security)** — how is the voice-driven file-system access loop secured against bystander hijack, voice cloning attacks, and command-confirmation gaps?

## From the paper

- **[question-controlled-comparison](#question-controlled-comparison)** — is staged loading measurably better than monolithic prompting? No head-to-head was run.
- **[question-cross-model](#question-cross-model)** — does the five-layer hierarchy hold outside Claude Opus / Sonnet 4.6?
- **[question-semantic-debugging](#question-semantic-debugging)** — can ICM provide automatic traceability (output spans → source snippets) and not just observability?

## Cross-source questions only this synthesis raises

- **Does the voice finale survive when the paper's review-gate discipline is removed?** The talk imagines real-time voice control; the paper's whole epistemics depend on review gates ([action-review-gates](#action-review-gates)). The two need reconciliation. See [tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support).
- **What is the right boundary between L3 references and L4 working artifacts in long-running workspaces?** As pipelines run repeatedly, today's L4 may become tomorrow's L3 reference. The corpus doesn't address the promotion path.
- **Can the workspace-builder bootstrap a multi-agent ICM?** If ICM can build ICM ([framework-workspace-builder](#framework-workspace-builder)), can it build something the paper excludes — a real-time multi-agent system whose folder substrate is shared at runtime? This is the unstated bridge to the voice future.
- **What is the right unit of governance?** Each stage outputs a plain file ([concept-observability-side-effect](#concept-observability-side-effect)), but **provenance** across runs and **policy enforcement** across users are not addressed. Strict glass-box governance (audit trails, traceability) remains unbuilt.

See [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude), [arc-evidence-base-evolution](#arc-evidence-base-evolution).


#### recurring-foil-frameworks

*type: `synthesis` · sources: cross-day*

Both sources need an opponent to argue against. The cast:

- **[entity-langchain](#entity-langchain)** — appears in both. The canonical orchestration foil.
- **[entity-semantic-kernel](#entity-semantic-kernel)** — appears in both. Microsoft's competitor; ironic because it also uses the term "skill" (different sense).
- **[entity-autogen](#entity-autogen)** — appears only in the paper. Microsoft's multi-agent conversation framework.

## Treatment evolution

**Video** ([contrarian-frameworks](#contrarian-frameworks), [quote-absurdities](#quote-absurdities)): blanket dismissal as "absurdities." Rhetorical.

**Paper** ([contrarian-frameworks-overkill](#contrarian-frameworks-overkill)): bounded claim — overhead for sequential human-reviewed workflows; **recommended** for the use cases ICM excludes (real-time multi-agent, high concurrency). Analytic.

## The scope rule

The paper effectively divides the design space:

| Use case | Recommended |
|---|---|
| Sequential pipeline, human review at each step | ICM ([concept-icm-d2](#concept-icm-d2)) |
| Real-time multi-agent collaboration | AutoGen / LangChain |
| Heavy programmatic branching, automated error recovery | Airflow / Temporal / Prefect |
| Regulated context with formal traceability requirements | Frameworks with audit tooling |
| High concurrency / high throughput | Specialized agent frameworks |

## Why this matters

A reader who only watches the video will think frameworks are bad. A reader who only reads the paper sees a respectful design-space partition. The corpus position is the latter; the video's rhetoric should be treated as **provocation**, not the considered claim.

See [tension-absurdities-vs-bounded-scope](#tension-absurdities-vs-bounded-scope), [tension-voice-future-vs-paper-non-support](#tension-voice-future-vs-paper-non-support).


#### synthesis-dialogue-to-context-engineering

*type: `synthesis` · sources: cross-day*

The video's philosophical centre is [concept-dialogue-structure](#concept-dialogue-structure): every effective AI workflow descends from a successful human–AI conversation. [quote-dialogue-theme](#quote-dialogue-theme) is the banner.

The paper relocates the same intuition into the academic frame of **context engineering** ([concept-context-scoping](#concept-context-scoping)), citing [entity-andrej-karpathy-d2](#entity-andrej-karpathy-d2) (2025): system behaviour depends on what context is delivered, in what structure, and at what moment.

## The bridge

Karpathy appears in **both** sources — see [entity-andrej-karpathy-d1](#entity-andrej-karpathy-d1) and [entity-andrej-karpathy-d2](#entity-andrej-karpathy-d2). In the video he is cited for the **"LLM Wiki"** markdown approach (independent validation that practitioners are choosing folders + text over frameworks). In the paper he supplies the **theoretical scaffolding** that justifies why context scoping changes the task the same model performs.

## The synthesis

Dialogue ↔ context engineering are the same insight at different altitudes:

- **Dialogue** (video frame): humans extract structure from their successful conversations and codify it.
- **Context engineering** (paper frame): humans deliver scoped, relevant context to a model at the right moment.

Both refuse the framework layer. Both treat the **content of the prompt** as where the engineering lives. K. Kumar's visual mapping tool ([entity-k-kumar](#entity-k-kumar)) is the prototype for surfacing latent dialogue structure — a tool that, in the paper's frame, would be classified as a **workspace authoring aid**.

See [synthesis-skill-equals-stage-contract](#synthesis-skill-equals-stage-contract), [arc-evidence-base-evolution](#arc-evidence-base-evolution).


#### synthesis-edit-source-as-dialogue-evolution

*type: `synthesis` · sources: cross-day*

The video frames the creation step: extract dialogue → encode as skill ([framework-skill-creation](#framework-skill-creation), [action-codify-voice](#action-codify-voice)). The paper frames the **maintenance** step: [concept-edit-source-principle](#concept-edit-source-principle) / [quote-edit-source](#quote-edit-source):

> *"Editing the output fixes this run. Editing the source fixes every future run."*

## The full loop

Together the two sources describe a complete lifecycle:

1. **Author** — extract latent dialogue structure into markdown ([framework-skill-creation](#framework-skill-creation)).
2. **Run** — single orchestrator traverses folders ([concept-icm-d2](#concept-icm-d2), [framework-icm-architecture](#framework-icm-architecture)).
3. **Review** — human edits at stage gates ([action-review-gates](#action-review-gates), [claim-ushaped-intervention](#claim-ushaped-intervention)).
4. **Diagnose** — if output is wrong, trace back to which layer is wrong: L2 contract? L3 reference? L1 routing? ([concept-five-layer-hierarchy](#concept-five-layer-hierarchy))
5. **Repair** — edit the source layer ([concept-edit-source-principle](#concept-edit-source-principle)). Future runs inherit the fix.

## What the paper concedes

Doing step 4 systematically — automated trace from a wrong phrase to its causing source — is **future work** ([question-semantic-debugging](#question-semantic-debugging)). Right now it's a manual discipline.

## What the video implies but doesn't say

The edit-source principle is the natural successor to dialogue extraction. The first chat is the proto-source; subsequent edits refine the source. The skill is a **living document**, not a snapshot.

See [synthesis-glass-box-meets-dialogue](#synthesis-glass-box-meets-dialogue), [arc-evidence-base-evolution](#arc-evidence-base-evolution).


#### synthesis-five-layer-fills-the-gap

*type: `synthesis` · sources: cross-day*

The video ([concept-icm-d1](#concept-icm-d1)) gestures at folder structure but never names the layers. The paper supplies the explicit primitive — [concept-five-layer-hierarchy](#concept-five-layer-hierarchy):

- **L0** `CLAUDE.md` — global identity (~800 tok)
- **L1** `CONTEXT.md` (root) — workspace routing (~300 tok)
- **L2** `CONTEXT.md` (stage) — stage contract (200–500 tok)
- **L3** `references/` — stable rules (500–2k tok)
- **L4** `output/` — per-run artifacts
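
The layer list above can be read as a loading order. The following is a minimal sketch, assuming the `stages/<stage>/` layout shown in the paper's folder-structure figure; the function name `load_stage_context` is hypothetical and not from either source, and Layer 4 (the previous stage's `output/`) is deliberately left out to keep the example short.

```python
from pathlib import Path


def load_stage_context(workspace: Path, stage: str) -> list[tuple[str, str]]:
    """Return (layer, text) pairs for one stage, in L0 -> L3 order.

    Hypothetical helper: it only illustrates the ordering of the layers.
    """
    stage_dir = workspace / "stages" / stage
    parts = [
        ("L0", workspace / "CLAUDE.md"),   # global identity
        ("L1", workspace / "CONTEXT.md"),  # workspace routing
        ("L2", stage_dir / "CONTEXT.md"),  # stage contract
    ]
    # L3: every file in the stage's references/ folder, in sorted order.
    refs = stage_dir / "references"
    if refs.is_dir():
        parts += [("L3", p) for p in sorted(refs.iterdir()) if p.is_file()]
    # Files that do not exist are skipped rather than raising.
    return [
        (layer, path.read_text(encoding="utf-8"))
        for layer, path in parts
        if path.is_file()
    ]
```

Only the files for the current stage are read, which is the scoping behaviour the hierarchy is meant to produce.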

## What this resolves

When the video says "build a `voice-and-tone.md`" ([action-codify-voice](#action-codify-voice)), the paper tells you **where it goes**: Layer 3, `references/`. When the video says "write a skill," the paper tells you the skill is the **Layer 2 stage contract** ([concept-stage-contracts](#concept-stage-contracts)) plus its `references/` recipe and `output/` ingredients ([action-separate-l3-l4](#action-separate-l3-l4)).

## What this enables

The canonical workspace layout ([framework-icm-architecture](#framework-icm-architecture)) becomes legible. The video's [framework-skill-creation](#framework-skill-creation) (Goal → Constraints → Assumptions → Sub-goals → Markdown) can now be assigned to layers: Goal → L2, Constraints → L3, Assumptions → L3 or L1, Sub-goals → ordered stage folders ([action-numbered-stage-folders](#action-numbered-stage-folders)).
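
The assignment above can be written down as data. This is a sketch only: the dictionary restates the mapping from the paragraph, and `stage_folder_names` is a hypothetical helper that turns ordered sub-goals into numbered folder names in the `01_research` style used by the paper's folder figure.

```python
# Mapping restated from the paragraph above; "L3 or L1" is kept as two candidates.
SKILL_PART_TO_LAYER = {
    "goal": ("L2",),
    "constraints": ("L3",),
    "assumptions": ("L3", "L1"),
}


def stage_folder_names(sub_goals: list[str]) -> list[str]:
    """Turn ordered sub-goals into numbered stage folder names (01_..., 02_...)."""
    names = []
    for index, goal in enumerate(sub_goals, start=1):
        slug = "_".join(goal.lower().split())  # "Final cut" -> "final_cut"
        names.append(f"{index:02d}_{slug}")
    return names


print(stage_folder_names(["Research", "Script", "Production"]))
# ['01_research', '02_script', '03_production']
```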

The **architectural distinction the talk lacks** is L3 (recipe / factory / constraints to internalize) vs L4 (ingredients / product / input to transform). Misclassifying these is the most common failure mode — see [action-separate-l3-l4](#action-separate-l3-l4).

See [synthesis-skill-equals-stage-contract](#synthesis-skill-equals-stage-contract).


#### synthesis-glass-box-meets-dialogue

*type: `synthesis` · sources: cross-day*

The video centres dialogue ([concept-dialogue-structure](#concept-dialogue-structure), [quote-dialogue-theme](#quote-dialogue-theme)) but never explains why folder/markdown is *better* than embedding the dialogue inside an orchestration framework. The paper supplies the answer — [concept-observability-side-effect](#concept-observability-side-effect) and [quote-glass-box](#quote-glass-box):

> *"It did not become transparent through the addition of an explanation layer. It was never opaque in the first place, because every artifact is a plain-text file that a human can read."*

## The connection

If the workflow IS dialogue (video), and dialogue is a plain-text artifact (paper), then the **inspectability of the workflow is a property of the substrate**. No logging layer is required — see [contrarian-observability-free](#contrarian-observability-free). The dialogue theme and the glass-box theme are the same claim expressed once as philosophy (video) and once as architectural consequence (paper).

## What the paper adds that the talk doesn't

A distinction between **observability** (read any file) and **traceability** (output spans → source snippets). ICM has the first; the second is open work — [question-semantic-debugging](#question-semantic-debugging). Strict regulatory definitions of "glass-box" (provenance, fine-grained traceability, audit trails) are not yet met.

## What the talk adds that the paper doesn't

The **psychological** reason humans accept it: dialogue is the natural unit of human cognition. A folder of conversations feels native; a graph of agents does not.

See [synthesis-dialogue-to-context-engineering](#synthesis-dialogue-to-context-engineering), [synthesis-edit-source-as-dialogue-evolution](#synthesis-edit-source-as-dialogue-evolution).


#### synthesis-single-agent-clarified

*type: `synthesis` · sources: cross-day*

The video ([claim-icm-superiority](#claim-icm-superiority), [concept-icm-d1](#concept-icm-d1)) repeatedly says **single agent**. The paper sharpens this — see [entity-claude-code](#entity-claude-code):

> All ICM testing used Opus 4.6 as primary orchestrator + Sonnet 4.6 as subagent workers. Within a stage, Opus delegates sub-tasks to faster Sonnet subagents — and the delegation is itself folder-driven (the agent reads `CONTEXT.md` to decide what to delegate).

## The precise claim

ICM is **single-ORCHESTRATOR with folder-driven subagent delegation**, not strictly single-agent in execution. The paper's exact claim is **"no orchestration framework,"** not **"no second model."**

## Why this matters

The claim survives the nuance: the **engineering** complexity ICM avoids is the framework layer (LangChain graphs, AutoGen routing, Semantic Kernel planners — see [entity-langchain](#entity-langchain), [entity-autogen](#entity-autogen), [entity-semantic-kernel](#entity-semantic-kernel)). Whether one model or two run within a stage is an internal implementation detail driven by [entity-claude-code](#entity-claude-code)'s own subagent feature.

## Communication discipline

When summarising ICM to a sceptic, **say "single orchestrator, no orchestration framework"** rather than "single agent." Otherwise, a sceptic who later finds out about the Sonnet subagents will accuse the source of overselling.

See [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude), [entity-claude](#entity-claude), [entity-claude-code](#entity-claude-code).


#### synthesis-skill-equals-stage-contract

*type: `synthesis` · sources: cross-day*

The video introduces **skills** as discrete markdown files capturing Goal, Constraints, Assumptions, Sub-goals — see [framework-skill-creation](#framework-skill-creation) and [concept-icm-d1](#concept-icm-d1). The paper never uses the word "skill" but describes the same object under a different name: the **Layer 2 stage contract** — see [concept-stage-contracts](#concept-stage-contracts) and [concept-five-layer-hierarchy](#concept-five-layer-hierarchy).

## The vocabulary map

| Video term | Paper term | What it is |
|---|---|---|
| Skill | Stage contract (L2 `CONTEXT.md`) | The role / prompt for one step |
| Folder of skills | Workspace ([framework-icm-architecture](#framework-icm-architecture)) | The pipeline |
| Goal/Constraints/Assumptions/Sub-goals ([framework-skill-creation](#framework-skill-creation)) | Inputs/processing/outputs (the contract) | The internal structure of one step |
| Codify dialogue ([action-codify-voice](#action-codify-voice)) | Edit source not output ([concept-edit-source-principle](#concept-edit-source-principle)) | The maintenance principle |

## Why the rename matters

[entity-semantic-kernel](#entity-semantic-kernel) also uses the word "skill" (Microsoft's term for a callable LLM function). Calling ICM stages "skills" muddies the distinction — Semantic Kernel skills are code-invoked; ICM stage contracts are folder-traversed. The paper's neutral term **stage contract** disambiguates.

## Practical consequence

When a viewer of the talk asks "how do I write a skill?", answer in **paper vocabulary**: write a stage's L2 `CONTEXT.md` declaring the inputs it reads from the previous stage's `output/`, the references it relies on, and the output it produces ([concept-stage-contracts](#concept-stage-contracts)).
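
As an illustration of that answer, the sketch below renders a stage contract as markdown. Everything about the rendered format is an assumption: the section headings (`Inputs`, `References`, `Output`), the first-stage placeholder, and the function name `render_stage_contract` are hypothetical, since the corpus describes what a contract declares but not its exact layout.

```python
from typing import Optional


def render_stage_contract(
    stage: str,
    previous_stage: Optional[str],
    references: list[str],
    output_name: str,
) -> str:
    """Render a hypothetical L2 CONTEXT.md body for one stage."""
    # The first stage has no upstream output/ folder to read from.
    inputs = f"../{previous_stage}/output/" if previous_stage else "(none: first stage)"
    lines = [
        f"# Stage: {stage}",
        "",
        "## Inputs",
        f"- {inputs}",
        "",
        "## References",
        *[f"- references/{name}" for name in references],
        "",
        "## Output",
        f"- output/{output_name}",
    ]
    return "\n".join(lines) + "\n"


print(render_stage_contract("02_script", "01_research", ["voice-and-tone.md"], "script.md"))
```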

See [synthesis-five-layer-fills-the-gap](#synthesis-five-layer-fills-the-gap), [synthesis-dialogue-to-context-engineering](#synthesis-dialogue-to-context-engineering).


#### synthesis-three-levels-meets-stage-pipeline

*type: `synthesis` · sources: cross-day*

The video proposes a maturity model — [concept-three-levels-ai](#concept-three-levels-ai):

- **L1** Copy & paste
- **L2** Structured prompts (brand-tone files, prompt libraries)
- **L3** Integrated workflows (automated pipelines)

The paper proposes a workspace architecture — [framework-icm-architecture](#framework-icm-architecture) with stages, [concept-stage-contracts](#concept-stage-contracts), [concept-five-layer-hierarchy](#concept-five-layer-hierarchy).

## How they nest

The two are **complementary, not redundant**:

- Maturity L1 (ad-hoc): no ICM at all. Just chat.
- Maturity L2 (structured prompts): populate **Layer 3 references** (voice-and-tone, style guides; see [action-codify-voice](#action-codify-voice)). A team at maturity L2 has a `references/` folder full of stable rules but is not yet running a multi-stage pipeline.
- Maturity L3 (integrated workflow): full ICM pipeline. Numbered stage folders ([action-numbered-stage-folders](#action-numbered-stage-folders)), review gates ([action-review-gates](#action-review-gates)), and each stage's Layer 4 output becoming the next stage's Layer 4 working input.

## The ROI claim revisited

[claim-l2-roi](#claim-l2-roi) / [quote-l2-roi](#quote-l2-roi) says the jump from maturity L1 to maturity L2 has the highest ROI. In paper terms, **building the Layer 3 `references/` material pays the largest dividend** before you build any pipeline. Note that the video's L1–L3 are maturity levels and the paper's L0–L4 are context layers; the labels overlap but do not correspond. This is consistent with the paper's [concept-edit-source-principle](#concept-edit-source-principle): stable rules are the source; pipelines are runs of the source.

## On-ramp

The coherent adoption path across both sources:

1. [action-codify-voice](#action-codify-voice) — write `voice-and-tone.md` (an L3 reference).
2. [action-move-to-l2](#action-move-to-l2) — build the prompt library (an L3 reference collection).
3. [action-implement-folders](#action-implement-folders) — restructure into an ICM workspace.
4. [action-numbered-stage-folders](#action-numbered-stage-folders) — add staged pipeline.
5. [action-review-gates](#action-review-gates) — formalize the review boundaries.

See [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude), [synthesis-skill-equals-stage-contract](#synthesis-skill-equals-stage-contract).


#### synthesis-workspace-builder-is-the-meta-dialogue

*type: `synthesis` · sources: cross-day*

The video's [framework-skill-creation](#framework-skill-creation) is a **manual** five-step process: humans extract Goal/Constraints/Assumptions/Sub-goals from chat logs and write the markdown. [entity-k-kumar](#entity-k-kumar) built a **visualization tool** to surface the latent structure.

The paper's [framework-workspace-builder](#framework-workspace-builder) is the **automated** version: a five-stage ICM workspace whose output is a new ICM workspace. Stages: Discovery → Stage mapping → Scaffolding → Questionnaire design → Validation.

## The progression

1. Manual dialogue extraction by humans ([framework-skill-creation](#framework-skill-creation)) — video era.
2. Visualization aid by K. Kumar — research prototype era.
3. Workspace-builder — fully self-hosting, ICM building ICM.

This is the **bootstrap moment**: the methodology produces the tool that produces instances of the methodology. It explains how external adopters ([entity-external-adopters](#entity-external-adopters), [claim-external-adoption](#claim-external-adoption)) — at Edinburgh, ICR Research, Bonn — can build domain workspaces without internalizing every ICM convention.

## Why it matters

The workspace-builder is the **adoption mechanism**. Without it, ICM scales only as fast as Van Clief can consult. With it, ICM scales as fast as people can run a five-stage discovery process.

See [synthesis-skill-equals-stage-contract](#synthesis-skill-equals-stage-contract), [synthesis-three-levels-meets-stage-pipeline](#synthesis-three-levels-meets-stage-pipeline).


#### tension-absurdities-vs-bounded-scope

*type: `synthesis` · sources: cross-day*

The sharpest internal contradiction in the corpus is rhetorical, not technical.

## The video's frame

In [quote-absurdities](#quote-absurdities), Jake Van Clief calls multi-agent frameworks ([entity-langchain](#entity-langchain), [entity-semantic-kernel](#entity-semantic-kernel)) "absurdities." The contrarian note [contrarian-frameworks](#contrarian-frameworks) runs with this framing: practitioners are "building folders and markdown files… and getting huge results."

## The paper's frame

The paper ([entity-icm-paper](#entity-icm-paper)) makes the **bounded** version of the same claim ([contrarian-frameworks-overkill](#contrarian-frameworks-overkill)): multi-agent frameworks are overhead **for sequential, human-reviewed workflows** — and the paper explicitly states ICM is **not for** real-time multi-agent collaboration or high-concurrency systems. Use [entity-autogen](#entity-autogen), [entity-langchain](#entity-langchain), or [entity-semantic-kernel](#entity-semantic-kernel) for those.

## Resolution

The two are not contradictions; they are **register differences**. The talk is a provocation aimed at over-tooled practitioners; the paper is a bounded architectural claim aimed at reviewers. The defensible position is the paper's: "for *this class* of problem, frameworks are overhead" — not "frameworks are absurd."

When citing the [quote-absurdities](#quote-absurdities) line to a critical reader, **always pair it with [contrarian-frameworks-overkill](#contrarian-frameworks-overkill)** to mark the scope.

See [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude), [recurring-foil-frameworks](#recurring-foil-frameworks).


#### tension-voice-future-vs-paper-non-support

*type: `synthesis` · sources: cross-day*

The most exciting moment in the video is the finale — see [concept-voice-collaboration](#concept-voice-collaboration), [claim-voice-future](#claim-voice-future), [quote-voice-control](#quote-voice-control): real-time voice-driven AI collaboration during a live call.

The paper's most honest limitation is **its explicit non-support** of exactly that use case. ICM ([concept-icm-d2](#concept-icm-d2)) is for **sequential, human-reviewed** workflows. Real-time multi-agent collaboration and high-concurrency systems are explicitly out of scope; the paper recommends [entity-autogen](#entity-autogen), [entity-langchain](#entity-langchain), or [entity-semantic-kernel](#entity-semantic-kernel) for those.

## How to hold both honestly

1. **The voice demo is real but lives outside ICM as the paper defines it.** It uses an ICM-shaped folder substrate, but the loop (voice → STT → Claude → file I/O during a live call) is exactly the **real-time multi-agent class** the paper excludes.
2. **Security ([question-voice-security](#question-voice-security)) is unresolved.** The paper doesn't engage it because it doesn't claim it.
3. **Forward extension, not contradiction.** The video sketches a future where ICM's folder substrate becomes the shared workspace for live voice collaboration. The paper would call this a separate research program.

## A reconciliation

The voice finale is best read as **"ICM workspaces become the shared state for live collaboration"** — folder-as-substrate persists, but the orchestration moves from human-paced review gates ([action-review-gates](#action-review-gates)) to live voice commands. The paper would require that program be benchmarked separately before being claimed.

See [arc-talk-vs-paper-altitude](#arc-talk-vs-paper-altitude), [open-arc-what-remains](#open-arc-what-remains).


---

### Folder: exhibits

#### exhibit-icm-paper-figures

*type: `exhibit` · sources: paper*

> **Source & provenance.** All exhibits below are extracted from the companion paper [entity-icm-paper-arxiv](#entity-icm-paper-arxiv) (*Interpretable Context Methodology: Folder Structure as Agent Architecture*, Van Clief & McDermott, [arXiv:2603.16021v2](https://arxiv.org/html/2603.16021v2)). Figures were rendered from the paper's inline SVG and captured as PNG; tables were transcribed from the paper's HTML. They were **not** in the YouTube primary source — the talk shows none of this structure on screen. This is the single richest layer of detail the companion source adds. Each exhibit below pairs the rendered image with synthesized insight for a downstream agent.

---

## Figure 1 — The Five-Layer Context Hierarchy

![Five-layer context hierarchy](fig1-five-layer-hierarchy.png)

The load-bearing diagram of the whole paper. Two things the prose and the talk both omit, visible only here:

**(a) Each layer carries an explicit token budget** — the hierarchy is a *budget*, not just a taxonomy:

| Layer | File | Token budget | Diagnostic question | Class |
|-------|------|--------------|---------------------|-------|
| **0** | `CLAUDE.md` | ~800 tok | *"Where am I?"* (global identity) | Structural (routing) |
| **1** | `CONTEXT.md` | ~300 tok | *"Where do I go?"* (workspace routing) | Structural (routing) |
| **2** | Stage `CONTEXT.md` | 200–500 tok | *"What do I do?"* (stage contract) | Structural (routing) |
| **3** | Reference material | 500–2k tok | *"What rules apply?"* (the **factory**, stable across runs) | Content |
| **4** | Working artifacts | varies | *"What am I working with?"* (the **product**, per-run) | Content |

**(b) The colour split is the architecture.** Layers 0–2 (blue, *structural / routing*) total only ~1.3–1.6k tokens — they tell the agent **where it is and what role to play**. Layers 3–4 (orange, *content*) carry the actual substance. The factory/product metaphor (L3 = factory/recipe, L4 = product/ingredients) is the paper's mnemonic for *what should change between runs* (only L4) vs. *what stays fixed* (L3). See [concept-icm-d2](#concept-icm-d2).

> **Agent takeaway:** total structural overhead is ~1.5k tokens. Everything else in a well-scoped stage is task content. That is *why* a stage lands at 2–8k tokens instead of ~42k.
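
The ~1.3–1.6k figure is just the sum of the three structural budgets in the table. A short sketch of that arithmetic follows; the dictionary keys are labels chosen for this example.

```python
# (low, high) token budgets for the structural layers, taken from the table above.
STRUCTURAL_BUDGETS = {
    "L0 CLAUDE.md": (800, 800),
    "L1 CONTEXT.md": (300, 300),
    "L2 stage CONTEXT.md": (200, 500),
}


def structural_overhead(budgets: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high ends of each budget to get the overhead range."""
    low = sum(lo for lo, _ in budgets.values())
    high = sum(hi for _, hi in budgets.values())
    return low, high


print(structural_overhead(STRUCTURAL_BUDGETS))  # (1300, 1600)
```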

---

## Figure 2 — ICM Workspace Folder Structure (layer-annotated)

![Folder structure of an ICM workspace](fig2-folder-structure.png)

The canonical on-disk layout, every node tagged by layer:

```
workspace/
├── CLAUDE.md                 ← Layer 0  (global identity)
├── CONTEXT.md                ← Layer 1  (workspace routing)
├── stages/
│   ├── 01_research/
│   │   ├── CONTEXT.md        ← Layer 2  (stage contract)
│   │   ├── references/       ← Layer 3  (reference, persists)
│   │   └── output/           ← Layer 4  (working, per-run)
│   ├── 02_script/            … same triad (L2 / L3 / L4)
│   └── 03_production/        … same triad (L2 / L3 / L4)
├── _config/                  ← Layer 3  (shared reference)
├── shared/                   ← Layer 3  (shared reference)
└── setup/
    └── questionnaire.md      ← (setup-time only; unannotated)
```

**Synthesized insight (not stated explicitly in prose):**
- Every stage folder is the *same triad* — `CONTEXT.md` (L2) + `references/` (L3) + `output/` (L4). The repeating triad is what makes "add/remove a stage" a filesystem op (see Table 1).
- `_config/` and `shared/` are **top-level Layer 3** — cross-stage reference that escapes the per-stage `references/`. This is how ICM shares stable material (voice, design system, conventions) without duplicating it into every stage.
- `setup/questionnaire.md` is *un-layered*: it runs once at workspace creation and is not part of any run's context. This is the seam where the **non-coder onboarding** happens (the three non-coders in the study filled in a questionnaire rather than writing code).
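
Because the layout is plain folders, it can be created and checked with ordinary filesystem calls. The sketch below assumes the tree shown above; `scaffold_workspace` and `missing_triad_parts` are hypothetical names, and the files are created empty.

```python
from pathlib import Path


def scaffold_workspace(root: Path, stages: list[str]) -> None:
    """Create the folder layout from the figure, with empty placeholder files."""
    root.mkdir(parents=True, exist_ok=True)
    (root / "CLAUDE.md").touch()   # Layer 0
    (root / "CONTEXT.md").touch()  # Layer 1
    for folder in ("_config", "shared", "setup"):
        (root / folder).mkdir(exist_ok=True)
    (root / "setup" / "questionnaire.md").touch()
    for stage in stages:
        stage_dir = root / "stages" / stage
        (stage_dir / "references").mkdir(parents=True, exist_ok=True)  # Layer 3
        (stage_dir / "output").mkdir(exist_ok=True)                    # Layer 4
        (stage_dir / "CONTEXT.md").touch()                             # Layer 2


def missing_triad_parts(stage_dir: Path) -> list[str]:
    """List which parts of the CONTEXT.md / references / output triad are absent."""
    expected = {
        "CONTEXT.md": Path.is_file,
        "references": Path.is_dir,
        "output": Path.is_dir,
    }
    return [name for name, check in expected.items() if not check(stage_dir / name)]
```

Adding a stage is then one more entry in the `stages` list, which matches the point that adding or removing a stage is a filesystem operation.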

---

## Figure 3 — Context Window Composition by Stage (the efficiency claim, visualized)

![Context window composition by stage](fig3-token-composition.png)

Representative token counts from the paper's *script-to-animation* workspace. Stacked by source: **blue** = Layers 0–2 (structural), **orange** = Layer 3 (reference), **tan** = Layer 4 (working), **grey** = unused / irrelevant context.

| Stage | Total tokens | Composition |
|-------|-------------|-------------|
| Research | **~4.9k** | almost entirely useful (structural + reference + working) |
| Script | **~5.5k** | almost entirely useful |
| Production | **~5.6k** | almost entirely useful |
| **Monolithic** | **~42k** | **mostly grey — the irrelevant band dwarfs the useful content** |

**The visual is the argument.** In the three ICM bars there is almost no grey: nearly every token in context is relevant to the current stage. In the monolithic bar, the grey *"unused/irrelevant"* band is larger than the entire useful payload of any single stage — the agent is carrying all three stages' instructions, all reference material, and all prior outputs simultaneously, ~80%+ of it irrelevant to whatever it is doing right now. This is the concrete mechanism behind [claim-icm-superiority](#claim-icm-superiority) and Liu et al.'s *"lost in the middle"*: ICM doesn't just use fewer tokens, it keeps the **relevant** tokens out of the degraded middle band.

> **Caveat (per the paper):** these are *representative* counts from one workspace, not a measured benchmark across many. The shape is illustrative.
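
To make the proportions in the table concrete, the sketch below divides each stage's representative count by the monolithic count. The inputs are the approximate figures from the table, so the results inherit the same caveat: they are illustrative, not measurements.

```python
# Representative counts from the table above (approximate, one workspace).
STAGE_TOKENS = {"research": 4_900, "script": 5_500, "production": 5_600}
MONOLITHIC_TOKENS = 42_000


def share_of_monolithic(stage_tokens: int, monolithic_tokens: int = MONOLITHIC_TOKENS) -> float:
    """Fraction of the monolithic context that one stage's context occupies."""
    return stage_tokens / monolithic_tokens


for stage, tokens in STAGE_TOKENS.items():
    share = share_of_monolithic(tokens)
    print(f"{stage}: {share:.1%} of the monolithic context ({1 - share:.1%} smaller)")
```

Each stage comes out at roughly 12 to 13 percent of the monolithic context, which is in line with the "~80%+ irrelevant" reading of the monolithic bar.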

---

## Figure 4 — Pipeline Flow with Human Review Gates

![Pipeline flow through three stages with review gates](fig4-pipeline-review-gates.png)

`Stage 1 (Research) → [Review gate / Human] → Stage 2 (Script) → [Review gate / Human] → Stage 3 (Production)`. Each stage receives its own context (Layers 0–4) and writes to its `output/` folder. A **human review gate** (red diamond) sits on every stage boundary, where the output can be edited before the next stage reads it.

**The single most important sentence in the figure:** *"The same model executes every stage; the folder structure controls what context it receives."* This is the thesis in one line — there is **no second agent, no router model, no orchestration code**. The *only* thing that differs between stages is which files the one agent reads. The "multi-agent" behaviour is an illusion produced entirely by folder scoping + human gates. Connects to [concept-dialogue-structure](#concept-dialogue-structure) (each stage contract is a persisted decision tree) and the talk's [contrarian-frameworks](#contrarian-frameworks) stance.
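
A minimal sketch of what "the folder structure controls what context it receives" could look like in practice. It is not the paper's implementation: the stage folder names in `STAGES` and the helpers `read_markdown` and `build_context` are hypothetical, and the structural Layers 0–2 are omitted for brevity. Only the `references/`, `output/`, `_config/`, and `shared/` folder names come from the source.

```python
from pathlib import Path

# Hypothetical stage folder names (assumption, not from the paper).
STAGES = ["01-research", "02-script", "03-production"]
# Top-level Layer 3 folders named in the source.
SHARED_DIRS = ["_config", "shared"]


def read_markdown(folder: Path) -> list[str]:
    """Return the text of every .md file in a folder, sorted by file name."""
    if not folder.is_dir():
        return []
    return [p.read_text(encoding="utf-8") for p in sorted(folder.glob("*.md"))]


def build_context(workspace: Path, stage: str) -> str:
    """Assemble one stage's context purely from folder location."""
    parts: list[str] = []
    for shared in SHARED_DIRS:  # top-level Layer 3 (cross-stage reference)
        parts += read_markdown(workspace / shared)
    parts += read_markdown(workspace / stage / "references")  # per-stage Layer 3
    index = STAGES.index(stage)
    if index > 0:  # Layer 4: the previous stage's (human-reviewed) output
        parts += read_markdown(workspace / STAGES[index - 1] / "output")
    return "\n\n".join(parts)
```

The same function serves every stage; only the `stage` argument, and therefore the set of files read, changes. The human review gate is simply the window between one stage writing to `output/` and the next call to `build_context`, during which a person can edit those files.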

---

## Figure 5 — U-Shaped Human Intervention (N=33 practitioners)

![U-shaped frequency of human edits per stage](fig5-ushaped-edits.png)

Y-axis is ordinal (Never → Rarely → Sometimes → Often → Almost always). Self-reported edit frequency at each stage boundary:

| Stage boundary | Edit frequency | Ordinal band | Character of the edit |
|----------------|----------------|--------------|------------------------|
| **Stage 1 output (Research)** | **92%** | Almost always | **Creative judgment** — direction-setting |
| **Stage 2 output (Script)** | **30%** | Rarely | constrained execution, little to fix |
| **Stage 3 output (Production)** | **78%** | Often | **closer to debugging** — aligning output with earlier decisions |

The **U-shape** is the headline empirical pattern: humans intervene heavily where they set direction (stage 1) and where they reconcile final output against intent (stage 3), but largely leave the constrained middle alone. The paper is careful: *"Values are approximate and based on practitioner self-report through conversation, not instrumented measurement"* (N=33, invite-only community). Use as a **directional** finding, not a metric. Extends [question-icm-scaling](#question-icm-scaling)'s methodology caveats.

---

## Table 1 — Control-Surface Comparison: Framework vs. ICM

The paper's most honest exhibit: the first six rows favour ICM, and the **last four rows favour frameworks** (the "what ICM gives up" section). Reproduced verbatim:

| Dimension | Framework approach | ICM approach |
|-----------|--------------------|--------------|
| Change stage order | Edit orchestration code, redeploy | Rename or reorder folders |
| Modify a prompt | Edit agent configuration in code | Edit a markdown file |
| Add or remove a stage | Write new agent class, update orchestrator | Add or delete a folder |
| Inspect intermediate state | Add logging, build dashboard | Open the folder, read the files |
| Hand off to another person | Document environment, dependencies, setup | Copy the folder |
| Who can make changes | Developer | Anyone with a text editor |
| **Error recovery mid-pipeline** | Built-in retry, fallback, exception handling | Manual re-run of failed stage |
| **Conditional branching** | Programmatic routing based on agent output | Human decides between stages |
| **Concurrent execution** | Native parallel agent coordination | Sequential by design |
| **External service integration** | Programmatic API calls, auth management | Local scripts or MCP connections |

**Synthesis:** rows 1–6 are ICM's pitch (everything is a filesystem/text-editor op, *anyone* can change it, handoff = copy the folder). Rows 7–10 are the **concession lines** — frameworks win on automated error recovery, programmatic branching, true concurrency, and managed integrations. This table is the precise boundary of where the talk's *"multi-agent harnesses are absurdities"* over-reaches: ICM trades those four capabilities away **on purpose**, in exchange for interpretability and zero orchestration code. It is the right trade *only* for sequential, human-reviewed workflows. Directly grounds the counter-perspective in [claim-icm-superiority](#claim-icm-superiority) and [contrarian-frameworks](#contrarian-frameworks).
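
To make the first and third rows concrete ("rename or reorder folders", "add or delete a folder"), here is a hedged sketch in which stage order is derived from folder names alone. The numeric-prefix naming convention and the `discover_stages` helper are assumptions for illustration; the source does not specify how stage folders are named.

```python
from pathlib import Path


def discover_stages(workspace: Path) -> list[str]:
    """List stage folders in execution order, taken from sorted folder names.

    Assumption: stage folders start with a two-digit prefix ("01-", "02-", ...)
    and non-stage folders such as `_config/`, `shared/`, and `setup/` do not.
    """
    return sorted(
        p.name
        for p in workspace.iterdir()
        if p.is_dir() and p.name[:2].isdigit()
    )
```

Under this convention, reordering the pipeline is a rename and adding a stage is a `mkdir`; there is no orchestration code to edit. The same sketch also shows what the last four rows give up: nothing here retries, branches, or runs stages in parallel.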

---

## Table 2 — Layer 3 (Reference) vs. Layer 4 (Working)

| | Layer 3: Reference | Layer 4: Working |
|---|---|---|
| Changes between runs | **No** | **Yes** |
| Example files | `voice.md`, `design-system.md`, `conventions.md` | `research-output.md`, `script-draft.md` |
| Model should | **Internalize as constraints** | **Process as input** |
| Configured during | Workspace setup (once) | Pipeline execution (each run) |
| Folder location | `references/`, `_config/`, `shared/` | `output/` |
| Analogy | **The recipe** | **The ingredients** |

**Why this matters for an agent consuming this vault:** the L3/L4 distinction tells the agent *how to treat each file it reads*. L3 content (`voice.md`, conventions) is a **constraint to obey**; L4 content (prior `output/`) is **material to transform**. Misclassifying the two is the core failure mode the layering prevents — e.g., treating a style guide as editable working text, or treating a prior draft as an immutable rule. The recipe/ingredients metaphor is the paper's compression of this rule.
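
The "Folder location" row of Table 2 can be read as a classification rule. The sketch below applies it to file paths; the `classify` function, its return labels, and the example stage folder name are hypothetical, while the folder names themselves are taken from the table.

```python
from pathlib import PurePosixPath

REFERENCE_DIRS = {"references", "_config", "shared"}  # Layer 3 locations (Table 2)
WORKING_DIRS = {"output"}                             # Layer 4 location (Table 2)


def classify(path: str) -> str:
    """Say how an agent should treat a file, based only on its folders."""
    folders = set(PurePosixPath(path).parts[:-1])  # drop the file name itself
    if folders & REFERENCE_DIRS:
        return "constraint"  # Layer 3: internalize, do not rewrite
    if folders & WORKING_DIRS:
        return "input"       # Layer 4: material to transform
    return "unclassified"    # e.g. un-layered setup files


print(classify("_config/voice.md"))                  # constraint
print(classify("02-script/output/script-draft.md"))  # input
print(classify("setup/questionnaire.md"))            # unclassified
```

Keeping the rule tied to folder location, rather than to file names or content, mirrors the table: moving a file between `references/` and `output/` is what changes how it should be treated.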

---

## Cross-References

- Paper entity: [entity-icm-paper-arxiv](#entity-icm-paper-arxiv)
- Core concept: [concept-icm-d2](#concept-icm-d2) · Stage contracts as persisted dialogue: [concept-dialogue-structure](#concept-dialogue-structure)
- Efficiency claim these figures ground: [claim-icm-superiority](#claim-icm-superiority)
- Limitations these figures inherit: [question-icm-scaling](#question-icm-scaling)
- Authors: [entity-jake-van-clief](#entity-jake-van-clief) · [entity-david-mcdermott](#entity-david-mcdermott)


## Related across days
- [entity-icm-paper](#entity-icm-paper)
- [entity-icm-paper-arxiv](#entity-icm-paper-arxiv)
- [concept-five-layer-hierarchy](#concept-five-layer-hierarchy)
- [framework-icm-architecture](#framework-icm-architecture)


---
