---
id: "entity-icm-paper-arxiv"
type: "entity"
entityType: "publication"
canonicalName: "Interpretable Context Methodology: Folder Structure as Agent Architecture"
aliases: ["ICM Paper", "arXiv:2603.16021", "Van Clief & McDermott 2026"]
source_url: "https://arxiv.org/html/2603.16021v2"
source_timestamps: []
tags: ["primary-source", "peer-companion", "paper", "arxiv", "supporting-source"]
related: ["concept-icm", "entity-jake-van-clief", "entity-david-mcdermott", "claim-icm-superiority", "concept-three-levels-ai", "question-icm-scaling"]
sources: ["video"]
sourceVaultSlug: "interpretible-context-methodology-icm-2026Jun02"
originDay: 1
---
# ICM Paper — *Interpretable Context Methodology: Folder Structure as Agent Architecture*

> **Provenance note:** This note is a **supplementary companion source** added alongside the YouTube extraction. The video ([[entity-jake-van-clief]]'s talk) is the *primary* source of this vault. This note covers the formal academic paper, co-authored by the **same speaker**, that grounds the video's claims. The `yt-extract-agent` pipeline is single-source, so this note was folded in manually to let downstream agents inherit both the practitioner talk and its peer-companion paper.

## Bibliographic

- **Title:** Interpretable Context Methodology: Folder Structure as Agent Architecture
- **Authors:** [[entity-jake-van-clief]], [[entity-david-mcdermott]]
- **arXiv:** [2603.16021v2](https://arxiv.org/html/2603.16021v2) (18 Mar 2026)
- **Affiliation:** Eduba, University of Edinburgh

## Abstract (verbatim)

> Current approaches to AI agent orchestration typically involve building multi-agent frameworks that manage context passing, memory, error handling, and step coordination through code. These frameworks work well for complex, concurrent systems. But for sequential workflows where a human reviews output at each step, they introduce engineering overhead that the problem does not require. This paper presents Interpretable Context Methodology (ICM), a method that replaces framework-level orchestration with filesystem structure. Numbered folders represent stages. Plain markdown files carry the prompts and context that tell a single AI agent what role to play at each step. Local scripts handle the mechanical work that does not need AI at all. The result is a system where one agent, reading the right files at the right moment, does the work that would otherwise require a multi-agent framework.

## Visual Exhibits

The paper's 5 figures + 2 tables are extracted, rendered, and synthesized in **[[exhibit-icm-paper-figures]]** — including the five-layer hierarchy with per-layer token budgets (Fig 1), the layer-annotated folder tree (Fig 2), the stacked token-composition chart showing the monolithic ~42k context as mostly irrelevant waste (Fig 3), the human-review-gate pipeline (Fig 4), the U-shaped intervention chart (Fig 5), and the framework-vs-ICM control-surface table (Table 1). These exhibits are the richest layer the companion source adds over the video.

## Formal Components (grounds the video's [[concept-icm-d1]])

**Five-Layer Context Hierarchy** — the paper's central artifact, not stated explicitly in the talk:

- **Layer 0** — `CLAUDE.md` (global identity)
- **Layer 1** — `CONTEXT.md` (workspace routing)
- **Layer 2** — Stage `CONTEXT.md` (stage contracts)
- **Layer 3** — Reference material (stable across runs)
- **Layer 4** — Working artifacts (per-run content)
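
The hierarchy can be read as a fixed loading order. The sketch below is a minimal illustration, not the paper's tooling. It assumes a hypothetical layout in which each stage folder holds its own `CONTEXT.md` plus `references/` (Layer 3) and `work/` (Layer 4) subfolders, and the function name `assemble_stage_context` is invented for this note. It gathers files for one stage only, so other stages' folders never enter the context.

```python
from pathlib import Path

# Hypothetical layout (an assumption, not taken from the paper):
#   workspace/CLAUDE.md                   Layer 0, global identity
#   workspace/CONTEXT.md                  Layer 1, workspace routing
#   workspace/<stage>/CONTEXT.md          Layer 2, stage contract
#   workspace/<stage>/references/*.md     Layer 3, stable reference material
#   workspace/<stage>/work/*              Layer 4, per-run working artifacts
def assemble_stage_context(workspace: Path, stage: str) -> list[tuple[int, str, str]]:
    """Return (layer, relative_path, text) for one stage, ordered from Layer 0 to Layer 4."""
    stage_dir = workspace / stage
    layers = [
        (0, [workspace / "CLAUDE.md"]),
        (1, [workspace / "CONTEXT.md"]),
        (2, [stage_dir / "CONTEXT.md"]),
        (3, sorted((stage_dir / "references").glob("*.md"))),
        (4, sorted((stage_dir / "work").glob("*"))),
    ]
    loaded = []
    for layer, paths in layers:
        for path in paths:
            if path.is_file():  # missing files and subdirectories are skipped
                rel = path.relative_to(workspace).as_posix()
                loaded.append((layer, rel, path.read_text(encoding="utf-8")))
    return loaded
```

Because the function takes a single stage name, selective loading falls out of the folder boundary itself: a sibling such as `02_script/` is simply never read while `01_research` runs.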

**Stage structure** — numbered folders (`01_research`, `02_script`, `03_production`) with explicit Inputs / Process / Outputs contracts. **Review gates** sit between stages as human intervention points where outputs become editable. The workspace is a self-contained folder using plain markdown + JSON as the universal interface. See [[concept-dialogue-structure]] and [[framework-skill-creation]].
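
The stage-and-gate flow amounts to a sequential loop. A minimal sketch, assuming each stage can be modelled as a function from input text to output text (standing in for one agent pass driven by that stage's `CONTEXT.md`) and each review gate as a function that returns the human-edited output. `run_pipeline`, `Stage`, and `ReviewGate` are hypothetical names, not part of the paper.

```python
from typing import Callable

# Hypothetical types: a stage maps its input text to output text; a review
# gate receives (stage_name, draft) and returns the possibly edited draft.
Stage = Callable[[str], str]
ReviewGate = Callable[[str, str], str]

def run_pipeline(stages: dict[str, Stage], review: ReviewGate, initial_input: str) -> dict[str, str]:
    """Run stages in numbered-folder order, passing each reviewed output forward."""
    outputs: dict[str, str] = {}
    current = initial_input
    for name in sorted(stages):            # "01_research" < "02_script" < "03_production"
        draft = stages[name](current)      # Process: stage reads its Inputs, produces Outputs
        current = review(name, draft)      # Review gate: human may edit before handoff
        outputs[name] = current            # the reviewed version is what the next stage sees
    return outputs
```

Sorting on the numbered folder names is what lets the names double as the execution order; an unnumbered stage name would break that assumption.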

## Quantitative Grounding (sharpens [[claim-icm-superiority]])

The video's "20–40% token reduction" is anecdotal; the paper supplies firmer context-size figures, though none of them directly measures that reduction:

- **Per-stage context:** 2,000–8,000 *focused* tokens per stage vs. a monolithic prompt **exceeding 40,000 tokens, most of it irrelevant**.
- **Theoretical basis:** Liu et al.'s *"lost in the middle"* context-degradation effect — staged loading keeps relevant content out of the degraded middle band.
- **Practitioner observation (N=33, informal self-report):** **30 of 33** report a **U-shaped intervention pattern** — heavy editing at stage 1 (**92%**), light at stage 2 (**30%**), heavy at stage 3 (**78%**). Three non-coders successfully built video workspaces.
- ⚠️ **No controlled quantitative comparison** between ICM and monolithic prompting is reported. The numbers are efficiency/usage figures, not a benchmark win.
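
The ratios implied by these figures are simple to check. A minimal sketch, treating 40,000 as a floor because the paper only says the monolithic prompt exceeds it. `stage_share` and `is_u_shaped` are illustrative helpers invented for this note, and the U-shape test checks only the shape of the reported editing rates, not any claim about output quality.

```python
# Figures as quoted in this note. The monolithic prompt is stated only as
# "exceeding 40,000 tokens", so 40,000 is treated here as a lower bound and
# the resulting shares are upper bounds.
STAGE_TOKENS = (2_000, 8_000)
MONOLITHIC_FLOOR = 40_000

def stage_share(stage_tokens: int, monolithic_tokens: int = MONOLITHIC_FLOOR) -> float:
    """Fraction of the monolithic prompt that one focused stage loads."""
    return stage_tokens / monolithic_tokens

def is_u_shaped(edit_rates: list[float]) -> bool:
    """True if every interior stage is edited less than both the first and last stage."""
    if len(edit_rates) < 3:
        return False
    first, last = edit_rates[0], edit_rates[-1]
    return all(r < first and r < last for r in edit_rates[1:-1])
```

With these inputs a single stage loads at most 5% (2,000 tokens) to 20% (8,000 tokens) of the monolithic prompt, the reported 92% / 30% / 78% editing rates pass the U-shape check, and 30 of 33 respondents corresponds to roughly 91%.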

## Intellectual Lineage

The paper situates ICM against: McIlroy's Unix "do one thing well" + plain-text-as-interface; Shaw & Garlan's pipe-and-filter pattern; Aho et al.'s multi-pass compilation / intermediate representation; Wei et al.'s chain-of-thought decomposition; Horvitz's mixed-initiative systems; Liu et al.'s lost-in-the-middle; Fails & Olsen's interactive ML; Knuth's literate programming; Rudin's interpretability framework; and Karpathy's "context engineering" (2025) — the same lineage [[entity-andrej-karpathy-d1]] is cited for in the talk.

## Stated Limitations (extends [[question-icm-scaling]])

- Data is **self-reported through conversation, not instrumented** measurement.
- Practitioner community is **invite-only, self-selected** (52 members); active use **concentrated in content production**.
- **All testing on a single model family** (Claude Opus/Sonnet 4.6).
- **No controlled comparison** of staged vs. monolithic loading.
- **Non-support (explicit):** cannot handle real-time multi-agent collaboration, high-concurrency systems, or complex automated branching — consistent with the talk's [[contrarian-frameworks]] caveat that frameworks retain value across security boundaries and at scale.

## Open Questions Raised

- Does the five-layer hierarchy **generalize across model families**?
- As context windows grow, does **selective loading stay important**?
- How **sensitive** is output quality to context ordering/formatting within layers?
- Needs: formal cross-model evaluation + structured user studies with systematic data collection.


## Related across days
- [[entity-icm-paper]]
- [[exhibit-icm-paper-figures]]
- [[arc-talk-vs-paper-altitude]]
- [[arc-evidence-base-evolution]]