---
id: "question-controlled-comparison"
type: "open-question"
source_timestamps: ["§4.6"]
tags: ["limitation", "evaluation"]
related: ["claim-token-efficiency", "claim-ushaped-intervention"]
sources: ["paper"]
sourceVaultSlug: "icm-paper-folder-architecture-2026Jun02"
originDay: 2
---
# Is staged loading actually better than monolithic prompting?

## The question

Is ICM's staged context loading **measurably better** than monolithic prompting on the same tasks?

## Current state

**No controlled comparison has been conducted.** The quality claim rests on:

- "Lost in the middle" theory ([[prereq-llm-context-windows]]),
- self-reported practitioner experience from an **invite-only, self-selected community** of 33 ([[claim-ushaped-intervention]]),
- representative (not measured) token counts ([[claim-token-efficiency]]).

## Resolution path

Run a controlled comparison of ICM staged context loading against monolithic prompting on the **same tasks**, with **instrumented measurement** of:

- output quality (human-rated and automated metrics),
- editing burden,
- error rate,
- time-to-completion.
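
One way such a comparison could be analysed is as a paired design: every task is run under both conditions, and each metric is compared within the task. The sketch below is illustrative only. The `Trial` record, its field names, and the choice of a sign-flip permutation test are assumptions made for this note, not something the paper specifies, and no measurements are included.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    """One run of one task under one condition (hypothetical schema)."""
    task_id: str
    condition: str             # "staged" or "monolithic"
    quality: float             # e.g. a human rating; higher is better
    edit_minutes: float        # editing burden
    errors: int                # error count
    completion_minutes: float  # time-to-completion


def paired_differences(trials, metric):
    """Return staged-minus-monolithic differences, one per task.

    Tasks that were not run under both conditions are skipped, so the
    comparison only ever uses the same tasks on both sides.
    """
    by_task = {}
    for t in trials:
        by_task.setdefault(t.task_id, {})[t.condition] = getattr(t, metric)
    return [
        values["staged"] - values["monolithic"]
        for values in by_task.values()
        if "staged" in values and "monolithic" in values
    ]


def sign_flip_p_value(diffs, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on the mean paired difference.

    Under the null hypothesis that the conditions are interchangeable,
    each per-task difference is equally likely to have either sign.
    """
    if not diffs:
        raise ValueError("need at least one paired difference")
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)
```

Calling `paired_differences(trials, "quality")` and passing the result to `sign_flip_p_value` would give a per-metric check; the same two calls apply to `edit_minutes`, `errors`, and `completion_minutes`. Pairing by task keeps task difficulty from masquerading as a difference between conditions.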

## Why it matters

A controlled comparison would move ICM's efficiency and quality claims from **directional to demonstrated**. The Stanford HAI guidance on validating AI claims explicitly flags this kind of baseline comparison as essential, and the paper, to its credit, acknowledges the gap rather than hiding it.


## Related across days
- [[claim-token-efficiency]]
- [[claim-icm-superiority]]
- [[question-icm-scaling]]
- [[arc-evidence-base-evolution]]
- [[open-arc-what-remains]]