---
type: "synthesis"
primary_sources: ["s01", "s04", "s06", "s12", "s21", "s23", "s26", "s35"]
tags: ["agents", "karpathy-loop", "long-running", "persistence", "autonomy"]
id: "arc-agentic-loop-evolution"
sources: ["cross-day"]
---
# The Agentic Loop Across The Series

Eight episodes contribute to a coherent technical evolution of *what an agentic system is and how it operates*. Read together, they describe a single design pattern at progressively larger scope.

## The Stages

### S01 — The Dark Factory (the destination)

[[concept-dark-factory]]: Level 5 of [[framework-5-levels-vibe-coding]] — specs in, working software out, no human writes or reviews code. [[entity-strongdm]] is the canonical case. Requires [[concept-scenario-testing]] and [[concept-digital-twin-universe]] to be safe.

### S04 — The Karpathy Loop (the mechanism)

[[concept-karpathy-loop]] · [[framework-karpathy-loop-execution]]: a five-step constrained self-improvement cycle (analyze → propose → run → evaluate → commit/revert). [[concept-meta-task-agent-split]] separates harness optimization from domain execution. [[concept-trace-driven-optimization]] feeds rich logs to the meta-agent. [[concept-model-empathy]] refers to same-model pairing. [[claim-emergent-meta-behaviors]]: agents spontaneously invent spot-checking, formatting validators, and progressive disclosure.
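
The five steps can be sketched as a small accept-or-revert loop. This is a minimal illustration under stated assumptions, not the implementation discussed in the episode: the `Harness` type, the `propose` and `evaluate` callables, and the toy scoring function are all hypothetical stand-ins for a real meta-agent and a real eval suite.

```python
from typing import Callable, Dict, List

Harness = Dict[str, float]  # hypothetical: harness config as named numeric knobs


def karpathy_loop(
    harness: Harness,
    propose: Callable[[Harness, List[str]], Harness],
    evaluate: Callable[[Harness], float],
    iterations: int,
) -> Harness:
    """Keep a proposed change only if the evaluation score improves."""
    best_score = evaluate(harness)
    traces: List[str] = []  # log fed back to the proposer (trace-driven)
    for step in range(iterations):
        # Steps 1-2: analyze the traces and propose one constrained change.
        candidate = propose(harness, traces)
        # Steps 3-4: run the candidate and evaluate it.
        score = evaluate(candidate)
        # Step 5: commit on improvement, otherwise revert to the old harness.
        if score > best_score:
            harness, best_score = candidate, score
            traces.append(f"step {step}: commit score={score:.3f}")
        else:
            traces.append(f"step {step}: revert score={score:.3f}")
    return harness


# Toy stand-ins (assumptions): nudge one knob, score by distance from 2.0.
def toy_propose(h: Harness, traces: List[str]) -> Harness:
    delta = 0.5 if not traces or "commit" in traces[-1] else -0.5
    return {"knob": h["knob"] + delta}


def toy_evaluate(h: Harness) -> float:
    return -((h["knob"] - 2.0) ** 2)


result = karpathy_loop({"knob": 0.0}, toy_propose, toy_evaluate, iterations=10)
print(result)  # {'knob': 2.0}
```

The design choice worth noting is that the loop never keeps a regression: a candidate that does not beat the best score so far is discarded, which is what makes the self-improvement "constrained".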

### S06 — The Workspace Agent (the deployment shell)

[[concept-workspace-agents]]: cloud-based, schedule-triggered, in-channel. The constraint on what kind of agent succeeds: [[framework-ideal-agent-target]] (Cadence + Systems + Output + Path checks) and [[quote-known-path]] ("if the path is known, it gets really interesting").

### S12 — Agentic Persistence (the capability win)

[[concept-agentic-persistence]] · [[claim-fixes-quitting]]: 4.7's headline improvement is sustaining focus through multi-step pipelines without quitting prematurely. This is what makes the model a *co-worker* rather than a *chatbot*. But [[concept-trust-failure-hallucination]] shows where the edge is still broken.

### S21 — Agents With Hands and Feet (the visual+memory layer)

[[framework-fundamental-loop]]: Agent Surfaces → Human Decides → Agent Executes. The agent gains [[concept-agentic-memory]] (no recency decay), [[concept-cross-category-reasoning]], and [[concept-shared-surface]] access via [[concept-agent-door]].

### S23 — The Comprehension Boundary (the safety layer)

[[framework-dark-code-solution]]: Layer 1 spec → Layer 2 context engineering → Layer 3 [[concept-comprehension-gate]]. The agent's output must pass *legibility* review before merge.
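
One way to picture the three layers is as a merge gate that blocks on the first missing layer. The sketch below is an assumption-heavy illustration: the `Change` fields, the `comprehension_gate` function, and the toy word-count reviewer are hypothetical, and the source only says that output must pass a legibility review before merge.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Change:
    spec: str         # Layer 1: what the change is supposed to do
    context: str      # Layer 2: engineered context handed to the agent
    diff: str         # the agent's output
    explanation: str  # plain-language account of the diff


def comprehension_gate(
    change: Change,
    reviewer_understands: Callable[[Change], bool],
) -> str:
    """Return 'merge' only when all three layers are satisfied."""
    if not change.spec.strip():
        return "blocked: no spec"
    if not change.context.strip():
        return "blocked: no context"
    # Layer 3: the reviewer must be able to follow what the change does.
    if not reviewer_understands(change):
        return "blocked: not legible"
    return "merge"


# Toy reviewer (assumption): treats very short explanations as illegible.
def toy_reviewer(change: Change) -> bool:
    return len(change.explanation.split()) >= 5


ok = Change("add retry", "service docs", "+retry()", "Retries the call three times on timeout")
opaque = Change("add retry", "service docs", "+retry()", "fixes it")

print(comprehension_gate(ok, toy_reviewer))      # merge
print(comprehension_gate(opaque, toy_reviewer))  # blocked: not legible
```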

### S26 — Can It Carry? (the evaluation reframe)

[[concept-can-it-carry]]: the new evaluation question. [[concept-system-matters]] — judge the model + its tooling stack as one unit. [[framework-private-bench-suite]] (Dingo / Splash Brothers / Artemis) replaces flattening public benchmarks ([[contrarian-public-benchmarks]]).

### S35 — Long-Running Agents and AI Reviewing AI (the prediction)

[[concept-long-running-agents]]: by late 2026, agents are predicted to run for *days or a week*, burning millions of tokens. Humans become the bottleneck ([[claim-humans-as-bottleneck]] · [[quote-humans-bottleneck]]). [[concept-ai-reviewing-ai]] · [[framework-agentic-eval-loop]] turns the human-review step into AI-review steps. Triage becomes the high-leverage human activity.
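
A minimal sketch of how AI review could leave only triage to humans, assuming a hypothetical routing rule: outputs that every AI reviewer approves or rejects are handled automatically, and only disagreements reach a person. The `Reviewer` type, the `triage` function, and the two toy reviewers are illustrative assumptions, not something the episode specifies.

```python
from typing import Callable, Dict, List

Reviewer = Callable[[str], bool]  # hypothetical AI reviewer: True means approve


def triage(outputs: List[str], reviewers: List[Reviewer]) -> Dict[str, List[str]]:
    """Route each output by reviewer agreement; humans see only split votes."""
    if not reviewers:
        raise ValueError("at least one reviewer is required")
    buckets: Dict[str, List[str]] = {
        "auto_accept": [],
        "auto_reject": [],
        "human_triage": [],
    }
    for out in outputs:
        votes = [review(out) for review in reviewers]
        if all(votes):
            buckets["auto_accept"].append(out)
        elif not any(votes):
            buckets["auto_reject"].append(out)
        else:
            # Reviewers disagree: this is the case worth a human's attention.
            buckets["human_triage"].append(out)
    return buckets


# Toy reviewers (assumptions): one wants tests mentioned, one wants brevity.
def mentions_tests(out: str) -> bool:
    return "test" in out


def is_short(out: str) -> bool:
    return len(out) < 40


routed = triage(
    ["patch with test", "patch", "x" * 50],
    [mentions_tests, is_short],
)
print(routed["human_triage"])  # ['patch']
```

In this toy run one of three outputs reaches a human, which is the shape of the claim above: review volume moves to the AI reviewers and human effort concentrates on the contested cases.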

## The Synthesis

The Karpathy Loop (S04) is the *atom* of agentic AI. Persistence (S12) is what makes it run uninterrupted. Workspace Agents (S06) is its deployment shell. Open Brain (S21) is its memory. Comprehension Gates (S23) is its safety. Private Bench (S26) is its evaluation. Long-running execution (S35) is its scale ceiling. AI Reviewing AI (S35) is how its quality compounds. Dark Factory (S01) is what you call the whole assembly when it works end-to-end.

## The Open Questions

1. **Brownfield migration** — [[question-legacy-brownfield-migration]] (S01): all of this works for greenfield projects like StrongDM. How do enterprises migrate legacy monoliths?
2. **Week-long observability** — [[open-question-agent-monitoring]] (S35): if the agent runs for a week, how do you tell if it's still on track?
3. **Subjective domain evaluation** — [[question-evaluating-subjective-domains]] (S04): what are un-gameable metrics for empathy, brand voice, and creative writing?
4. **Backend hygiene** — [[question-backend-hygiene]] (S26): when do agents stop needing humans for enum normalization and canonical merges?

## Connection to Other Arcs

- The [[arc-spec-bottleneck-evolution|spec arc]] provides the *input* to the loop.
- The [[arc-byoc-memory-architecture|memory arc]] provides the *persistence layer*.
- The [[arc-confident-incorrectness|silent-failure arc]] catalogs the failure modes.
- The [[arc-trust-stack-collapse|trust arc]] explains why external verification at every stage matters.