---
id: "concept-single-eval-gate"
type: "concept"
source_timestamps: ["00:15:38", "00:16:20"]
tags: ["software-architecture", "quality-control", "agentic-workflows"]
related: ["claim-human-handoffs-bottleneck", "contrarian-intermediate-testing-degrades", "action-consolidate-eval-gates"]
definition: "An architectural pattern that replaces multiple intermediate quality checks with one comprehensive, automated evaluation checkpoint at the end of an AI agent's execution."
sources: ["s44-claude-mythos"]
sourceVaultSlug: "s44-claude-mythos"
originDay: 44
---
# Single Eval Gate

## Definition

An architectural pattern that replaces multiple intermediate quality checks with **one comprehensive, automated evaluation checkpoint** at the end of an AI agent's execution.

## Problem it solves

Conventional pipelines insert human or scripted checks at every stage:
- Check the draft
- Check the logic
- Check the formatting
- Check the output

As AI models become capable of writing production-ready code end-to-end, these intermediate human-in-the-loop handoffs become the dominant bottleneck — see [[claim-human-handoffs-bottleneck]] and [[quote-human-bottleneck]].

## How a single eval gate works

1. Agent receives a complex task with explicit success criteria.
2. Agent executes end-to-end **without interruption**.
3. A single, rigorous evaluation gate at the end tests:
   - All functional requirements
   - All non-functional requirements
   - Edge cases
   - Exception handling
   - Policy compliance
4. On failure, the output is returned to the model for iteration.
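
The loop above can be sketched in Python. Everything here is a hypothetical illustration, not an API from the source: the agent callable, the named checks, and `EvalResult` are placeholders standing in for a real eval suite.

```python
from dataclasses import dataclass, field

@dataclass
class EvalResult:
    passed: bool
    failures: list = field(default_factory=list)

def eval_gate(output: dict) -> EvalResult:
    """The single comprehensive gate: every check lives in one place.
    Checks here are toy stand-ins for functional, edge-case, and policy tests."""
    checks = {
        "functional": lambda o: "result" in o,
        "edge_cases": lambda o: o.get("handles_empty", False),
        "policy": lambda o: not o.get("violations"),
    }
    failures = [name for name, check in checks.items() if not check(output)]
    return EvalResult(passed=not failures, failures=failures)

def run_with_single_gate(task, agent, max_iters=3):
    """Run the agent end-to-end with no intermediate checks;
    evaluate only at the end, and feed failures back on iteration."""
    feedback = None
    for _ in range(max_iters):
        output = agent(task, feedback)   # uninterrupted end-to-end run
        result = eval_gate(output)
        if result.passed:
            return output
        feedback = result.failures       # failures drive the next attempt
    raise RuntimeError(f"Gate not passed after {max_iters} iterations: {feedback}")
```

Note the design choice: the agent never pauses for review mid-run; the only control surface is the failure list returned by the gate.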

## Why it works (per the source)

Frontier models are better at *self-correcting during execution* than humans are at *micromanaging from outside*. Removing intermediate friction maximizes agent velocity.

## Counter-perspective

See [[contrarian-intermediate-testing-degrades]]. External evaluators (Google AgentOptimizer 2025, AlphaCode 2 evals) report 20–40% hallucination and error-propagation rates in long autonomous chains, and hybrid human+AI pipelines retain quality advantages in novel domains. The single eval gate works best when:
- The task domain is well-bounded
- The eval suite is genuinely comprehensive
- The model is at the capability level the source assumes

## Action and architecture tie-ins

- [[action-consolidate-eval-gates]] — concrete redesign step
- [[framework-mythos-readiness]] — step 4 of the readiness transformation
- Pairs with [[concept-outcome-driven-prompting]] and [[concept-model-driven-retrieval]]
## Related across days
- [[concept-comprehension-gate]]
- [[concept-multi-level-verification]]
- [[concept-evaluation-quality-judgment]]
- [[concept-scenario-testing]]
