---
id: "action-evaluate-iteration"
type: "action-item"
source_timestamps: ["00:03:35", "00:04:00"]
tags: ["tool-evaluation", "ux"]
related: ["concept-agent-iteration-speed", "contrarian-agent-babysitting"]
action: "Test AI agents based on how quickly users can review, correct, and approve their actions, rather than expecting zero-shot perfection."
outcome: "Selects tools that genuinely improve daily workflow efficiency rather than creating frustrating review bottlenecks caused by incorrect autonomous actions."
sources: ["s51-512k-leaked-code"]
sourceVaultSlug: "s51-512k-leaked-code"
originDay: 51
---
# Evaluate Agents on Iteration Speed, Not Just Accuracy

## Action

Test AI agents based on **how quickly users can review, correct, and approve** their actions, rather than expecting zero-shot perfection.

## Outcome

Selects tools that genuinely improve daily workflow efficiency rather than creating frustrating review bottlenecks caused by incorrect autonomous actions.

## Why

See [[concept-agent-iteration-speed]] and [[contrarian-agent-babysitting]]: flashy demos showcase zero-shot accuracy, but real-world utility hinges on *iteration cycle speed*. McKinsey reports that 60% of enterprises abandon agents because the babysitting overhead exceeds the value delivered.

## Suggested Evaluation Protocol

1. Run agent on **representative real workflow tasks** (not synthetic benchmarks).
2. Measure (one way to log these is sketched after this list):
   - Time from agent proposal → human review → corrected action.
   - Number of iterations needed to reach acceptable output.
   - Net time saved vs. doing the task manually.
3. Reject tools where the iteration cycle is slow, regardless of headline benchmark scores.
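
A minimal sketch of how these measurements could be logged during an evaluation. The `AgentTrial` and `IterationRecord` classes, their field names, and the example task and timings are illustrative assumptions, not part of the source.

```python
from dataclasses import dataclass, field


@dataclass
class IterationRecord:
    """One proposal -> review -> correction cycle for a single task."""
    review_seconds: float      # time a human spent reviewing the agent's proposal
    correction_seconds: float  # time spent correcting or re-prompting
    accepted: bool             # whether this cycle produced acceptable output


@dataclass
class AgentTrial:
    """Collects iteration metrics for one representative workflow task."""
    task_name: str
    manual_baseline_seconds: float  # how long the task takes when done by hand
    iterations: list = field(default_factory=list)

    def record_iteration(self, review_seconds: float,
                         correction_seconds: float, accepted: bool) -> None:
        self.iterations.append(
            IterationRecord(review_seconds, correction_seconds, accepted))

    @property
    def iterations_to_acceptance(self) -> int:
        """Cycles needed before the output was accepted (0 means never accepted)."""
        for i, rec in enumerate(self.iterations, start=1):
            if rec.accepted:
                return i
        return 0

    @property
    def total_human_seconds(self) -> float:
        """Total review plus correction time across all cycles."""
        return sum(r.review_seconds + r.correction_seconds for r in self.iterations)

    @property
    def net_seconds_saved(self) -> float:
        """Positive means the agent beat doing the task manually."""
        return self.manual_baseline_seconds - self.total_human_seconds


# Example: one task, two review cycles before the output was acceptable.
trial = AgentTrial(task_name="triage support tickets", manual_baseline_seconds=900)
trial.record_iteration(review_seconds=120, correction_seconds=180, accepted=False)
trial.record_iteration(review_seconds=60, correction_seconds=0, accepted=True)
print(trial.iterations_to_acceptance)  # 2
print(trial.net_seconds_saved)         # 540.0 (the agent saved ~9 minutes)
```

Comparing `net_seconds_saved` across candidate tools on the same task set gives a direct, workflow-level basis for the rejection rule in step 3.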

## Anti-Pattern

Procuring an agent based solely on a vendor demo where it executes a complex task in one shot.
