---
id: "action-implement-scenario-testing"
type: "action-item"
source_timestamps: ["00:08:30"]
tags: ["qa", "testing"]
related: ["concept-scenario-testing", "contrarian-tests-harm-ai"]
action: "Build external, black-box behavioral scenarios to evaluate AI-generated code."
outcome: "Prevent AI from gaming tests and ensure robust software architecture."
sources: ["s01-5-levels-ai-coding"]
sourceVaultSlug: "s01-5-levels-ai-coding"
originDay: 1
---
# Implement Scenario Testing

## Directive
Move evaluation criteria **outside the codebase** into black-box behavioral scenarios. Do **not** rely on traditional in-repo unit tests, as AI agents will optimize to game them rather than build correct software. See [[contrarian-tests-harm-ai]].

## Specific Steps
1. Define behavioral specifications at the system boundary — what *outcomes* must hold true for the running software?
2. Store scenarios in a **separate repository** the coding agent cannot access during build.
3. Run scenarios against deployed builds inside a [[concept-digital-twin-universe|Digital Twin Universe]].
4. Treat scenarios as a **holdout set** — analogous to held-out validation data in machine learning, which the model never trains on.
5. Iterate on scenario coverage independently of the agent's training/build loop.
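The steps above can be sketched as a minimal scenario runner. This is a hypothetical illustration, not from the source: the `Scenario` spec, `run_scenarios`, and `deployed_build` names are invented, and in a real setup the scenarios would live in the separate repository and `deployed_build` would be a network call into the Digital Twin Universe rather than an in-process stub.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: each scenario asserts an *outcome* at the system
# boundary. The coding agent never sees these definitions.
@dataclass
class Scenario:
    name: str
    request: dict                    # input delivered at the boundary
    expect: Callable[[dict], bool]   # predicate over the observed outcome

def run_scenarios(system: Callable[[dict], dict],
                  scenarios: list[Scenario]) -> dict:
    """Evaluate a deployed build as a black box; report pass/fail only."""
    results = {}
    for s in scenarios:
        response = system(s.request)  # no access to internals
        results[s.name] = s.expect(response)
    return results

# Stand-in for a deployed build; a real run would call over the network.
def deployed_build(req: dict) -> dict:
    if req.get("action") == "create_order":
        return {"status": 201, "order_id": 1}
    return {"status": 404}

scenarios = [
    Scenario("order creation succeeds",
             {"action": "create_order"},
             lambda r: r["status"] == 201 and "order_id" in r),
    Scenario("unknown action is rejected",
             {"action": "delete_everything"},
             lambda r: r["status"] == 404),
]

print(run_scenarios(deployed_build, scenarios))
```

Because scenarios only inspect boundary outcomes, the build can be refactored freely without invalidating them — the opposite of tightly coupled in-repo unit tests.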

## Expected Outcome
The AI cannot game evaluation criteria it never sees. Quality is enforced at the boundary; architecture stays sound. See [[concept-scenario-testing]].
