---
id: "concept-scenario-testing"
type: "concept"
source_timestamps: ["00:08:38"]
tags: ["quality-assurance", "testing-methodologies"]
related: ["concept-dark-factory", "contrarian-tests-harm-ai", "prereq-test-driven-development", "action-implement-scenario-testing"]
definition: "The practice of evaluating AI-generated code using external, black-box behavioral scenarios that the AI cannot see, preventing the model from gaming traditional in-repo unit tests."
sources: ["s01-5-levels-ai-coding"]
sourceVaultSlug: "s01-5-levels-ai-coding"
originDay: 1
---
# Scenario Testing vs. Traditional Tests

## The Core Problem
In an autonomous AI coding environment (see [[concept-dark-factory]]), traditional unit and integration tests become a **liability** rather than a safety net. Because AI agents have full context of the codebase, they can read the test files. Consequently, the agent will inevitably — whether intentionally or organically — optimize its output to *pass the tests* rather than to build robust, correct software.

This is analogous to 'teaching to the test': the scores are perfect, but the underlying implementations are shallow and brittle. See [[contrarian-tests-harm-ai]].

## The Solution: Scenario Testing
Organizations running Dark Factories employ **Scenario Testing**:
- Scenarios are *behavioral specifications* that live entirely **outside the codebase**.
- They function as a holdout set, similar to validation data in machine learning.
- The AI agent builds the software; external scenarios evaluate the output as a black box.
- Because the agent never sees the evaluation criteria during the build phase, it cannot game the system.
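The mechanics above can be sketched in miniature. This is a hypothetical illustration, not a prescribed implementation: `Scenario`, `run_scenarios`, and the sample `checkout_total` system are invented names, and in practice the scenario list would be loaded from a repository or directory the coding agent never reads.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Scenario:
    name: str
    inputs: dict      # black-box inputs fed to the system under test
    expected: Any     # the observable behavior we require

def run_scenarios(system: Callable[..., Any],
                  scenarios: list[Scenario]) -> dict:
    """Evaluate the built artifact as a black box: call its public
    interface only, and report pass/fail per scenario."""
    results = {}
    for s in scenarios:
        try:
            results[s.name] = system(**s.inputs) == s.expected
        except Exception:
            results[s.name] = False
    return results

# System under test: stands in for the agent-built software.
def checkout_total(prices, discount=0.0):
    return round(sum(prices) * (1 - discount), 2)

# Held-out scenarios; kept outside the codebase so the agent
# cannot optimize against them during the build phase.
holdout = [
    Scenario("plain sum", {"prices": [10.0, 5.0]}, 15.0),
    Scenario("ten percent off", {"prices": [100.0], "discount": 0.1}, 90.0),
]

report = run_scenarios(checkout_total, holdout)
print(report)  # {'plain sum': True, 'ten percent off': True}
```

The key property is separation of concerns: the runner and the scenario data live with the evaluators, while the agent only ever sees the public interface it must satisfy.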

## Departure from TDD
This is a radical departure from [[prereq-test-driven-development|Test-Driven Development (TDD)]] and requires a fundamentally different architectural approach to QA. Quality must be enforced at the *boundary* of the system, not from within it.

## Related Action
- [[action-implement-scenario-testing]] — operational steps to adopt this practice.

## Related across days
- [[concept-private-bench]]
- [[prereq-evaluation-infrastructure]]
- [[framework-private-bench-suite]]
- [[arc-evaluation-frontier]]
