---
id: "concept-evaluation-quality-judgment"
type: "concept"
source_timestamps: ["00:06:56", "00:07:28"]
tags: ["quality-assurance", "testing", "skill-2"]
related: ["concept-confidently-wrong", "concept-edge-case-detection", "action-build-eval-harnesses", "framework-7-ai-skills", "contrarian-taste-is-error-detection", "entity-upwork", "entity-anthropic"]
definition: "The systematic, measurable process of testing AI outputs against precise specifications to determine functional correctness and detect errors."
sources: ["s42-job-market-split"]
sourceVaultSlug: "s42-job-market-split"
originDay: 42
---
# Evaluation and Quality Judgment

## Skill #2 of [[framework-7-ai-skills]]

**Evaluation** is the most frequently cited skill in AI job postings — particularly visible on [[entity-upwork]] listings that explicitly demand evaluation harnesses and functional tests.

It is the systematic process of determining whether an AI system actually achieved the specified intent. While this is often discussed vaguely as having 'taste' in AI, [[contrarian-taste-is-error-detection]] reframes it: taste is just error detection at fluent speed.

## What employers want

- Automated evaluations and simulation runs.
- Evaluation harnesses for functional tasks.
- Longitudinal metric tracking.
- Edge-case suites — see [[concept-edge-case-detection]].

## The bar for a 'good eval'

A robust evaluation task is one where multiple engineers, looking at the same output, reach the **exact same pass/fail conclusion**. [[entity-anthropic]]'s engineering blog is cited as a canonical reference here.
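That bar can be made concrete as a minimal sketch: a grader that is a pure, deterministic function of the output, so any engineer rerunning it reaches the same verdict. Everything here is hypothetical illustration (`run_model`, `grade_extraction`, and the sample cases are not from the source).

```python
# Minimal sketch of a deterministic pass/fail eval harness.
# `run_model` is a hypothetical stand-in for any AI call; the grader
# is a pure function of the output, so every run yields the same verdict.

def grade_extraction(output: str, expected: str) -> bool:
    """Deterministic grader: normalize whitespace and case, then compare."""
    return " ".join(output.split()).lower() == " ".join(expected.split()).lower()

# Hypothetical functional-task cases: (prompt, expected answer).
CASES = [
    ("Extract the invoice total from: 'Total due: $41.00'", "$41.00"),
    ("Extract the invoice total from: 'Amount payable $7.50'", "$7.50"),
]

def run_suite(run_model, cases=CASES):
    """Return (pass count, failing prompts) so the metric can be tracked over time."""
    results = [(prompt, grade_extraction(run_model(prompt), expected))
               for prompt, expected in cases]
    passed = sum(ok for _, ok in results)
    return passed, [prompt for prompt, ok in results if not ok]
```

The pass count from `run_suite` is exactly the kind of number the longitudinal-tracking bullet above asks for: log it per model version and watch the trend.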

## Psychological prerequisite

Evaluation requires resisting the temptation to view an AI's fluent, confident output as inherently correct — see [[concept-confidently-wrong]] and [[claim-fluency-not-competence]].

## Action

Follow [[action-build-eval-harnesses]] to convert this skill into a portfolio artifact.


## Related across days
- [[concept-scenario-testing]]
- [[concept-single-eval-gate]]
- [[concept-quantitative-skill-testing]]
- [[framework-agentic-eval-loop]]
- [[concept-comprehension-gate]]
