---
id: "entity-terminal-bench-2-0"
type: "entity"
entityType: "other"
canonicalName: "Terminal Bench 2.0"
aliases: ["Terminal-Bench 2.0"]
source_timestamps: ["§ The Coupling of Model Training and Harness Design"]
tags: ["benchmark"]
related: ["entity-opus-4-6", "contrarian-harness-optimization", "entity-claude-code"]
---
# Terminal Bench 2.0

## Profile

**Terminal Bench 2.0** is a benchmark for terminal-based coding agents. The article cites its leaderboard as evidence.

## Role in This Source

The author uses Terminal Bench 2.0 as the evidentiary basis for [[contrarian-harness-optimization]]: [[entity-opus-4-6|Opus 4.6]] places **very differently** on this benchmark depending on the harness used. The cited delta is a move from roughly Top 30 to Top 5, achieved by swapping harnesses while holding the model fixed. The author takes this to show that **harness optimization yields massive performance gains**.
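The comparison behind this claim (fix the model, swap the harness, compare leaderboard positions) can be sketched as below. This is a minimal illustration, not the benchmark's own tooling: the `Entry` type, the function names, and every score in the sample leaderboard are hypothetical placeholders, not real Terminal Bench 2.0 results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row: a model run under a specific harness."""
    model: str
    harness: str
    score: float  # placeholder value, NOT a real benchmark result


def rank_of(entries, model, harness):
    """Return the 1-based rank of (model, harness), best score first."""
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    for position, entry in enumerate(ordered, start=1):
        if entry.model == model and entry.harness == harness:
            return position
    raise ValueError(f"no entry for {model!r} with harness {harness!r}")


def harness_rank_delta(entries, model, baseline, candidate):
    """Places gained by moving a fixed model from `baseline` to `candidate`."""
    return rank_of(entries, model, baseline) - rank_of(entries, model, candidate)


# Hypothetical leaderboard: all names and scores are made up for illustration.
leaderboard = [
    Entry("other-1", "harness-z", 90.0),
    Entry("model-x", "harness-b", 85.0),
    Entry("other-2", "harness-z", 80.0),
    Entry("other-3", "harness-z", 70.0),
    Entry("other-4", "harness-z", 60.0),
    Entry("model-x", "harness-a", 55.0),
    Entry("other-5", "harness-z", 50.0),
]

places_gained = harness_rank_delta(leaderboard, "model-x", "harness-a", "harness-b")
print(places_gained)
```

Because the model is identical in both rows, any difference in rank is attributable to the harness (plus run-to-run noise, which this sketch ignores).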

## Verification Note

A public, indexed Terminal Bench 2.0 leaderboard showing the exact rankings described could not be readily located. The benchmark may be community-maintained, or the name may be a colloquial label for a known terminal-based coding eval. The directional finding (harness choice can swing benchmark rankings dramatically for a fixed model) is consistent with patterns reported for SWE-bench, AgentBench, and similar evals.
