---
id: "claim-post-training-beats-raw-intelligence"
type: "claim"
source_timestamps: ["00:11:05", "00:12:28"]
tags: ["model-training", "ai-capabilities"]
related: ["concept-vibe-coding-d16", "contrarian-post-training-over-intelligence"]
confidence: "high"
testable: true
speakers: ["Peter Steinberger", "Nate B. Jones"]
sources: ["s16-openclaw-saga"]
sourceVaultSlug: "s16-openclaw-saga"
originDay: 16
---
# Post-Training Trumps Raw Intelligence for Agents

## Claim

The primary bottleneck in creating effective AI agents is **no longer the raw intelligence or parameter count** of the underlying foundation model. The critical differentiator is **post-training** (see the sketch after this list), specifically training models to:

- Execute long-horizon tasks
- Correct their own errors
- Interact reliably with tools and APIs
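
To make these three capabilities concrete, here is a minimal sketch of the loop an agent model must survive. Every name in it (`call_model`, the `run_tests` tool, the message format) is a hypothetical stand-in rather than any real API; the structural point is that agentic post-training optimizes for keeping this loop converging across many steps, with tool errors fed back verbatim, rather than for one-shot answer quality.

```python
# Hypothetical sketch of the loop an agentic model must survive.
# `call_model` and `run_tests` are stand-ins, not a real API.

def run_tests(patch: str) -> dict:
    """Stand-in tool: pretend to apply a patch and run a test suite."""
    ok = "fix" in patch  # toy success condition for the demo
    return {"ok": ok, "stderr": "" if ok else "AssertionError in test_parse"}

def call_model(history: list[dict]) -> str:
    """Stand-in for a post-trained model. A real agent model is trained
    to read prior tool errors in `history` and emit a corrected action."""
    failed = any(not m.get("ok", True) for m in history if m["role"] == "tool")
    return "apply fix patch" if failed else "apply draft patch"

def agent_loop(task: str, max_steps: int = 8) -> bool:
    history: list[dict] = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_model(history)                # model proposes an action
        result = run_tests(action)                  # environment executes it
        history.append({"role": "assistant", "content": action})
        history.append({"role": "tool", **result})  # error fed back verbatim
        if result["ok"]:
            return True                             # task converged
    return False                                    # model failed to self-correct

if __name__ == "__main__":
    print(agent_loop("make the parser tests pass"))
```

Restated in these terms, the claim is that benchmark intelligence predicts the quality of a single `call_model` output, while post-training predicts whether the loop terminates.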

## Steinberger's Argument

[[entity-peter-steinberger-d16]] argues that models optimized for **'correct code over long runs'** (like OpenAI's Codex) are more valuable for agentic workflows than models that simply chat well — even if the chat models score higher on traditional intelligence benchmarks. He publicly advocated for Codex over Claude on [[entity-lex-fridman]]'s podcast.

## Contrarian Framing

See [[contrarian-post-training-over-intelligence]] for the explicit contrarian framing.

## Connection to Vibe Coding

This claim directly enables [[concept-vibe-coding-d16]]: only models post-trained for long-horizon agentic work reliably sustain multi-thousand-commit agentic engineering.

## Confidence: High (per source) / Partially supported (per enrichment)

Enrichment review: post-training is emphasized in agent benchmarks such as the Berkeley Function-Calling Leaderboard. Counter-evidence: OpenAI's o1/o3 reasoning work shows that **pre-training compute and inference-time scaling remain critical**. Treat the claim as 'post-training is the marginal differentiator', not 'scale doesn't matter.'
