---
id: "action-measure-before-optimizing"
type: "action-item"
source_timestamps: ["00:06:45", "00:07:05"]
tags: ["performance-tuning", "benchmarking"]
related: ["framework-rob-pike-agent-rules"]
speakers: ["Nate B. Jones"]
action: "Establish baseline performance metrics for an agent before attempting to optimize its speed, prompt, or architecture."
outcome: "Avoids premature optimization and ensures engineering effort is spent on actual bottlenecks."
sources: ["s41-nvidia-open-sourced"]
sourceVaultSlug: "s41-nvidia-open-sourced"
originDay: 41
---
# Measure baseline agent performance before optimizing

## Action

**Establish baseline performance metrics for an agent before attempting to optimize its speed, prompt, or architecture.**

## Why

Direct application of [[entity-rob-pike]]'s Rules 1 and 2 (see [[framework-rob-pike-agent-rules]]):
- You can't tell where a program will spend its time.
- Don't tune for speed until you've measured.

Premature optimization in agentic systems wastes effort and can introduce opaque bugs. You may shave 200 ms off a path you assumed was hot while ignoring a 5-second I/O wait elsewhere.
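
To make this concrete, here is a minimal sketch of per-stage timing. The stage names (`plan`, `tool_call`, `summarize`) and the sleep durations are hypothetical stand-ins for real agent work and do not come from the source. The only point illustrated is measuring where wall-clock time goes before deciding what to tune.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
stage_seconds = defaultdict(float)

@contextmanager
def timed(stage):
    """Add the elapsed wall-clock time of the enclosed block to `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start

def run_agent_once():
    # Hypothetical stages; the sleeps stand in for real work.
    with timed("plan"):
        time.sleep(0.01)
    with timed("tool_call"):  # stands in for a slow I/O wait
        time.sleep(0.1)
    with timed("summarize"):
        time.sleep(0.01)

def slowest_stage():
    """Return the stage with the largest accumulated time."""
    return max(stage_seconds, key=stage_seconds.get)

run_agent_once()
total = sum(stage_seconds.values())
# Print stages from slowest to fastest with their share of total time.
for stage, seconds in sorted(stage_seconds.items(), key=lambda kv: -kv[1]):
    print(f"{stage:<10} {seconds:.3f}s  {seconds / total:.0%}")
```

In this toy run the `tool_call` stage dominates, so tuning `plan` or `summarize` would barely change total latency.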

## Concrete Steps

1. **Define the task suite** — a fixed set of representative tasks the agent must complete.
2. **Define metrics** — task success rate, time-to-completion, token cost, retry count, human override rate.
3. **Run a baseline** with the simplest possible architecture (single agent, default prompt).
4. **Record everything** — versioned metrics tied to model + prompt + dataset.
5. **Only then** experiment with prompt changes, architecture changes, or model swaps.
6. **Compare against baseline** — keep changes that improve the metric beyond run-to-run noise, drop changes that don't.
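
Steps 1 through 4 could be sketched roughly as follows. Everything named here is an assumption for illustration: `stub_agent` stands in for a real single-agent call, the three tasks are invented, and the version labels (`example-model`, `v0`, `suite-1`) are placeholders. Human override rate is left out because it needs logging from a human-in-the-loop workflow.

```python
import json
import time
from dataclasses import dataclass, asdict
from statistics import mean

@dataclass
class TaskResult:
    task_id: str
    success: bool
    seconds: float
    tokens: int
    retries: int

def run_suite(agent, tasks):
    """Run every task once and collect per-task metrics."""
    results = []
    for task_id, prompt in tasks.items():
        start = time.perf_counter()
        outcome = agent(prompt)  # assumed to return success/tokens/retries
        elapsed = time.perf_counter() - start
        results.append(TaskResult(task_id, outcome["success"], elapsed,
                                  outcome["tokens"], outcome["retries"]))
    return results

def summarize(results, model, prompt_version, dataset_version):
    """Aggregate metrics and tie them to model + prompt + dataset versions."""
    return {
        "model": model,
        "prompt_version": prompt_version,
        "dataset_version": dataset_version,
        "success_rate": mean(1.0 if r.success else 0.0 for r in results),
        "mean_seconds": mean(r.seconds for r in results),
        "total_tokens": sum(r.tokens for r in results),
        "total_retries": sum(r.retries for r in results),
        "tasks": [asdict(r) for r in results],
    }

def stub_agent(prompt):
    """Hypothetical agent: deterministic fake metrics, no model call."""
    return {"success": "fail" not in prompt,
            "tokens": len(prompt.split()) * 10,
            "retries": 0}

# Step 1: a fixed task suite (invented examples).
TASKS = {
    "t1": "summarize the release notes",
    "t2": "extract dates from this email",
    "t3": "this task should fail",
}

# Steps 3 and 4: run the baseline and record it as a versioned JSON record.
baseline = summarize(run_suite(stub_agent, TASKS),
                     model="example-model", prompt_version="v0",
                     dataset_version="suite-1")
print(json.dumps(baseline, indent=2))
```

Writing the JSON record to a file under version control (or a metrics store) is one way to keep each run comparable with later experiments.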

## Expected Outcome

- Engineering effort is spent on actual bottlenecks, not perceived ones.
- Regression detection becomes possible.
- Architectural decisions become defensible with data.
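
Regression detection can be as simple as comparing a candidate run with the stored baseline, metric by metric. The sketch below is one possible approach, not a prescribed method: the metric names, the 5% tolerance, and the sample numbers are all illustrative assumptions.

```python
# Direction of "better" per metric: +1 means higher is better, -1 means lower.
METRIC_DIRECTION = {
    "success_rate": +1,
    "mean_seconds": -1,
    "total_tokens": -1,
}

def compare(baseline, candidate, tolerance=0.05):
    """Classify each metric as 'improved', 'regressed', or 'unchanged'.

    A change counts only if it exceeds `tolerance`, expressed as a
    fraction of the baseline value.
    """
    verdicts = {}
    for metric, direction in METRIC_DIRECTION.items():
        base, cand = baseline[metric], candidate[metric]
        # Relative change, signed so that positive always means "better".
        # A zero baseline is treated as unchanged in this sketch.
        relative = direction * (cand - base) / abs(base) if base else 0.0
        if relative > tolerance:
            verdicts[metric] = "improved"
        elif relative < -tolerance:
            verdicts[metric] = "regressed"
        else:
            verdicts[metric] = "unchanged"
    return verdicts

# Illustrative numbers only.
baseline = {"success_rate": 0.80, "mean_seconds": 6.0, "total_tokens": 12000}
candidate = {"success_rate": 0.82, "mean_seconds": 4.0, "total_tokens": 15000}
print(compare(baseline, candidate))
```

Here the candidate is faster but costs more tokens, and its success-rate gain is within tolerance, so the decision to keep it depends on which metric matters most for the task suite.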

## See Also

- [[framework-rob-pike-agent-rules]] — Rules 1 and 2
- [[action-simplify-agent-architecture]] — what to do once you've measured
