---
id: "claim-inference-power"
type: "claim"
source_timestamps: ["00:03:27"]
tags: ["hardware", "energy", "nvidia"]
related: ["entity-billy-dally"]
confidence: "high"
testable: true
validation_status: "supported"
sources: ["s20-50x-faster"]
sourceVaultSlug: "s20-50x-faster"
originDay: 20
---
# Inference Is 90% of Data Center Power

## Claim

According to Nvidia's [[entity-billy-dally]], inference (not training) now accounts for 90% of data center power consumption, with per-user serving rates heading toward 10,000 to 20,000 tokens per second.

## Speaker Confidence

High.

## External Validation

**Supported.** Inference now dominates data center power (trending toward >90%, per statements by Nvidia executives such as Bill Dally in prior public talks), with stated targets of 10,000-20,000 tokens/sec per user. External sources confirm the broader shift from training-dominated to inference-dominated compute.

## Why It Matters

Underscores that the cost of running agents at scale is now an inference problem, not a training problem. This makes [[concept-human-affordance-bottleneck]] economically urgent: every wasted second on a paginated API is paid for in literal megawatts.
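The megawatt framing can be made concrete with a back-of-envelope sketch. All inputs below are hypothetical assumptions for illustration; only the 10,000 tokens/sec/user figure comes from the claim above.

```python
# Back-of-envelope inference power estimate. Fleet size and energy per
# token are hypothetical assumptions, not figures from the source talk.
TOKENS_PER_SEC_PER_USER = 10_000   # lower bound of the stated target
CONCURRENT_USERS = 1_000_000       # hypothetical number of active users
JOULES_PER_TOKEN = 0.3             # hypothetical end-to-end energy cost

watts = TOKENS_PER_SEC_PER_USER * CONCURRENT_USERS * JOULES_PER_TOKEN
megawatts = watts / 1e6
print(f"{megawatts:,.0f} MW")  # 3,000 MW under these assumptions
```

Even under these rough assumptions, the result lands in the gigawatt range, which is why shaving wasted inference (e.g. agents re-fetching paginated APIs) translates directly into power savings.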

## Related

- [[entity-billy-dally]]
- [[concept-agentic-economy-d20]]
- [[concept-human-affordance-bottleneck]]
