---
id: "entity-hbm"
type: "entity"
entityType: "product"
canonicalName: "High Bandwidth Memory (HBM)"
aliases: ["HBM", "stacked DRAM"]
source_timestamps: ["01:44:00", "14:08:00"]
tags: ["hardware", "memory", "supply-chain"]
related: ["concept-ai-memory-crisis", "claim-memory-bottleneck", "prereq-gpu-memory-hierarchy"]
canonicalUrl: "https://en.wikipedia.org/wiki/High_Bandwidth_Memory"
sources: ["s49-killed-ram-limits"]
sourceVaultSlug: "s49-killed-ram-limits"
originDay: 49
---
# High Bandwidth Memory (HBM)

High Bandwidth Memory (HBM) is stacked DRAM: multiple DRAM dies stacked vertically, connected by through-silicon vias, and placed beside the processor die on a silicon interposer. The resulting very wide memory interface delivers far higher bandwidth than conventional DRAM, which is why advanced GPUs use HBM for AI training and inference.
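
As a rough back-of-envelope sketch of why that bandwidth matters (the numbers below are illustrative assumptions, not figures from the source): single-batch LLM decoding must stream every weight from HBM for each generated token, so generation speed is capped by bandwidth divided by model size.

```python
# Back-of-envelope: why LLM inference is HBM-bandwidth-bound.
# Generating one token reads every weight from HBM, so tokens/sec
# is capped by (HBM bandwidth / model bytes). All numbers below
# are illustrative assumptions, not figures from the source.

def max_tokens_per_sec(params_billions: float, bytes_per_param: float,
                       hbm_bandwidth_tb_s: float) -> float:
    """Upper bound on single-batch decode speed for a dense model."""
    model_bytes = params_billions * 1e9 * bytes_per_param
    return hbm_bandwidth_tb_s * 1e12 / model_bytes

# A hypothetical 70B-parameter model in FP16 on a GPU with ~3 TB/s of HBM:
print(max_tokens_per_sec(70, 2.0, 3.0))  # ~21 tokens/sec ceiling
```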

**Why it matters in this vault**:
- HBM is **structurally constrained in supply** due to manufacturing difficulties, helium shortages, and elevated power costs at fabs.
- It is the **primary physical bottleneck** for AI scaling — see [[concept-ai-memory-crisis]] and [[claim-memory-bottleneck]].
- Building new HBM fab capacity takes 5+ years.
- HBM prices have surged by hundreds of percent due to the demand-supply mismatch.

HBM scarcity is the immediate motivation for software approaches like [[concept-turboquant]] and architectural responses like [[concept-multi-head-latent-attention]]. It is also the resource that [[entity-nvidia-d49]]'s upcoming [[entity-vera-rubin]] architecture promises to scale 500x.
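
As a sketch of the arithmetic behind such software approaches (generic quantization math, not TurboQuant's specific method, which this note does not detail): cutting bits per parameter directly shrinks the HBM footprint, letting the same scarce capacity hold a larger model or serve more tokens per second.

```python
# Generic quantization arithmetic (a sketch under assumed numbers;
# not TurboQuant's specific algorithm). Fewer bits per weight means
# proportionally less HBM needed for the same parameter count.

def hbm_footprint_gb(params_billions: float, bits_per_param: int) -> float:
    """HBM needed just for weights, ignoring KV cache and activations."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B model at {bits}-bit: {hbm_footprint_gb(70, bits):.0f} GB")
# 16-bit: 140 GB, 8-bit: 70 GB, 4-bit: 35 GB
```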

The GPU memory hierarchy in which HBM sits is covered in [[prereq-gpu-memory-hierarchy]].

**Canonical URL**: https://en.wikipedia.org/wiki/High_Bandwidth_Memory
