AI’s Hidden Tollbooth: Why HBM Memory Is the Next GPU Constraint
The Durability Curve
HBM memory is sold out through 2026 at all three suppliers: SK Hynix, Micron, and Samsung. Prices are up nearly 20%. Every AI GPU needs it. No substitute exists. How this feeds into NVDA earnings on May 20 and the Vera Rubin timeline.
High Bandwidth Memory is the single most constrained component in the AI supply chain. Every AI GPU — H100, H200, B200, B300 — requires it. There is no substitute. And every supplier is sold out through 2026.
This is not a repeat of the 2021 chip shortage. It is a structural reallocation of the world's silicon wafer capacity toward AI memory products, with no relief expected before late 2027 at the earliest.
The Numbers
Bank of America estimates the 2026 HBM market at $54.6 billion — a 58% increase year over year. Micron projects the total addressable market reaching approximately $100 billion by 2028, arriving two years ahead of earlier forecasts.
But supply cannot keep pace. Micron CEO Sanjay Mehrotra stated during Q1 FY2026 earnings: Our HBM capacity for calendar 2025 and 2026 is fully booked. SK Hynix confirmed that all DRAM, NAND, and HBM production through 2026 is essentially sold out. Micron disclosed it can only meet 50% to 66% of demand from core customers.
Three companies control nearly the entire market. Counterpoint Research measured SK Hynix at 62% market share, Micron at 21%, and Samsung at 17%. Both Samsung and SK Hynix raised HBM3E supply prices by nearly 20% for 2026 contracts — described by industry observers as unusual. Samsung is charging approximately $700 per unit for its latest HBM product.
Why This Matters for NVIDIA
Every GPU generation demands substantially more HBM than the last. The H100 uses 80 GB. The H200 uses 141 GB. The B200 requires 192 GB. The B300 pushes to 288 GB of 12-layer HBM3E. Each gigabyte of HBM consumes roughly 3 to 4 times the wafer capacity of standard DRAM, according to Micron executives and TrendForce analysis.
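Those per-GPU figures and the 3-to-4x wafer factor combine into a rough wafer-equivalent comparison. A back-of-envelope sketch in Python; the only inputs are the numbers quoted above, and the wafer factor is a range, not a precise constant:

```python
# HBM capacity per GPU generation (GB), as quoted above.
hbm_gb = {"H100": 80, "H200": 141, "B200": 192, "B300": 288}

# Each GB of HBM consumes roughly 3-4x the wafer capacity of
# standard DRAM (Micron / TrendForce estimate cited above).
FACTOR_LOW, FACTOR_HIGH = 3, 4

for gpu, gb in hbm_gb.items():
    # Express each GPU's HBM load in conventional-DRAM-GB equivalents.
    low, high = gb * FACTOR_LOW, gb * FACTOR_HIGH
    print(f"{gpu}: {gb} GB HBM ~ {low}-{high} GB of conventional DRAM capacity")

# Generation-over-generation growth in HBM per accelerator:
print(f"B300 carries {hbm_gb['B300'] / hbm_gb['H100']:.1f}x the HBM of an H100")  # 3.6x
```

On these numbers, a single B300 displaces as much wafer supply as roughly a terabyte of conventional DRAM, which is why the cascade described below reaches PCs and phones.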
NVIDIA's supply-related purchase commitments tell the real story. They rose from $50.3 billion at the end of Q3 to $95.2 billion at the end of Q4 — nearly doubling in a single quarter. That is not optional inventory building. That is NVIDIA aggressively locking in component supply for a constraint they know is structural.
NVIDIA has secured over 60% of TSMC's total 2026 CoWoS output, according to Morgan Stanley. Its 2026 CoWoS demand is approximately 700,000 wafers, up 75% from 2025. But CoWoS is only one bottleneck. Even with the packaging capacity in place, the HBM to fill those packages simply does not exist.
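Two of the figures above imply numbers the text leaves unstated: the 2025 CoWoS baseline and the size of the purchase-commitment jump. A quick sketch, using only the values already quoted:

```python
# NVIDIA's 2026 CoWoS demand (wafers, per Morgan Stanley), up 75% from 2025.
cowos_2026 = 700_000
implied_2025 = cowos_2026 / 1.75
print(f"Implied 2025 CoWoS demand: ~{implied_2025:,.0f} wafers")  # ~400,000

# Supply-related purchase commitments, $B, end of Q3 vs end of Q4.
q3, q4 = 50.3, 95.2
print(f"Quarter-over-quarter jump: {q4 / q3 - 1:.0%}")  # ~89%
```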
A Moat and a Constraint
This is where Law II — Difficulty Is Load-Bearing — applies in both directions.
The HBM shortage is a moat for NVIDIA because no competitor can get enough memory either. AMD's MI350 offers 288 GB of HBM3E and better TCO on paper, but AMD faces the same HBM supply constraints. Custom ASICs like Google's TPU v7 are growing fast (TrendForce projects custom ASIC shipments up 44.6% in 2026 vs 16.1% for GPUs), but every ASIC also needs HBM.
The constraint compounds. As IDC put it plainly, every wafer allocated to an HBM stack for an NVIDIA GPU is a wafer denied to the LPDDR5X module of a mid-range smartphone or the SSD of a consumer laptop.
This is why NVIDIA's gaming business is being starved. The RTX 5090 trades at 65% above MSRP; the RTX 5080 sits at 45% above. NVIDIA plans no new gaming GPU launches in 2026, a first in roughly 30 years, according to The Information. Gaming's share of NVIDIA revenue dropped from roughly 35% in 2022 to approximately 8% in fiscal 2026. Every wafer that can carry HBM goes to data center, because that is where the margin is.
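The MSRP premiums translate into street prices as follows. The launch MSRPs ($1,999 for the RTX 5090, $999 for the RTX 5080) are not stated above and are this sketch's assumption:

```python
# Assumed launch MSRPs in USD (not from the article text).
msrp = {"RTX 5090": 1999, "RTX 5080": 999}
# Street-price premiums quoted above.
premium = {"RTX 5090": 0.65, "RTX 5080": 0.45}

for card in msrp:
    street = msrp[card] * (1 + premium[card])
    print(f"{card}: ~${street:,.0f} street vs ${msrp[card]:,} MSRP")
```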
The Memory Crisis Is Spreading
Because HBM production consumes wafer capacity that would otherwise produce conventional DRAM, the shortage cascades into prices across the memory market. TrendForce's Q1 2026 projections show quarter-over-quarter contract price increases of 90-95% for conventional DRAM, 105-110% for PC DRAM, and 88-93% for server DRAM. One DRAM type soared 75% from December to January alone. Bernstein analyst Mark Li described memory chip prices as going parabolic.
Intel CEO Lip-Bu Tan stated at the Cisco AI Summit that there is no relief until 2028. Synopsys CEO Sassine Ghazi told CNBC in January 2026 that the memory shortage will continue through 2026 and 2027, with most memory produced by the major companies channeled directly into AI infrastructure.
What to Watch on May 20
The May 20 NVDA earnings call will not mention HBM explicitly; management will call it component supply or memory constraints. The signal is in the purchase commitments. If they rise again from $95.2 billion, NVIDIA is securing supply deeper into the shortage. If they flatten, either supply is easing (bullish) or NVIDIA has hit the limit of what memory suppliers will allocate (bearish).
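That commitment read collapses into a one-line decision rule. This is simply the article's framing encoded for clarity, not a trading signal; the only input is the $95.2B figure above:

```python
def read_commitments(latest_b: float, prior_b: float = 95.2) -> str:
    """Interpret NVIDIA's supply-related purchase commitments, per the
    framing above: rising means locking in supply deeper into the
    shortage; flat or falling is ambiguous (easing vs. allocation cap)."""
    if latest_b > prior_b:
        return "rising: securing supply deeper into the shortage"
    return "flat/down: supply easing (bullish) or allocation ceiling (bearish)"

print(read_commitments(110.0))  # rising
print(read_commitments(95.0))   # ambiguous
```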
The second signal: any mention of gaming revenue trajectory. If gaming continues to be starved for allocation, HBM pressure remains intense.
The third signal: HBM4 timelines for Vera Rubin. NVIDIA shipped samples to customers the week of February 25, and production is on track for H2 2026. If that slips, the constraint extends beyond current HBM3E into the next generation.
The Bear Case Through This Lens
The HBM shortage is not universally bullish for NVIDIA. It also constrains their ability to grow data center revenue at the rate the market expects. If HBM supply caps GPU shipments, revenue growth is capped regardless of demand.
The falsification trigger: if hyperscaler ASICs (Trainium, TPU) start securing HBM allocations that exceed their share of the compute market, the bottleneck is being reallocated away from NVIDIA. Watch for ASIC-specific HBM supply announcements as a leading indicator.
Published by The Durability Curve — AI infrastructure research. Finding compute bottlenecks before they are priced.
Related: Three Signals the Market Is Missing on NVDA — The Five Laws of Durable Systems — Research Archive
Full NVDA research report (36p, £9) — @durabilitycurve — Substack
Continue reading the series:
How HBM feeds into the May 20 print: Three Signals the Market Is Missing on NVDA.
The electrical infrastructure constraint on the data centers that house these GPUs: Switchgear bottleneck.
The framework explaining why the bottleneck migrated from GPUs to memory: The Five Laws.