Notes: Light is the Future (Pt.1)

Summary

  • DeepSeek, Huawei, and then AVGO in 2025 pulled forward the timeline for optical adoption, flipping the “copper-first” consensus and reshaping cluster design priorities.
  • DeepSeek’s P/D MoE inference shows scale-out fabrics can handle high-performance inference across hundreds of GPUs, reducing the marginal value of ever-larger NVLink scale-up nodes.
  • Huawei’s optics-first CM384 “superpod” pushes scale-up-like behavior across racks, narrowing the gap with NVDA despite weaker chips by improving MFU, linearity, and power density.
  • NVDA’s current strategy remains copper-heavy scale-up plus rack-density engineering (chiplets, trays, liquid cooling, larger NVLink domains), but this increases blast radius, complexity, power density, and lock-in concerns for hyperscalers.
  • The optical supply chain is entering a capacity-constrained upcycle into 2026, with 800G/1.6T demand surging, EML bottlenecks supporting pricing, and beneficiaries spanning modules, DSPs, lasers, silicon photonics, OCS, and switch silicon.

2025 is the year light overturns the copper-first consensus.

In the first half of 2025, DeepSeek and Huawei catalyzed a paradigm shift that pulled forward the timeline for optical adoption. We covered DeepSeek's inference cluster and Huawei's CM384 cluster design in earlier notes.

In the second half of 2025, Broadcom (AVGO) won major AI ASIC designs from hyperscalers and expanded Google’s TPU production plan, further supercharging the shift from copper to optics.

Cluster-based inference undercuts demand for ever-larger NVLink nodes

DeepSeek open-sourced its prefill/decode-disaggregated (P/D) inference cluster architecture, which has accelerated adoption of cluster-based, multi-GPU inference. Compared with single-GPU or single-node inference, this approach lowers token cost, stretches context-window limits, and lifts performance.
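
To make the prefill/decode split concrete, here is a minimal, purely illustrative Python sketch: one pool of GPUs handles prompt prefill and ships the resulting KV cache over the network to a separate pool that handles token-by-token decode. The pool sizes, class names, and toy data structures are assumptions for exposition, not DeepSeek's implementation.

```python
# Toy sketch of prefill/decode (P/D) disaggregation -- illustrative only.
from dataclasses import dataclass

@dataclass
class KVCache:
    request_id: int
    tokens: list          # prompt tokens whose keys/values have been computed
    # In a real system this is gigabytes of per-layer tensors moved over RDMA.

@dataclass
class PrefillWorker:
    gpu_id: int
    def prefill(self, request_id: int, prompt: str) -> KVCache:
        # Process the whole prompt once (compute-bound) and emit the KV cache.
        return KVCache(request_id, prompt.split())

@dataclass
class DecodeWorker:
    gpu_id: int
    def decode(self, cache: KVCache, max_new_tokens: int = 3) -> list:
        # Generate tokens one at a time (memory-bound), reusing the KV cache
        # that was transferred from the prefill pool over the scale-out fabric.
        return [f"tok{i}" for i in range(max_new_tokens)]

# Separate pools that can be sized and scheduled independently.
prefill_pool = [PrefillWorker(g) for g in range(4)]
decode_pool = [DecodeWorker(g) for g in range(4, 12)]

cache = prefill_pool[0].prefill(request_id=1, prompt="light is the future")
print(decode_pool[1].decode(cache))   # the cache crossed the network boundary
```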

Historically, both training and inference leaned on expensive high-speed interconnects. Inside a server, multiple GPUs are tied together over the copper scale-up NVLink fabric, while servers are linked across racks over a scale-out fabric of Ethernet or InfiniBand that runs on optical modules at lower speed and higher latency. Because the scale-out fabric was the bottleneck, inference typically stayed within a single chip or a single NVLink node. NVIDIA's proprietary scale-up domain made 8 Hopper chips (H100 NVL8) or 72 Blackwell chips (GB200 NVL72) behave like one device, but practitioners rarely crossed beyond one node given the penalties of the scale-out network: lower bandwidth, higher latency, higher error rates, and more points of failure (albeit at lower cost and with less vendor lock-in).
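
A rough back-of-envelope comparison shows why that node boundary mattered. The figures below are approximate public numbers (NVLink per-GPU bandwidth versus a single 400G NIC) and are meant only to convey the order-of-magnitude gap:

```python
# Approximate, publicly cited per-GPU bandwidths; for illustration only.
NVLINK_H100_GBPS = 900   # H100 NVLink, aggregate bidirectional per GPU
NVLINK_H800_GBPS = 400   # H800's export-reduced NVLink (roughly half)
SCALE_OUT_GBPS = 50      # one 400 Gb/s Ethernet/InfiniBand NIC ~= 50 GB/s

ratio = NVLINK_H100_GBPS / SCALE_OUT_GBPS
print(f"Scale-up vs scale-out bandwidth per GPU: ~{ratio:.0f}x")
# ~18x: crossing the node boundary historically cost an order of magnitude in
# bandwidth (plus latency and reliability), so inference stayed inside NVLink.
```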

DeepSeek's P/D design, which top proprietary labs also use in similar forms, shows that you can run MoE inference effectively over the scale-out network. Running on export-compliant NVDA H800s (a China-market H100 variant), the system dynamically load-balances experts across hundreds of GPUs (e.g., 320), delivering far better cost and performance than loading a single DeepSeek V3 onto an 8x H100 NVLink node. Crucially, this makes the H800's half-bandwidth NVLink disadvantage largely moot, because heavy NVLink spend is no longer the gating factor.
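
The sketch below illustrates the expert-parallel idea in toy form: each token is routed to a handful of experts, and the (token, expert) pairs are grouped by destination GPU for an all-to-all exchange over the scale-out fabric. The GPU count, expert count, and random router are illustrative placeholders, not DeepSeek's actual scheduler.

```python
# Toy expert-parallel routing over a large scale-out pool -- a sketch, not
# DeepSeek's scheduler. Counts below are illustrative placeholders.
import random
from collections import defaultdict

NUM_GPUS = 320       # pool size cited above
NUM_EXPERTS = 320    # assume one expert shard per GPU for simplicity
TOP_K = 8            # each token is routed to a few experts

def expert_to_gpu(expert_id: int) -> int:
    # Static placement for the sketch; real systems replicate or migrate
    # hot experts to keep the load balanced.
    return expert_id % NUM_GPUS

def route(tokens):
    """Group (token, expert) pairs by destination GPU for an all-to-all send."""
    per_gpu = defaultdict(list)
    for tok in tokens:
        for expert in random.sample(range(NUM_EXPERTS), TOP_K):  # stand-in router
            per_gpu[expert_to_gpu(expert)].append((tok, expert))
    return per_gpu

batches = route(range(1024))
print(f"{len(batches)} GPUs receive expert work over the scale-out fabric")
```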

Previously, model size for inference was capped by the memory of a single NVLink domain; spanning the scale-out network wasn’t viable. That constraint let NVIDIA justify ever-larger NVLink domains with relatively smaller scale-out fabrics — tightening vendor lock-in as NVLink spend rose versus open Ethernet. DeepSeek’s approach reverses that logic.
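
A quick capacity check illustrates the old constraint. Assuming FP8 weights and 80 GB of HBM per GPU, and ignoring KV cache, activations, and overhead, DeepSeek V3's roughly 671B parameters alone overflow an 8-GPU node, while a 320-GPU pool has memory to spare:

```python
# Back-of-envelope memory check -- approximate figures, FP8 weights assumed,
# KV cache, activations, and framework overhead ignored.
PARAMS_BILLIONS = 671     # DeepSeek V3 total parameter count
BYTES_PER_PARAM = 1       # FP8
HBM_PER_GPU_GB = 80       # H100/H800-class HBM capacity

weights_gb = PARAMS_BILLIONS * BYTES_PER_PARAM   # ~671 GB of weights alone
node_gb = 8 * HBM_PER_GPU_GB                     # 640 GB in an 8-GPU NVLink node
pool_gb = 320 * HBM_PER_GPU_GB                   # 25,600 GB across a 320-GPU pool

print(f"weights ~{weights_gb} GB vs node {node_gb} GB vs pool {pool_gb} GB")
# The weights alone overflow the 8-GPU domain before any KV cache is allocated,
# while the wider pool leaves ample headroom for long-context KV caches.
```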

Huawei’s all-optical superpods blur the scale-up/scale-out boundary

Huawei’s all-optical method narrows the gap with NVIDIA despite weaker individual chips. Unlike NVIDIA’s copper-limited scale-up domains (NVL8 and NVL72 today, NVL144 for the next generation), Huawei uses light to build a “giant node” that effectively fuses scale-up performance with scale-out performance. Per our prior coverage, Huawei put a 384‑chip CM384 superpod into production in May 2025 and has roadmapped an 8,192‑chip superpod for 4Q26 and a 15,488‑chip system for 4Q27. These ambitions dwarf NVIDIA’s 144‑chip (576‑die) per‑rack plans built around quad‑die packaging.
