Notes: AI Bubble? Spending, Risks, and Returns — A High-Level Assessment
Summary
- Hyperscalers’ AI capex is soaring past historical limits, forcing creative financing and intensifying scrutiny over ROI and long-term sustainability.
- Despite fears of an AI bubble, utilization remains high, demand is outpacing capex, and efficiency breakthroughs from Chinese labs are lowering compute costs while expanding practical use cases.
- The real bubble risk lies not in inference but in frontier training, where escalating $50B–$100B+ model cycles create leveraged exposure with uncertain future returns.
- NVDA and PLTR remain grounded in conservative enterprise demand, while CRWV and ORCL function as higher-beta call options on extreme AI growth with meaningful overbuild and balance-sheet risk.
- Systemic risk is limited for now, but a bubble could emerge if capex continues accelerating faster than demand, especially as inductive LLMs hit limits that constrain the path to ASI.
As we've long anticipated, AI spending is reaching a critical threshold where several dynamics are intensifying:
- Mounting Losses: Hyperscalers can no longer absorb AI expenditures within their standard capex budgets. AI is now propelling these budgets from the $50bn-$100bn range to well over $200bn annually for major players, with the four largest (Amazon, Google, Microsoft, Meta) projected to exceed $350bn collectively in 2025 alone. This surge is straining free cash flow margins and other key financials, heightening investor concerns as companies allocate a record 60% of operating cash flow to capex.
- Creative Financing Tactics: To reduce execution risks and lighten balance sheet loads, hyperscalers are adopting novel financing strategies, such as partnerships, off-balance-sheet vehicles, and increased debt issuance. However, these methods are sparking investor unease amid broader warnings of potential debt bubbles tied to AI spending.
- ASIC Push: Hyperscalers are now aggressively ramping custom ASIC development, prioritizing long-term cost efficiency over NVIDIA’s fast time-to-market GPGPU model. GOOG, MSFT, OpenAI, Anthropic, AAPL, X.ai, TSLA, AMZN, ByteDance, and META are all increasing AI capex, with a growing share directed toward in-house silicon rather than merchant GPUs.
- ROI Scrutiny: As cumulative AI investments approach the $1tn mark — with projections hitting $7.9tn in data center capex by 2030 — stakeholders are increasingly questioning whether the technology's value creation matches the colossal outlays. Reports highlight that 95% of generative AI pilots fail to deliver measurable ROI, fueling demands for clearer returns.
- Historical Parallels: Many investors see echoes of past excesses in the AI narrative, likening it to the dot-com bubble's hype-driven valuations and the 2008 housing crisis's overleveraged bets, though key differences exist, such as today's profitable incumbents versus the revenue-less dot-com startups.
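The "60% of operating cash flow" figure above is simple ratio arithmetic, which can be sketched as follows. The cash-flow input is an illustrative placeholder chosen to reproduce the cited ratio, not reported company data; only the ~$350bn collective capex figure comes from the text.

```python
# Back-of-envelope: hyperscaler AI capex as a share of operating cash flow.
# The operating-cash-flow figure is an assumed placeholder, not reported data.
def capex_share(operating_cash_flow_bn: float, capex_bn: float) -> float:
    """Return capex as a fraction of operating cash flow."""
    return capex_bn / operating_cash_flow_bn

# If the four largest players collectively generate roughly $580bn of
# operating cash flow against ~$350bn of capex, the ratio lands near the
# record ~60% level cited above.
share = capex_share(580, 350)
print(f"capex / operating cash flow = {share:.0%}")
```

The point of the exercise is direction, not precision: any plausible cash-flow denominator leaves the ratio far above the historical norm for these companies.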
Lately, the AI bubble debate has surged across major media platforms, with outlets like CNBC and Reuters citing explosive spending as a potential dot-com rerun. We've been analyzing AI bubble risks internally for over two years, maintaining a cautiously bullish outlook. While our vigilance has sharpened amid these developments, we still recommend AI investments — approached with extra caution to navigate the growing uncertainties.
Tech Fundamentals
Let's dive straight into the core driver — or first-principle element — of the AI story: the technology itself.
Currently, LLMs have established themselves as powerful productivity accelerators, particularly for repetitive, pattern-based workloads. In customer support, they can supplant 75% or more of the workforce by handling routine inquiries with high accuracy and speed. In software engineering, they deliver productivity gains of 30%-50% by automating code generation, debugging, and optimization, while sharply reducing the need for entry-level roles. That said, recent studies highlight variability: while some developers report time savings, others find tasks taking up to 20% longer due to over-reliance or integration hurdles, underscoring the importance of targeted application and user expertise.
These shortcomings, however, appear to be engineering problems that future model and application evolution can address. Undeniably, the value creation is immense, even if gauging its precise magnitude and growth curve remains elusive amid rapid evolution. Even conservatively tallying low-hanging fruits like customer support automation, coding assistance, targeted advertising, chatbots, and retrieval-augmented generation (RAG), AI's economic impact already surpasses $1tn, with optimistic projections reaching $4.4tn in added productivity from corporate use cases alone.
Moreover, emerging AI use cases are profoundly compute-intensive, amplifying demand exponentially. Key drivers include:
- Reasoning Models: These amplify output tokens by 10x-100x or more through multi-step logical inference, demanding heavier processing for complex problem-solving.
- Agentic and Tool-Use Workflows: By enabling iterative interactions, these can escalate input and output tokens by 100x+, depending on task complexity and the number of rounds required. If one master agent can run multiple sub-agents working 24/7, machine-driven generation could plausibly consume several orders of magnitude more tokens than humans do.
- Video Generation: For extended, high-definition content, this introduces another 100x compute multiplier — exacerbated by reasoning loops and iterative refinements to produce usable outputs. At the moment, AI-generated videos are mostly shorter than one minute, yet they are already straining inference compute; hour-long 4K videos would push compute consumption to staggering levels.
- 3D Real-Time Video Games: Scaling to fully interactive, generative environments could drive compute demand 1,000,000x or more, combining real-time rendering, physics simulation, and adaptive AI behaviors. In practice, this means sustained, continuous video-generation workloads for hours of gameplay or VR use. However, output tokens may be produced far more efficiently through KV-caching and other optimizations, meaning actual compute demand may be lower than the headline multiplier suggests.
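A quick way to see why these drivers compound rather than merely add is to multiply them out. The sketch below stacks just the first two multipliers from the list; the chosen values (low end of the reasoning range, headline agentic figure, and a 1,000-token baseline) are order-of-magnitude assumptions for illustration, not measured workloads.

```python
# Illustration of how per-use-case token multipliers compound.
# All figures are order-of-magnitude assumptions taken from the list above.
BASELINE_TOKENS = 1_000  # assumed tokens for one plain chat completion

multipliers = {
    "reasoning (multi-step inference)": 10,  # low end of the 10x-100x range
    "agentic tool-use loops": 100,           # the 100x+ headline figure
}

demand = BASELINE_TOKENS
for use_case, factor in multipliers.items():
    demand *= factor
    print(f"after {use_case}: {demand:,} tokens")

# Stacking only two drivers already implies ~1,000x the baseline workload,
# before video or interactive-environment multipliers enter the picture.
```

Even with the conservative low-end reasoning multiplier, a single agentic reasoning task lands three orders of magnitude above a plain completion, which is the core of the demand argument.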
Equally pivotal are efficiency breakthroughs from Chinese open-source labs, which demand serious attention amid their rapid pace. As we detailed in our earlier multi-part DeepSeek report this year, a 10x cost reduction can trigger disproportionately amplified demand by democratizing access. Labs like Qwen and DeepSeek are directly confronting the O(N²) complexity baked into traditional LLM architectures, where compute escalates quadratically (or worse) with input size — exacerbating costs for long-context tasks.
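The O(N²) blow-up mentioned above can be made concrete with a minimal cost model: in standard attention, forming the score matrix QKᵀ alone takes roughly 2·N²·d multiply-adds for sequence length N and head dimension d. The head dimension of 128 below is an assumed typical value, not tied to any specific model.

```python
# Minimal sketch of quadratic attention cost in standard LLM architectures.
# For one attention head, the Q @ K^T score matrix costs ~2 * N^2 * d
# multiply-adds for sequence length N and head dimension d (d=128 assumed).
def attention_score_flops(seq_len: int, head_dim: int = 128) -> int:
    """Approximate FLOPs for the attention score matrix of one head."""
    return 2 * seq_len * seq_len * head_dim

for n in (1_000, 10_000, 100_000):
    print(f"N = {n:>7,}: ~{attention_score_flops(n):.2e} FLOPs")

# A 10x longer context costs 100x more in this term alone — the quadratic
# scaling that long-context efficiency work aims to break.
```

This is exactly why long-context tasks are disproportionately expensive, and why architectural attacks on the quadratic term can translate into the kind of 10x cost reductions that, as argued above, unlock disproportionately larger demand.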