
Google TPU v8 Explained: Training vs Inference Split


On April 22, 2026, Google officially unveiled its eighth-generation TPU (v8) at Google Cloud Next.

For the first time, Google has split its TPU roadmap into two specialized chips:

  • TPU 8t (Sunfish) for large-scale training
  • TPU 8i (Zebrafish) for high-efficiency inference

This architectural shift reflects a deeper industry transition: modern AI workloads—especially agentic AI swarms and trillion-parameter models—no longer fit a “one-size-fits-all” accelerator design. The previous generation, TPU v7 (Ironwood), began to show limits under these emerging workloads.


🚀 The Two-Pronged TPU Strategy: 8t vs. 8i

Google’s TPU v8 marks a decisive move toward specialization—separating training and inference into independently optimized systems.

| Feature | TPU 8t (Training) | TPU 8i (Inference) |
|---|---|---|
| Codename | Sunfish | Zebrafish |
| Core design | High-throughput (Broadcom partner) | Cost-efficient (MediaTek partner) |
| Performance | ~2.8× over v7 | ~80% over v7 |
| Topology | 3D Torus (massive clusters) | Boardfly (low-latency, high-radix) |
| Scale | Up to 9,600 chips (121 ExaFlops) | Optimized for agent swarms |

Instead of forcing a compromise, Google now optimizes:

  • Throughput and scale → training (8t)
  • Latency and efficiency → inference (8i)
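The split maps to two genuinely different computational patterns. A minimal JAX sketch (the toy model and sizes are hypothetical and not tied to any TPU API) shows the contrast: a training step is gradient-heavy and batch-throughput-bound, while an inference step is a single forward pass where latency dominates.

```python
import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    # Toy linear model with mean-squared-error loss.
    return jnp.mean((x @ w - y) ** 2)

@jax.jit
def train_step(w, x, y, lr=0.1):
    # Training: gradients over a large batch, then a weight update.
    # Throughput over the whole batch is what matters here.
    g = jax.grad(loss_fn)(w, x, y)
    return w - lr * g

@jax.jit
def infer_step(w, x):
    # Inference: one forward pass, no gradients; latency-bound.
    return x @ w

key = jax.random.PRNGKey(0)
w = jnp.zeros((4, 1))
x = jax.random.normal(key, (256, 4))   # large training batch
y = x @ jnp.ones((4, 1))               # synthetic targets
for _ in range(100):
    w = train_step(w, x, y)
out = infer_step(w, x[:1])             # one low-latency request
```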

⚙️ Technical Innovations: Breaking the Memory Wall

Both TPU v8 variants are built around a vertically integrated stack, including Google’s custom Axion ARM CPUs, enabling tighter coupling between compute, memory, and networking.

TPU 8t: The Training Powerhouse

The 8t (Sunfish) is designed for extreme-scale distributed training:

  • Massive Interconnect Bandwidth
    Inter-chip interconnect (ICI) bandwidth is doubled, while TPUDirect boosts storage access speeds by 10× over v7.

  • Virgo Network Architecture
    A new network fabric enables scaling to 1 million chips in a single logical cluster, with near-linear scaling efficiency.

  • Autonomous Reconfiguration
    Using Optical Circuit Switching (OCS), the system dynamically reroutes around failures—allowing long-running training jobs to continue uninterrupted.

This makes 8t particularly suited for frontier model training where uptime and scaling efficiency are critical.
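The scaling claims above rest on the standard data-parallel pattern: shard the batch across chips and reduce results over the interconnect. A minimal sketch using JAX's public sharding API (run here on whatever local devices are available; the post describes no 8t-specific API, so none is shown):

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over one "data" axis. On a CPU host this is
# a single device; on an 8t pod it would span thousands of chips.
devices = np.array(jax.devices())
mesh = Mesh(devices, ("data",))

# Shard the batch along its leading dimension across the mesh.
batch = jnp.arange(8.0).reshape(8, 1)
sharded = jax.device_put(batch, NamedSharding(mesh, P("data", None)))

@jax.jit
def global_mean(x):
    # Each shard reduces locally; the compiler inserts the cross-device
    # reduction, i.e. the traffic that ICI bandwidth carries.
    return jnp.mean(x)

result = global_mean(sharded)
```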


TPU 8i: The Inference & Reasoning Engine

The 8i (Zebrafish) is purpose-built for modern inference workloads, especially reasoning-heavy models:

  • Boardfly Topology
    A hierarchical high-radix network that reduces hop count by over 50%, cutting all-to-all latency by ~50%—a key requirement for Mixture-of-Experts (MoE) models.

  • Massive On-Chip SRAM (384MB)
    Roughly 3× larger than v7, enabling full KV cache residency on-chip, effectively eliminating memory bottlenecks during inference.

  • Collectives Acceleration Engine (CAE)
    A dedicated hardware block for global collective operations, reducing latency for reasoning workflows such as chain-of-thought.

This design directly targets real-time AI systems where latency—not raw FLOPs—is the bottleneck.
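Whether a KV cache actually fits in 384 MB of SRAM is simple arithmetic. A back-of-envelope check in Python (the decoder configuration below is hypothetical, chosen only for illustration):

```python
SRAM_BYTES = 384 * 1024 * 1024  # 8i on-chip SRAM

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=1):
    # Per layer, the cache holds K and V: two tensors of shape
    # [seq_len, kv_heads, head_dim], each element bytes_per_elem wide.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Example: a mid-size decoder with grouped-query attention,
# an int8 KV cache, and a 4K context window.
need = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=4096)
fits = need <= SRAM_BYTES
```

Under these assumptions the cache needs 256 MiB and fits on-chip; doubling the context window to 8K would push it past the SRAM budget and back into HBM.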


⚡ Power and Efficiency in the TPU v8 Era

Efficiency is no longer optional at hyperscale—it’s foundational. TPU v8 delivers a 2× performance-per-watt improvement over v7 through:

  • Axion CPU Integration
    Custom ARM-based CPUs enable system-level NUMA optimization, improving memory locality and reducing overhead.

  • Liquid Cooling v4
    Advanced liquid cooling supports significantly higher power densities than traditional air-cooled systems.

  • Real-Time Power Management
    Hardware dynamically adjusts power usage based on workload phases (training, inference, communication), minimizing waste.


🤖 The Bigger Picture: Enter the Agentic AI Era

With TPU v8, Google is clearly aligning its infrastructure with the rise of agentic AI systems—distributed, collaborative AI agents operating at scale.

By offering:

  • Bare-metal access
  • Native support for frameworks like SGLang, vLLM, and JAX

Google positions TPU v8 as a direct competitor to next-generation GPU architectures such as NVIDIA’s Rubin platform.


🧠 Final Takeaway

TPU v8 isn’t just a performance upgrade—it’s a philosophical shift in AI hardware design:

  • Training and inference are now fundamentally different problems
  • Specialized silicon delivers better efficiency than general-purpose accelerators
  • Infrastructure is evolving to support AI systems, not just models

In short, Google’s TPU v8 signals the transition from the model-centric era to the agent-centric era of computing.
