At CES 2026, NVIDIA CEO Jensen Huang officially unveiled the Vera Rubin platform—named after the astronomer who uncovered evidence of dark matter. Rubin is not a single GPU generation but a full-stack AI system engineered to address the exploding scale of modern AI models, agentic reasoning, and long-context inference.
With Rubin, NVIDIA moves decisively beyond chip-level optimization toward rack-scale, system-level computing, redefining how AI infrastructure is built, deployed, and monetized.
🧠 The Rubin Full-Stack Architecture #
The Vera Rubin platform is composed of six tightly co-designed chips that operate as a single coherent compute system. This approach lets NVIDIA push past scaling limits that no individual chip can overcome on its own.
Rubin GPU: The Compute Core #
The Rubin GPU is the primary accelerator for training and inference.
- AI Performance
  - 50 PFLOPS inference (NVFP4) — 5× Blackwell
  - 35 PFLOPS training — 3.5× Blackwell
- Memory Subsystem
  - 288 GB HBM4
  - 22 TB/s bandwidth (≈2.8× prior generation)
- Transformer Engine
  - 3rd-generation design
  - Hardware-accelerated adaptive compression, identified by Huang as one of the platform’s “six technical wonders”
Rubin is explicitly optimized for transformer-heavy, attention-bound workloads, not traditional HPC-style FP64 computation.
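The stated multipliers can be cross-checked with simple arithmetic. A quick sketch, assuming the Blackwell baselines implied by the claims (the 8 TB/s HBM figure comes from the comparison table later in the article; the PFLOPS baselines are inferred, not quoted specs):

```python
# Sanity-check the Rubin-vs-Blackwell multipliers quoted in the spec list.
# Blackwell PFLOPS baselines are inferred by dividing the Rubin figure by
# the claimed speedup; they are assumptions, not published numbers.

rubin_inference_pflops = 50   # NVFP4
rubin_training_pflops = 35
rubin_hbm_tbps = 22
blackwell_hbm_tbps = 8        # from the Blackwell vs. Rubin comparison table

implied_blackwell_inference = rubin_inference_pflops / 5    # 5x claim
implied_blackwell_training = rubin_training_pflops / 3.5    # 3.5x claim

print(f"Implied Blackwell inference: {implied_blackwell_inference:.0f} PFLOPS")
print(f"Implied Blackwell training:  {implied_blackwell_training:.0f} PFLOPS")
print(f"HBM bandwidth ratio: {rubin_hbm_tbps / blackwell_hbm_tbps:.2f}x")  # 2.75x ~ the quoted 2.8x
```

Both claimed speedups imply the same ~10 PFLOPS Blackwell baseline, and the bandwidth ratio lines up with the "≈2.8×" figure, so the numbers are internally consistent.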
Vera CPU: The System Brain #
Complementing the GPU is the Vera CPU, a custom Arm-based processor.
- Architecture
  - Custom “Olympus” cores
  - Arm v9.2-A
- Core Count
  - 88 physical cores
  - 176 threads via Spatial Multi-threading
- Memory
  - Up to 1.5 TB LPDDR5X via modular SOCAMM
  - 1.2 TB/s memory bandwidth (≈3× Grace)
The Vera CPU focuses on orchestration, scheduling, and feeding accelerators efficiently at rack scale.
Networking and Interconnect Fabric #
Rubin’s performance leap depends as much on interconnect as on raw compute.
- NVLink 6
  - 3.6 TB/s GPU-to-GPU bandwidth (2× Blackwell)
- ConnectX-9 SuperNIC
  - 1.6 TB/s networking throughput
- BlueField-4 DPU
  - Integrates 64 Grace-class cores
  - Doubles memory bandwidth versus BlueField-3
- Spectrum-X Ethernet (CPO)
  - First Ethernet switch with Co-Packaged Optics
  - 512 × 200 Gb/s ports
  - Dramatically reduced power per bit
Together, these components transform the rack into a single, low-latency accelerator domain.
🏗️ NVL72: Rack-Scale System Integration #
The primary deployment unit for Rubin is the Vera Rubin NVL72 rack.
- Configuration
  - 72 Rubin GPUs
  - 36 Vera CPUs
- Aggregate Performance
  - 3.6 EFLOPS inference
  - 2.5 EFLOPS training
- Mechanical Design
  - Cable-free, tray-based architecture
  - Fully liquid-cooled with 45°C warm water
- Operational Impact
  - Rack assembly time reduced from ~100 minutes to 6 minutes
  - ~6% total data center energy savings by eliminating chillers
NVL72 behaves less like a cluster and more like a monolithic AI super-accelerator.
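The rack-level figures follow directly from the per-GPU specs under idealized linear scaling, which is worth verifying:

```python
# NVL72 aggregate throughput = GPUs per rack x per-GPU peak,
# assuming ideal linear scaling across the NVLink domain.

gpus_per_rack = 72
inference_pflops_per_gpu = 50   # NVFP4, per Rubin GPU
training_pflops_per_gpu = 35

rack_inference_eflops = gpus_per_rack * inference_pflops_per_gpu / 1000
rack_training_eflops = gpus_per_rack * training_pflops_per_gpu / 1000

print(f"{rack_inference_eflops:.1f} EFLOPS inference")  # 3.6, matching the quoted figure
print(f"{rack_training_eflops:.2f} EFLOPS training")    # 2.52, quoted as 2.5
```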
🧩 Breakthrough: Inference Context Memory #
Agentic and reasoning-focused AI models generate enormous KV (key–value) caches, which quickly exceed local GPU memory.
Rubin introduces a new tier: Inference Context Memory Storage.
- Architecture
  - Managed by BlueField-4 DPUs
  - Shared at rack level
- Capacity
  - Up to 150 TB of context memory per rack
  - Up to 16 TB addressable by a single GPU
- Impact
  - Eliminates repeated recomputation
  - Enables ultra-long conversations, large retrieval sets, and persistent agent memory
This effectively gives each GPU an external cognitive workspace far larger than on-package HBM.
💰 Business and Market Impact #
Jensen Huang summarized Rubin’s value proposition with three headline metrics:
- Training Efficiency
  - 4× fewer GPUs required for next-generation MoE model training versus Blackwell
- Inference Economics
  - Up to 90% lower cost per token (≈10× reduction)
- Revenue Density
  - NVIDIA claims $5B in token revenue potential per $100M invested in Rubin infrastructure
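The claims above reduce to two ratios. A quick check, using only the figures NVIDIA quotes (the dollar amounts are the company's claims, not measured results):

```python
# Ratios implied by the headline economics; dollar figures are NVIDIA's claims.

capex = 100e6            # $100M invested in Rubin infrastructure
token_revenue = 5e9      # claimed token-revenue potential

revenue_multiple = token_revenue / capex
cost_multiple = 1 / (1 - 0.90)   # "up to 90% lower cost per token"

print(f"Revenue-to-capex multiple: {revenue_multiple:.0f}x")  # 50x
print(f"Cost-per-token reduction:  {cost_multiple:.0f}x")     # 10x, the quoted ~10x
```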
Blackwell vs. Rubin Summary #
| Metric | Blackwell (2024/25) | Vera Rubin (2026) |
|---|---|---|
| GPU Transistors | 208B | 336B |
| HBM Bandwidth | 8 TB/s | 22 TB/s |
| NVLink Bandwidth | 1.8 TB/s | 3.6 TB/s |
| AI Inference | 1× | 5× |
🚀 What Rubin Signals for the Industry #
Vera Rubin marks NVIDIA’s transition from GPU vendor to AI infrastructure architect. The platform prioritizes:
- Inference and reasoning over raw training FLOPS
- Memory scale over clock speed
- Rack-level coherence over node-level optimization
With mass production planned for H2 2026, Rubin sets the template for how Physical AI, agentic systems, and large-scale inference will be deployed for the rest of the decade.