HBF: The Next Memory Layer for AI Accelerators
The explosive growth of AI workloads is pushing memory architectures to their limits. As models scale and inference workloads demand ever-larger datasets, traditional memory solutions are struggling to keep pace.
Industry leaders—including NVIDIA, AMD, and Google—are exploring a new approach: HBF (High-Bandwidth Flash). This technology introduces a new memory tier designed to complement High-Bandwidth Memory (HBM) with dramatically larger capacity while maintaining relatively high data throughput.
🧩 The HBM–HBF Tiered Memory Architecture #
HBM (High-Bandwidth Memory) currently acts as the ultra-fast working memory for GPUs and AI accelerators. It provides extremely high bandwidth for operations such as reading and processing KV (Key–Value) cache data in large language models.
However, HBM has two major limitations:
- High cost
- Limited capacity
HBF aims to solve this by introducing a high-capacity flash-based memory tier directly connected to the accelerator.
A Simple Analogy #
Professor Kim Joungho of KAIST compares the relationship between HBM and HBF to a library system:
- HBM: Like a small bookshelf at home—very fast and easy to access.
- HBF: Like a massive library—slightly slower, but able to store vastly more information.
In practical terms:
- HBF capacity: ~10× larger than HBM
- HBF speed: Lower than DRAM but significantly faster than traditional storage systems
This layered architecture enables accelerators to balance speed and scale.
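The tiering logic above can be illustrated with a toy placement function. The 48 GB HBM figure below is an assumed per-stack size for illustration, not a number from this article; only the ~10× capacity ratio comes from the text:

```python
# Illustrative two-tier placement: data goes to the smallest tier
# that can hold it. Capacities are assumptions, not vendor specs.

HBM_CAPACITY_GB = 48                      # assumed per-stack HBM capacity
HBF_CAPACITY_GB = HBM_CAPACITY_GB * 10    # ~10x larger, per the article

def fits_in_tier(data_gb: float) -> str:
    """Pick the smallest tier that can hold the working set."""
    if data_gb <= HBM_CAPACITY_GB:
        return "HBM"
    if data_gb <= HBF_CAPACITY_GB:
        return "HBF"
    return "external storage"

print(fits_in_tier(30))    # small KV cache fits in HBM
print(fits_in_tier(300))   # large model shards spill to HBF
```

In a real system the placement decision would also weigh access frequency and bandwidth needs, not just size, but the size-based sketch captures the basic speed-versus-scale trade-off.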
⚙️ Technical Design and Performance #
HBF adopts a stacked architecture similar to HBM but uses 3D NAND flash instead of DRAM.
Multiple NAND layers are vertically integrated and connected using TSV (Through-Silicon Via) technology.
Key characteristics include:
- Stacked 3D NAND layers
- TSV vertical interconnects
- Integrated logic die at the base
Performance Characteristics #
Typical design targets include:
- Capacity: Up to 512 GB per HBF unit
- Bandwidth: Up to 1.638 TB/s
- Form factor: Directly integrated near the accelerator
This bandwidth dramatically exceeds conventional SSD performance.
For comparison:
| Storage Type | Typical Bandwidth |
|---|---|
| PCIe 4.0 NVMe SSD | ~7 GB/s |
| PCIe 5.0 NVMe SSD | ~14 GB/s |
| HBF Stack | Up to ~1.6 TB/s |
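A quick back-of-envelope calculation shows what the table means in practice: the time to stream a hypothetical 500 GB working set (an assumed size, chosen for illustration) at each tier's approximate bandwidth:

```python
# Time to read a working set at each approximate bandwidth
# from the comparison table above.

def load_seconds(size_gb: float, bandwidth_gb_s: float) -> float:
    """Seconds needed to stream size_gb at a sustained bandwidth."""
    return size_gb / bandwidth_gb_s

SIZE_GB = 500  # hypothetical working set, e.g. large model shards

for name, bw in [("PCIe 4.0 SSD", 7), ("PCIe 5.0 SSD", 14), ("HBF", 1600)]:
    print(f"{name}: {load_seconds(SIZE_GB, bw):.2f} s")
# PCIe 4.0 SSD: ~71 s, PCIe 5.0 SSD: ~36 s, HBF: ~0.31 s
```

The two-orders-of-magnitude gap is what makes an accelerator-attached flash tier qualitatively different from storage behind a PCIe link.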
Major manufacturers—including SK hynix and SanDisk—have demonstrated prototype designs where NAND stacks connect to a base logic die to create a fully integrated storage module.
🔁 A Shift Toward Read-Centric Software #
Because HBF relies on flash memory, it introduces a new constraint: limited write endurance.
Typical HBF flash cells tolerate roughly:
- ~100,000 write (program/erase) cycles
However:
- Read operations are effectively unlimited
Implications for AI Software #
This endurance constraint requires AI frameworks to adapt.
Future accelerator software will likely adopt a read-heavy memory model, where:
- Model parameters and KV caches are frequently read
- Writes are minimized and carefully managed
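A rough wear estimate makes this concrete. Assuming ideal wear leveling and the ~100,000-cycle figure above (both idealized assumptions), a 512 GB stack written at a modest 1 TB/day would last for over a century, while sustained writes at full HBF bandwidth would exhaust it within hours:

```python
# Idealized endurance arithmetic: total writable bytes = capacity x cycles.
# Assumes perfect wear leveling; real controllers fall short of this.

def lifetime_years(capacity_gb: float, write_cycles: int,
                   writes_gb_per_day: float) -> float:
    """Years until the stack's write budget is exhausted."""
    total_writable_gb = capacity_gb * write_cycles
    return total_writable_gb / writes_gb_per_day / 365

def drain_hours(capacity_gb: float, write_cycles: int,
                write_gb_s: float) -> float:
    """Hours to exhaust the write budget at a sustained write rate."""
    return capacity_gb * write_cycles / write_gb_s / 3600

print(round(lifetime_years(512, 100_000, 1024), 1))  # ~137 years at 1 TB/day
print(round(drain_hours(512, 100_000, 1600), 1))     # ~8.9 hours at ~1.6 TB/s
```

The asymmetry between the two numbers is the whole argument: HBF is effectively inexhaustible under a read-dominant workload, and fragile under a write-heavy one.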
For example, during inference:
- The accelerator retrieves KV cache data from HBM or HBF.
- The model processes tokens sequentially.
- Output is generated one token at a time.
Because inference workloads are naturally read-dominant, they align well with HBF’s characteristics.
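The read path above can be sketched as a two-tier lookup in which new writes land only in the HBM tier and the flash tier stays read-mostly. Class and key names are illustrative, not a real framework API:

```python
# Minimal sketch of a read-centric tiered KV cache: reads fall back
# from HBM to HBF (promoting on hit), writes go only to HBM so the
# flash tier absorbs almost no program/erase cycles.

class TieredKVCache:
    def __init__(self):
        self.hbm = {}   # small, fast tier (DRAM-based)
        self.hbf = {}   # large, read-mostly tier (flash-based)

    def get(self, key):
        if key in self.hbm:
            return self.hbm[key]
        if key in self.hbf:
            value = self.hbf[key]
            self.hbm[key] = value   # promote into HBM on read
            return value
        return None

    def put(self, key, value):
        self.hbm[key] = value       # new entries land in the DRAM tier only

cache = TieredKVCache()
cache.hbf["layer0"] = b"kv-block"   # pre-staged parameters / KV data
print(cache.get("layer0"))          # served from HBF, then cached in HBM
```

A production design would also need eviction from HBM and a carefully batched, wear-aware path for the rare writes into flash, which the sketch omits.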
🚀 Future Roadmap: HBM6 and Beyond #
HBF is expected to enter the market alongside next-generation HBM technologies.
HBM6 Era Integration #
During the HBM6 generation:
- Multiple HBM stacks will form the high-speed compute memory layer.
- HBF modules will provide large-scale storage close to the accelerator.
This creates a multi-tier accelerator memory hierarchy.
Toward the “Storage Factory” #
Future generations (often referred to conceptually as HBM7) envision a system where accelerators access a distributed storage pool sometimes described as a “Storage Factory.”
In such architectures:
- Data could be processed directly within storage modules
- Intermediate storage networks may be bypassed
- Latency between compute and data could shrink dramatically
Early Industry Milestones #
Kioxia has already demonstrated a 5 TB HBF prototype module using:
- PCIe Gen6 x8 connectivity
- 64 GT/s per-lane signaling (roughly 64 GB/s across the link)
These early prototypes hint at the massive scale possible for future AI memory systems.
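The prototype's link bandwidth can be sanity-checked from the PCIe 6.0 specification: 64 GT/s per lane is effectively 64 Gb/s per lane, so an x8 link carries about 64 GB/s before protocol overhead:

```python
# PCIe 6.0 link-bandwidth arithmetic: 64 GT/s per lane, one bit
# per transfer per lane, 8 lanes, ignoring FLIT/protocol overhead.

PCIE6_GT_PER_LANE = 64   # gigatransfers/s per lane (PCIe 6.0 spec)
LANES = 8

raw_gb_s = PCIE6_GT_PER_LANE * LANES / 8   # bits -> bytes
print(raw_gb_s)   # 64.0 GB/s raw, before encoding/protocol overhead
```

That is still a far cry from an HBF stack's on-package bandwidth, which is why such modules are positioned as a capacity tier behind the link rather than a replacement for accelerator-attached memory.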
🏭 Manufacturing Challenges #
Building HBF stacks is technically complex.
Key manufacturing challenges include:
- Wafer warpage control at the base die
- Micro-bump interconnect density
- Thermal management for dense NAND stacks
- TSV reliability across multiple layers
As NAND layer counts increase, maintaining mechanical stability and yield becomes increasingly difficult.
📈 Market Outlook #
Major semiconductor manufacturers are moving quickly to commercialize HBF.
Current projections suggest:
- 24-month development window for early integration
- Initial deployment in AI accelerators from companies such as NVIDIA, AMD, and Google
- Potential introduction alongside next-generation HBM platforms
Long-term forecasts are even more ambitious.
Professor Kim Joungho predicts that by 2038, the HBF market could surpass the HBM market, as capacity becomes the dominant constraint in scaling AI systems.
🧠 The Bigger Picture #
The future of AI computing will depend on balancing three resources:
- Compute
- Memory bandwidth
- Memory capacity
HBM solved the bandwidth challenge. HBF may solve the capacity problem.
Together, they represent the next step toward a multi-tier memory hierarchy optimized for AI-scale workloads.