
HBF: The Next Memory Layer for AI Accelerators


The explosive growth of AI workloads is pushing memory architectures to their limits. As models scale and inference workloads demand ever-larger datasets, traditional memory solutions are struggling to keep pace.

Industry leaders—including NVIDIA, AMD, and Google—are exploring a new approach: HBF (High-Bandwidth Flash). This technology introduces a new memory tier designed to complement High-Bandwidth Memory (HBM) with dramatically larger capacity while maintaining relatively high data throughput.


🧩 The HBM–HBF Tiered Memory Architecture
#

HBM (High-Bandwidth Memory) currently acts as the ultra-fast working memory for GPUs and AI accelerators. It provides extremely high bandwidth for operations such as reading and processing KV (Key–Value) cache data in large language models.

However, HBM has two major limitations:

  • High cost
  • Limited capacity

HBF aims to solve this by introducing a high-capacity flash-based memory tier directly connected to the accelerator.

A Simple Analogy
#

Professor Kim Joungho of KAIST compares the relationship between HBM and HBF to a library system:

  • HBM: Like a small bookshelf at home—very fast and easy to access.
  • HBF: Like a massive library—slightly slower, but able to store vastly more information.

In practical terms:

  • HBF capacity: ~10× larger than HBM
  • HBF speed: Lower than DRAM but significantly faster than traditional storage systems

This layered architecture enables accelerators to balance speed and scale.
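The bookshelf/library analogy maps naturally onto a two-tier lookup. Below is a minimal sketch of that idea, assuming a simple dict-based model of HBM (small, fast) and HBF (large, high-capacity); the capacities and the promote-on-read policy are illustrative assumptions, not any vendor's published design.

```python
# Toy two-tier memory model: HBM as a small fast tier, HBF as a large
# capacity tier. Hot entries are promoted into HBM on access; cold
# entries are demoted back to HBF when HBM fills up (inclusive tiering).

class TieredMemory:
    def __init__(self, hbm_capacity=4):
        self.hbm = {}                  # fast tier: ~10x smaller
        self.hbf = {}                  # capacity tier
        self.hbm_capacity = hbm_capacity

    def read(self, key):
        if key in self.hbm:            # fast path: serve from HBM
            return self.hbm[key], "HBM"
        if key in self.hbf:            # slow path: serve from HBF,
            value = self.hbf[key]      # then promote the hot entry
            self._promote(key, value)
            return value, "HBF"
        raise KeyError(key)

    def _promote(self, key, value):
        if len(self.hbm) >= self.hbm_capacity:
            cold_key, cold_value = next(iter(self.hbm.items()))
            del self.hbm[cold_key]
            self.hbf[cold_key] = cold_value   # demote a cold entry
        self.hbm[key] = value

    def write(self, key, value):
        # Writes land in the capacity tier; repeated reads pull data up.
        self.hbf[key] = value

mem = TieredMemory()
mem.write("kv_block_0", b"...")
value, tier = mem.read("kv_block_0")   # first read is served from HBF
value, tier = mem.read("kv_block_0")   # promoted entry now hits HBM
print(tier)
```

The key design choice mirrored here is that writes target the capacity tier, while the fast tier is populated purely by read traffic; this matches the read-centric usage model discussed later in the article.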


⚙️ Technical Design and Performance
#

HBF adopts a stacked architecture similar to HBM but uses 3D NAND flash instead of DRAM.

Multiple NAND layers are vertically integrated and connected using TSV (Through-Silicon Via) technology.

Key characteristics include:

  • Stacked 3D NAND layers
  • TSV vertical interconnects
  • Integrated logic die at the base

Performance Characteristics
#

Typical design targets include:

  • Capacity: Up to 512 GB per HBF unit
  • Bandwidth: Up to 1.638 TB/s
  • Form factor: Directly integrated near the accelerator

This bandwidth dramatically exceeds conventional SSD performance.

For comparison:

  Storage Type          Typical Bandwidth
  PCIe 4.0 NVMe SSD     ~7 GB/s
  PCIe 5.0 NVMe SSD     ~14 GB/s
  HBF stack             Up to ~1.6 TB/s
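A quick back-of-the-envelope calculation shows what these figures mean in practice, using the 512 GB capacity target from above. Treating each figure as sustained sequential throughput is a simplification for illustration.

```python
# Time to stream one fully loaded 512 GB HBF unit at each bandwidth.
# The bandwidth numbers are the article's figures, assumed sustained.

model_size_gb = 512

bandwidth_gb_s = {
    "PCIe 4.0 NVMe SSD": 7,
    "PCIe 5.0 NVMe SSD": 14,
    "HBF stack": 1600,
}

times = {name: model_size_gb / bw for name, bw in bandwidth_gb_s.items()}
for name, seconds in times.items():
    print(f"{name:>18}: {seconds:7.2f} s to stream 512 GB")
```

At ~1.6 TB/s the entire 512 GB unit streams in about a third of a second, versus over a minute for a PCIe 4.0 SSD.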

Major manufacturers—including SK hynix and SanDisk—have demonstrated prototype designs where NAND stacks connect to a base logic die to create a fully integrated storage module.


🔁 A Shift Toward Read-Centric Software
#

Because HBF is built on flash memory, it introduces a constraint that DRAM does not have: limited write endurance.

Typical HBF flash cells are expected to support:

  • ~100,000 program/erase (write) cycles

Reads, however, carry no such penalty:

  • Read operations are effectively unlimited
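A rough lifetime estimate shows how far the ~100,000-cycle figure stretches. The daily write volume below is an illustrative assumption, and the model ignores wear-leveling and over-provisioning, which real devices use to spread writes further.

```python
# Back-of-the-envelope endurance budget for a 512 GB HBF unit,
# assuming the ~100,000 full-device write cycles cited above and an
# assumed write volume of 1 TB/day. Write amplification, wear-leveling,
# and over-provisioning are deliberately ignored in this sketch.

capacity_gb = 512
endurance_cycles = 100_000
writes_tb_per_day = 1.0        # illustrative assumption

total_writable_tb = capacity_gb * endurance_cycles / 1000
lifetime_days = total_writable_tb / writes_tb_per_day
print(f"{lifetime_days:,.0f} days (~{lifetime_days / 365:.0f} years)")
```

Under these assumptions the write budget lasts decades at modest write rates, which is why the practical concern is write-heavy workloads, not writes as such.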

Implications for AI Software
#

This endurance constraint requires AI frameworks to adapt.

Future accelerator software will likely adopt a read-heavy memory model, where:

  • Model parameters and KV caches are frequently read
  • Writes are minimized and carefully managed

For example, during inference:

  1. The accelerator retrieves KV cache data from HBM or HBF.
  2. The model processes tokens sequentially.
  3. Output tokens are generated one token at a time.

Because inference workloads are naturally read-dominant, they align well with HBF’s characteristics.
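A toy accounting of the loop above makes the read dominance concrete. The KV-cache layout here is a stand-in (one entry per token per layer), not a real framework API, and the layer count is an assumption.

```python
# Count memory reads vs. writes in a toy autoregressive decoding loop:
# attention re-reads the whole KV cache for every layer at every step,
# while each step appends only one new KV entry per layer.

def generate(prompt_len, new_tokens, num_layers=32):
    reads = writes = 0
    kv_len = prompt_len
    for _ in range(new_tokens):
        reads += num_layers * kv_len   # steps 1-2: re-read the KV cache
        writes += num_layers           # step 3: append one entry/layer
        kv_len += 1
    return reads, writes

reads, writes = generate(prompt_len=1024, new_tokens=256)
print(f"reads={reads}, writes={writes}")
```

Even in this simplified model, reads outnumber writes by three orders of magnitude, which is exactly the access pattern flash-based HBF favors.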


🚀 Future Roadmap: HBM6 and Beyond
#

HBF is expected to enter the market alongside next-generation HBM technologies.

HBM6 Era Integration
#

During the HBM6 generation:

  • Multiple HBM stacks will form the high-speed compute memory layer.
  • HBF modules will provide large-scale storage close to the accelerator.

This creates a multi-tier accelerator memory hierarchy.

Toward the “Storage Factory”
#

Future generations (often referred to conceptually as HBM7) envision a system where accelerators access a distributed storage pool sometimes described as a “Storage Factory.”

In such architectures:

  • Data could be processed directly within storage modules
  • Intermediate storage networks may be bypassed
  • Latency between compute and data could shrink dramatically

Early Industry Milestones
#

Kioxia has already demonstrated a 5 TB HBF prototype module using:

  • A PCIe 6.0 (Gen6) x8 host interface
  • 64 GT/s per-lane signaling
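A quick sanity check on those interface figures: PCIe 6.0 signals at 64 GT/s per lane, so an x8 link carries roughly 64 GB/s of raw bandwidth per direction. The protocol-efficiency factor below is an illustrative assumption, not a measured value.

```python
# Raw and approximate usable bandwidth of a PCIe 6.0 x8 link.
# 64 GT/s per lane is the PCIe 6.0 rate; the FLIT-mode efficiency
# factor is an assumed round number for illustration.

lanes = 8
gt_per_s = 64                       # PCIe 6.0 per-lane rate (PAM4)
raw_gb_s = lanes * gt_per_s / 8     # 8 bits per transfer -> bytes
print(f"raw: {raw_gb_s:.0f} GB/s per direction")

protocol_efficiency = 0.95          # assumed FLIT-mode efficiency
print(f"usable: ~{raw_gb_s * protocol_efficiency:.0f} GB/s")
```

Note the gap this exposes: a ~64 GB/s host link is still far below the ~1.6 TB/s an HBF stack can deliver when integrated directly next to the accelerator, which is the architectural argument for on-package HBF.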

These early prototypes hint at the massive scale possible for future AI memory systems.


🏭 Manufacturing Challenges
#

Building HBF stacks is technically complex.

Key manufacturing challenges include:

  • Wafer warpage control at the base die
  • Micro-bump interconnect density
  • Thermal management for dense NAND stacks
  • TSV reliability across multiple layers

As NAND layer counts increase, maintaining mechanical stability and yield becomes increasingly difficult.


📈 Market Outlook
#

Major semiconductor manufacturers are moving quickly to commercialize HBF.

Current projections suggest:

  • 24-month development window for early integration
  • Initial deployment in AI accelerators from companies such as NVIDIA, AMD, and Google
  • Potential introduction alongside next-generation HBM platforms

Long-term forecasts are even more ambitious.

Professor Kim Joungho predicts that by 2038, the HBF market could surpass the HBM market, as capacity becomes the dominant constraint in scaling AI systems.


🧠 The Bigger Picture
#

The future of AI computing will depend on balancing three resources:

  • Compute
  • Memory bandwidth
  • Memory capacity

HBM solved the bandwidth challenge. HBF may solve the capacity problem.

Together, they represent the next step toward a multi-tier memory hierarchy optimized for AI-scale workloads.
