🧠 Why DDR Architecture Matters #
DDR (Double Data Rate) memory is the performance backbone of modern CPUs, GPUs, SoCs, and FPGA platforms. While external I/O speeds have increased dramatically from SDR to DDR5, internal DRAM core frequencies have scaled much more conservatively.
This gap is bridged through architectural innovations such as Prefetch, Burst Transfers, and Bank Group interleaving, allowing DDR to deliver extreme bandwidth without violating power and signal integrity constraints.
🚀 Evolution of DDR: Speed Through Prefetch #
The fundamental challenge in DRAM design is balancing I/O bandwidth with core operating frequency. DDR achieves this through progressively wider prefetch architectures.
- SDR → DDR → DDR2 → DDR3: Prefetch depth increased from 1n → 2n → 4n → 8n
- Conceptually: Prefetch creates multiple parallel data paths between the slow DRAM core and fast I/O pins
- Each internal access fetches multiple bits that are serialized onto the external bus (see the sketch below)
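To make that concrete, here is a minimal Python sketch; the data rates are illustrative round numbers rather than vendor specifications, and the point is simply that the core array rate plateaus while the pin-side rate keeps climbing.

```python
# A rough sketch (illustrative round numbers, not vendor specs) of how a
# deeper prefetch raises the pin-side data rate while the core array rate
# stays roughly flat.

GENERATIONS = {
    # generation: (prefetch bits per DQ pin per access, example data rate in MT/s)
    "SDR":  (1,  133),
    "DDR":  (2,  400),
    "DDR2": (4,  800),
    "DDR3": (8, 1600),
}

for gen, (prefetch, data_rate) in GENERATIONS.items():
    # The core completes one access per `prefetch` external transfers,
    # so its effective rate is the pin rate divided by the prefetch depth.
    core_rate = data_rate / prefetch
    print(f"{gen:4s}: {data_rate:4d} MT/s at the pins -> ~{core_rate:.0f} MHz core array rate")
```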
Why Prefetch Couldn’t Scale Forever #
Simply doubling prefetch again to 16n would make a single access fetch 128 bytes on a 64-bit channel, exceeding the common 64-byte CPU cache line and wasting transfers on data the CPU never requested.
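A quick arithmetic sketch of that constraint, assuming a 64-bit channel and a 64-byte cache line (common values, though not universal):

```python
# Quick arithmetic behind the cache-line argument, assuming a 64-bit (8-byte)
# channel and a 64-byte CPU cache line.

CHANNEL_BYTES = 8    # 64-bit data bus
CACHE_LINE    = 64   # bytes

for prefetch in (4, 8, 16):
    burst_bytes = prefetch * CHANNEL_BYTES
    verdict = "fits" if burst_bytes <= CACHE_LINE else "overshoots"
    print(f"{prefetch:2d}n prefetch -> {burst_bytes:3d} B per access, "
          f"{verdict} a {CACHE_LINE} B cache line")
```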
DDR4 / DDR5 Solution: Bank Groups #
Instead of increasing prefetch further, DDR4 introduced Bank Groups:
- Banks are partitioned into groups
- Commands to different groups can overlap
- Internal timing constraints are relaxed across groups
This allows higher sustained bandwidth without enlarging the prefetch window. (DDR5 keeps bank groups; it does raise prefetch to 16n, but splits each DIMM into two independent 32-bit subchannels so a burst still matches a 64-byte cache line.)
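The sketch below uses placeholder DDR4-style values for tCCD_L and tCCD_S (real numbers come from the device datasheet) to show why alternating READs between bank groups keeps the data bus busier than hammering a single group.

```python
# Placeholder DDR4-style timings (check a real datasheet): back-to-back CAS
# commands to the SAME bank group must be spaced by tCCD_L, while commands to
# DIFFERENT bank groups only need the shorter tCCD_S.

TCCD_L     = 6   # min clocks between CAS commands, same bank group
TCCD_S     = 4   # min clocks between CAS commands, different bank groups
BURST_CLKS = 4   # a BL8 burst occupies 4 clocks (two transfers per clock)

def bus_utilization(command_spacing: int) -> float:
    """Fraction of data-bus clocks that actually carry data."""
    return BURST_CLKS / max(command_spacing, BURST_CLKS)

print(f"Reads to one bank group  : {bus_utilization(TCCD_L):.0%} bus utilization")
print(f"Reads alternating groups : {bus_utilization(TCCD_S):.0%} bus utilization")
```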
🏗️ Memory Hierarchy: From Channel to Column #
DDR memory is organized hierarchically, both logically and physically:
Channel → DIMM → Rank → Chip → Bank Group → Bank → Row / Column
Key Concepts #
- DIMM: The physical memory module
- Rank: A set of chips accessed in parallel to form a 64-bit data word
  - 1R = single rank
  - 2R = dual rank
- Chip: Individual DRAM ICs on the module
- Bank Group: A cluster of banks whose commands can be overlapped (DDR4/DDR5)
- Bank: Independent 2D memory arrays inside each chip
- Row / Column: The fundamental cell matrix inside a bank
Understanding this hierarchy is essential for performance tuning, timing closure, and controller design.
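As one illustration of how a controller sees this hierarchy, the sketch below slices a flat physical address into the levels listed above. The field widths and their ordering are hypothetical; real controllers make this mapping configurable, and the choice has a large performance impact.

```python
# Hypothetical field widths and ordering -- real controllers expose this
# address mapping as a configurable, performance-critical option.

FIELDS = [                 # (name, width in bits), lowest-order field first
    ("byte_offset", 3),    # 8 bytes per beat on a 64-bit channel
    ("column",      7),    # burst-aligned column block
    ("bank",        2),
    ("bank_group",  2),
    ("row",        16),
    ("rank",        1),
]

def decode(addr: int) -> dict:
    """Slice a flat physical address into hierarchy fields."""
    fields = {}
    for name, width in FIELDS:
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields

print(decode(0x1234_5678))
```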
🧭 Addressing Mechanism: Time-Multiplexed Magic #
A DDR4 device may expose only ~20 address pins, yet it can address gigabytes of memory. This is achieved through time-multiplexed addressing.
Step-by-Step Access Flow #
1. Bank Selection
   - BG (Bank Group) and BA (Bank Address) bits select the target bank
2. Row Address (RAS Phase)
   - The row address is sent first
   - The entire row (page) is activated into the sense amplifiers
   - Typical page size ≈ 512 B–2 KB, depending on device width
3. Column Address (CAS Phase)
   - The column address is sent later using the same pins
   - It selects the exact data within the open row
This reuse of address pins dramatically reduces package complexity.
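A back-of-the-envelope sketch of how those multiplexed pins reach gigabyte capacities, using parameters typical of an 8 Gb DDR4 x8 die (confirm them against the datasheet of a specific part):

```python
# Parameters typical of an 8 Gb DDR4 x8 die (confirm against the datasheet
# for a specific part).

ROW_BITS    = 16   # A[15:0], sent during ACTIVATE
COL_BITS    = 10   # A[9:0], reused during READ/WRITE on the same pins
BANKS       = 4    # banks per bank group (BA[1:0])
BANK_GROUPS = 4    # bank groups (BG[1:0])
WIDTH_BITS  = 8    # x8 device: 8 data bits per chip

cells_per_chip = (1 << ROW_BITS) * (1 << COL_BITS) * BANKS * BANK_GROUPS
chip_bits      = cells_per_chip * WIDTH_BITS

print(f"Chip density    : {chip_bits / 2**30:.0f} Gb")          # ~8 Gb
print(f"Rank of 8 chips : {8 * chip_bits / 8 / 2**30:.0f} GB")  # ~8 GB
```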
📦 Burst Length and the “Missing” Column Bits #
A common point of confusion is column addressing capacity.
If a bank has 10 column bits (2¹⁰ = 1024 columns), why do datasheets often list only 128 addressable columns?
The Answer: Burst Mode #
Modern DDR always transfers data in bursts, typically BL8.
- CA[9:3] (7 bits): Select the 128 column blocks
- CA[2:0] (3 bits): Define the burst offset and ordering
The lower column bits do not select independent locations; they only determine the starting word and ordering within the burst, which is why the usable column space is counted as 128 burst-aligned blocks.
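A small sketch of that split under BL8, following the CA[9:3] / CA[2:0] convention above:

```python
# Splitting a 10-bit column address under BL8: CA[9:3] picks one of 128
# burst-aligned blocks, CA[2:0] only sets the starting word inside the burst.

BURST_LEN = 8  # BL8

def split_column(col):
    block  = col // BURST_LEN   # CA[9:3] -> 0..127
    offset = col %  BURST_LEN   # CA[2:0] -> position within the 8-beat burst
    return block, offset

for col in (0, 5, 1023):
    block, offset = split_column(col)
    print(f"column {col:4d} -> block {block:3d}, burst offset {offset}")
```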
🧩 Key Takeaways for FPGA and System Designers #
- DDR transfers data on both clock edges
- High bandwidth comes from prefetch + bank group interleaving
- DRAM access always follows ACTIVATE → READ/WRITE → PRECHARGE
- Address pins are time-multiplexed, not fully parallel
- Efficient memory access depends heavily on row locality (quantified in the sketch below)
A strong grasp of these fundamentals is critical when configuring DDR controllers, tuning timing parameters, or debugging performance issues on FPGA and SoC platforms.
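To close the loop on the row-locality point, the sketch below uses placeholder timing values (not datasheet numbers) to compare a row hit, a row miss into an idle bank, and a row conflict that must first close another row.

```python
# Placeholder timings in controller clocks (not datasheet values).

T_RCD = 16   # ACTIVATE -> READ/WRITE delay
T_CL  = 16   # READ -> first data (CAS latency)
T_RP  = 16   # PRECHARGE -> next ACTIVATE delay

def read_latency(row_hit, other_row_open=False):
    """Approximate clocks from command to first data beat."""
    if row_hit:
        return T_CL                       # row already open: CAS only
    penalty = T_RP if other_row_open else 0
    return penalty + T_RCD + T_CL         # (close old row,) open new row, then CAS

print(f"Row hit      : {read_latency(True)} clocks")
print(f"Row miss     : {read_latency(False)} clocks")
print(f"Row conflict : {read_latency(False, other_row_open=True)} clocks")
```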