🧠 Why DDR Architecture Matters #
DDR (Double Data Rate) memory is the performance backbone of modern CPUs, GPUs, SoCs, and FPGA platforms. While external I/O speeds have increased dramatically from SDR to DDR5, internal DRAM core frequencies have scaled much more conservatively.
This gap is bridged through architectural innovations such as Prefetch, Burst Transfers, and Bank Group interleaving, allowing DDR to deliver extreme bandwidth without violating power and signal integrity constraints.
🚀 Evolution of DDR: Speed Through Prefetch #
The fundamental challenge in DRAM design is balancing I/O bandwidth with core operating frequency. DDR achieves this through progressively wider prefetch architectures.
- SDR → DDR → DDR2 → DDR3: Prefetch depth increased from 1n → 2n → 4n → 8n
- Conceptually: Prefetch creates multiple parallel data paths between the slow DRAM core and fast I/O pins
- Each internal access fetches multiple bits that are serialized onto the external bus (see the sketch below)
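To make that concrete, here is a minimal Python sketch; the data rates are illustrative round numbers rather than vendor specifications, and the point is simply that the core array rate plateaus while the pin-side rate keeps climbing.

```python
# A rough sketch (illustrative round numbers, not vendor specs) of how a
# deeper prefetch raises the pin-side data rate while the core array rate
# stays roughly flat.

GENERATIONS = {
    # generation: (prefetch bits per DQ pin per access, example data rate in MT/s)
    "SDR":  (1,  133),
    "DDR":  (2,  400),
    "DDR2": (4,  800),
    "DDR3": (8, 1600),
}

for gen, (prefetch, data_rate) in GENERATIONS.items():
    # The core completes one access per `prefetch` external transfers,
    # so its effective rate is the pin rate divided by the prefetch depth.
    core_rate = data_rate / prefetch
    print(f"{gen:4s}: {data_rate:4d} MT/s at the pins -> ~{core_rate:.0f} MHz core array rate")
```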
Why Prefetch Couldn’t Scale Forever #
Simply doubling prefetch again to 16n would make a single access fetch 128 bytes on a 64-bit channel, exceeding the common 64-byte CPU cache line and wasting transfers on data the CPU never requested.
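A quick arithmetic sketch of that constraint, assuming a 64-bit channel and a 64-byte cache line (common values, though not universal):

```python
# Quick arithmetic behind the cache-line argument, assuming a 64-bit (8-byte)
# channel and a 64-byte CPU cache line.

CHANNEL_BYTES = 8    # 64-bit data bus
CACHE_LINE    = 64   # bytes

for prefetch in (4, 8, 16):
    burst_bytes = prefetch * CHANNEL_BYTES
    verdict = "fits" if burst_bytes <= CACHE_LINE else "overshoots"
    print(f"{prefetch:2d}n prefetch -> {burst_bytes:3d} B per access, "
          f"{verdict} a {CACHE_LINE} B cache line")
```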
DDR4 / DDR5 Solution: Bank Groups #
Instead of increasing prefetch further, DDR4 introduced Bank Groups:
- Banks are partitioned into groups
- Commands to different groups can overlap
- Internal timing constraints are relaxed across groups
This allows higher sustained bandwidth without enlarging the prefetch window. (DDR5 keeps bank groups; it does raise prefetch to 16n, but splits each DIMM into two independent 32-bit subchannels so a burst still matches a 64-byte cache line.)
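The sketch below uses placeholder DDR4-style values for tCCD_L and tCCD_S (real numbers come from the device datasheet) to show why alternating READs between bank groups keeps the data bus busier than hammering a single group.

```python
# Placeholder DDR4-style timings (check a real datasheet): back-to-back CAS
# commands to the SAME bank group must be spaced by tCCD_L, while commands to
# DIFFERENT bank groups only need the shorter tCCD_S.

TCCD_L     = 6   # min clocks between CAS commands, same bank group
TCCD_S     = 4   # min clocks between CAS commands, different bank groups
BURST_CLKS = 4   # a BL8 burst occupies 4 clocks (two transfers per clock)

def bus_utilization(command_spacing: int) -> float:
    """Fraction of data-bus clocks that actually carry data."""
    return BURST_CLKS / max(command_spacing, BURST_CLKS)

print(f"Reads to one bank group  : {bus_utilization(TCCD_L):.0%} bus utilization")
print(f"Reads alternating groups : {bus_utilization(TCCD_S):.0%} bus utilization")
```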
🏗️ Memory Hierarchy: From Channel to Column #
DDR memory is organized hierarchically, both logically and physically:
Channel → DIMM → Rank → Chip → Bank Group → Bank → Row / Column
Key Concepts #
- DIMM: The physical memory module
- Rank: A set of chips accessed in parallel to form a 64-bit data word
  - 1R = single rank
  - 2R = dual rank
- Chip: Individual DRAM ICs on the module
- Bank Group: A cluster of banks whose commands can be overlapped (DDR4/DDR5)
- Bank: Independent 2D memory arrays inside each chip
- Row / Column: The fundamental cell matrix inside a bank
Understanding this hierarchy is essential for performance tuning, timing closure, and controller design.
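As one illustration of how a controller sees this hierarchy, the sketch below slices a flat physical address into the levels listed above. The field widths and their ordering are hypothetical; real controllers make this mapping configurable, and the choice has a large performance impact.

```python
# Hypothetical field widths and ordering -- real controllers expose this
# address mapping as a configurable, performance-critical option.

FIELDS = [                 # (name, width in bits), lowest-order field first
    ("byte_offset", 3),    # 8 bytes per beat on a 64-bit channel
    ("column",      7),    # burst-aligned column block
    ("bank",        2),
    ("bank_group",  2),
    ("row",        16),
    ("rank",        1),
]

def decode(addr: int) -> dict:
    """Slice a flat physical address into hierarchy fields."""
    fields = {}
    for name, width in FIELDS:
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields

print(decode(0x1234_5678))
```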
🧭 Addressing Mechanism: Time-Multiplexed Magic #
A DDR4 device may expose only ~20 address pins, yet it can address gigabytes of memory. This is achieved through time-multiplexed addressing.
Step-by-Step Access Flow #
1. Bank Selection
   - BG (Bank Group) and BA (Bank Address) bits select the target bank
2. Row Address (RAS Phase)
   - The row address is sent first
   - The entire row (page) is activated into the sense amplifiers
   - Typical page size ≈ 512 B–2 KB, depending on device width
3. Column Address (CAS Phase)
   - The column address is sent later using the same pins
   - It selects the exact data within the open row
This reuse of address pins dramatically reduces package complexity.
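A back-of-the-envelope sketch of how those multiplexed pins reach gigabyte capacities, using parameters typical of an 8 Gb DDR4 x8 die (confirm them against the datasheet of a specific part):

```python
# Parameters typical of an 8 Gb DDR4 x8 die (confirm against the datasheet
# for a specific part).

ROW_BITS    = 16   # A[15:0], sent during ACTIVATE
COL_BITS    = 10   # A[9:0], reused during READ/WRITE on the same pins
BANKS       = 4    # banks per bank group (BA[1:0])
BANK_GROUPS = 4    # bank groups (BG[1:0])
WIDTH_BITS  = 8    # x8 device: 8 data bits per chip

cells_per_chip = (1 << ROW_BITS) * (1 << COL_BITS) * BANKS * BANK_GROUPS
chip_bits      = cells_per_chip * WIDTH_BITS

print(f"Chip density    : {chip_bits / 2**30:.0f} Gb")          # ~8 Gb
print(f"Rank of 8 chips : {8 * chip_bits / 8 / 2**30:.0f} GB")  # ~8 GB
```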
📦 Burst Length and the “Missing” Column Bits #
A common point of confusion is column addressing capacity.
If a bank has 10 column bits (2¹⁰ = 1024 columns), why do datasheets often list only 128 addressable columns?
The Answer: Burst Mode #
Modern DDR always transfers data in bursts, typically BL8.
- CA[9:3] (7 bits): Select the 128 column blocks
- CA[2:0] (3 bits): Define the burst offset and ordering
The lower column bits do not select independent locations; they only determine the starting word and ordering within the burst, which is why the usable column space is counted as 128 burst-aligned blocks.
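A small sketch of that split under BL8, following the CA[9:3] / CA[2:0] convention above:

```python
# Splitting a 10-bit column address under BL8: CA[9:3] picks one of 128
# burst-aligned blocks, CA[2:0] only sets the starting word inside the burst.

BURST_LEN = 8  # BL8

def split_column(col):
    block  = col // BURST_LEN   # CA[9:3] -> 0..127
    offset = col %  BURST_LEN   # CA[2:0] -> position within the 8-beat burst
    return block, offset

for col in (0, 5, 1023):
    block, offset = split_column(col)
    print(f"column {col:4d} -> block {block:3d}, burst offset {offset}")
```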
🧩 Key Takeaways for FPGA and System Designers #
- DDR transfers data on both clock edges
- High bandwidth comes from prefetch + bank group interleaving
- DRAM access always follows ACTIVATE → READ/WRITE → PRECHARGE
- Address pins are time-multiplexed, not fully parallel
- Efficient memory access depends heavily on row locality (quantified in the sketch below)
A strong grasp of these fundamentals is critical when configuring DDR controllers, tuning timing parameters, or debugging performance issues on FPGA and SoC platforms.
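To close the loop on the row-locality point, the sketch below uses placeholder timing values (not datasheet numbers) to compare a row hit, a row miss into an idle bank, and a row conflict that must first close another row.

```python
# Placeholder timings in controller clocks (not datasheet values).

T_RCD = 16   # ACTIVATE -> READ/WRITE delay
T_CL  = 16   # READ -> first data (CAS latency)
T_RP  = 16   # PRECHARGE -> next ACTIVATE delay

def read_latency(row_hit, other_row_open=False):
    """Approximate clocks from command to first data beat."""
    if row_hit:
        return T_CL                       # row already open: CAS only
    penalty = T_RP if other_row_open else 0
    return penalty + T_RCD + T_CL         # (close old row,) open new row, then CAS

print(f"Row hit      : {read_latency(True)} clocks")
print(f"Row miss     : {read_latency(False)} clocks")
print(f"Row conflict : {read_latency(False, other_row_open=True)} clocks")
```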