- Winners & Losers in the 3D DRAM Revolution: The industry still debates Moore's Law, but for practical purposes it died over a decade ago. Most of the attention is on logic, yet the slowdown applies to DRAM as well, and the explosion of AI has further upset the industry's balance. DRAM no longer scales: density increased just 2x in the last decade, versus 100x per decade in the glory days. High bandwidth memory (HBM) for accelerators costs at least 3x more per GB than standard DDR5. The DRAM industry has hit a wall, with compute improvements outpacing memory, but there are many possible ways to reaccelerate DRAM innovation.
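The gap between the glory days and the last decade is easier to feel as annualized growth rates. A quick sketch using only the 2x/decade and 100x/decade figures from the text:

```python
# Convert per-decade density multipliers into compound annual growth rates.
def annual_growth(factor_per_decade: float) -> float:
    """Return the CAGR implied by a per-decade scaling multiplier."""
    return factor_per_decade ** (1 / 10) - 1

glory_days = annual_growth(100)  # ~58% density growth per year
last_decade = annual_growth(2)   # ~7% density growth per year

print(f"Glory days: {glory_days:.1%}/yr, last decade: {last_decade:.1%}/yr")
```

In other words, DRAM density growth has fallen from roughly 58% per year to roughly 7% per year.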
- DRAM Primer: Working Memory: There are several types of memory in a computer. SRAM is the fastest but most expensive. NAND flash, hard-disk drives, and magnetic tape are cheap but slow. DRAM sits between SRAM and flash: fast enough and cheap enough to serve as working memory. DRAM can make up half the cost of a non-AI server system, yet it has been the slowest of these memories to scale over the past 10 years.
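The hierarchy above trades latency against cost per bit. A small sketch with illustrative order-of-magnitude figures (these numbers are assumptions for illustration, not from the article):

```python
# Rough memory-hierarchy tradeoffs: approximate access latency (ns)
# and relative cost per bit. DRAM sits between SRAM and NAND on both
# axes, which is why it serves as working memory.
hierarchy = [
    # (tier,                 ~latency_ns, relative_cost_per_bit)
    ("SRAM (on-die cache)",            1,  1000.0),
    ("DRAM",                         100,    10.0),
    ("NAND flash",               100_000,     1.0),
    ("Hard disk",             10_000_000,     0.1),
]

for tier, latency_ns, cost in hierarchy:
    print(f"{tier:22s} ~{latency_ns:>12,} ns   cost x{cost}")
```

Each step down the hierarchy is orders of magnitude slower and orders of magnitude cheaper per bit.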
- DRAM Primer: Basic Architecture: DRAM comprises an array of memory cells in a grid. Each cell stores one bit in a 1T1C structure (1 transistor and 1 capacitor). Wordlines select a row of cells for access, while bitlines carry charge to and from the cells. Sense amplifiers detect and amplify the small charge read from each cell. Because the capacitors leak, DRAM is volatile and requires frequent refreshes.
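A toy model makes the 1T1C read mechanics concrete. This is a sketch with assumed voltages and capacitances, not a circuit simulation: reading shares the cell's charge with the precharged bitline, the sense amplifier detects the tiny swing, and the full value is written back, which is why every read is also a refresh.

```python
# Toy 1T1C DRAM read: charge sharing, sensing, and restore.
V_DD = 1.0           # supply voltage (assumed)
V_PRECHARGE = 0.5    # bitlines precharged to V_DD / 2
C_CELL, C_BITLINE = 1.0, 10.0  # bitline capacitance dwarfs the cell's

def read_cell(cell_v: float) -> tuple[int, float]:
    """Return (sensed bit, restored cell voltage) for a stored voltage."""
    # Charge sharing produces only a small swing on the bitline.
    v_bl = (C_CELL * cell_v + C_BITLINE * V_PRECHARGE) / (C_CELL + C_BITLINE)
    bit = 1 if v_bl > V_PRECHARGE else 0  # sense-amplifier decision
    restored = V_DD if bit else 0.0       # write-back restores the cell
    return bit, restored

# A stored '1' that has leaked down to 0.7 V is still sensed correctly
# and restored to the full rail.
print(read_cell(0.7))  # -> (1, 1.0)
```

The destructive read is the reason DRAM needs sense amplifiers at all: the signal that reaches the bitline is a fraction of the stored voltage.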
- DRAM Primer: History (When DRAM Still Scaled): Modern DRAM is made possible by the 1T1C memory cell and the sense amplifier. Early DRAM used 3 transistors per cell. The sense amplifier was independently discovered by Robert Proebsting at Mostek in 1977, and Mostek rode the innovation to market leadership.
- DRAM Primer: When DRAM Stopped Scaling: In the 21st century, logic has outpaced memory scaling. DRAM pricing dynamics have changed accordingly, combining slow density scaling with wild price swings. Since entering the 10-nm-class nodes, DRAM bit density has stagnated, largely due to scaling challenges in the capacitor and the sense amplifier.
- Short-Term Scaling: 4F2 and Vertical Channel Transistor: In the short term, DRAM scaling will continue along its traditional roadmap with two innovations: the 4F2 cell layout and vertical channel transistors (VCT). 4F2 describes the memory cell area in terms of the minimum feature size F. VCTs are necessary to fit the transistor and its contacts within the shrunken cell footprint. Samsung's VCT process notably relies on wafer bonding.
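The nF² convention makes the payoff easy to quantify: cell area is n x F², so moving from today's 6F² layout to 4F² yields a 1.5x density gain at the same lithography. A minimal sketch (the ~15 nm feature size is an assumed figure for illustration):

```python
# Cell area under the nF^2 convention: area = n * F^2, where F is the
# minimum feature size. Shrinking n raises density without shrinking F.
def cell_area_nm2(n: int, f_nm: float) -> float:
    """Return cell area in nm^2 for an nF^2 layout at feature size f_nm."""
    return n * f_nm ** 2

f = 15.0                          # assumed ~15 nm feature size
area_6f2 = cell_area_nm2(6, f)    # 1350 nm^2 per cell
area_4f2 = cell_area_nm2(4, f)    #  900 nm^2 per cell
print(area_6f2 / area_4f2)        # -> 1.5 (cells per unit area)
```

The gain is independent of F, which is exactly why 4F2 is attractive when lithographic shrinks have slowed.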
- DRAM Primer: Current Variants: DDR5 delivers the highest memory capacity, LPDDR5X provides low-power operation, GDDR6X is focused on graphics applications, and HBM3/E prioritizes bandwidth and power efficiency. HBM is expensive due to die stacking and packaging challenges. LPDDR5X has advantages in power consumption and cost but has limitations.
- HBM Roadmap: All leading AI GPUs now use HBM. HBM4 will have higher bandwidth and a different base die fabricated on FinFET processes. Custom HBM can enable other package architectures. The base die hosts the connection between memory and the CPU, GPU, or accelerator, and improving this link is one possible avenue.
- Emerging Memory: FeRAM is a promising memory for discrete applications but is complex to manufacture and not competitive at present. MRAM has shown high density at IEDM 2022 but has limitations. None of the alternative memories are well placed to challenge DRAM yet.
- Compute In Memory: DRAM's architecture has hindered its performance. It depends on the host for control logic and uses an ancient half-duplex interface. There is massive potential bandwidth inside DRAM banks that is currently wasted at the interface. Implementing UCIe or Eliyan's NuLink standard could improve performance. Adding logic to a DRAM chip is a challenge, but HBM's architecture is amenable to it.
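A back-of-the-envelope calculation illustrates how much internal bank bandwidth the external interface leaves stranded. All figures below are assumptions for illustration, not from the article: each row activation delivers an entire row to the sense amplifiers, but only a narrow burst ever crosses the pins.

```python
# Illustrative gap between internal array bandwidth and pin bandwidth.
BANKS = 16                 # assumed banks per chip
ROW_BYTES = 1024           # assumed row (page) size per bank
ACTIVATES_PER_SEC = 20e6   # assumed activations/sec per bank (~50 ns cycle)
PIN_BW = 8e9               # assumed 8 GB/s external interface

# Bytes/sec landing in the sense amplifiers across all banks.
internal_bw = BANKS * ROW_BYTES * ACTIVATES_PER_SEC

print(f"internal ~{internal_bw / 1e9:.0f} GB/s vs pins "
      f"~{PIN_BW / 1e9:.0f} GB/s ({internal_bw / PIN_BW:.0f}x gap)")
```

Even with these conservative assumed figures, the arrays touch tens of times more data per second than the interface can export, which is the opportunity compute-in-memory designs target.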
- 3D DRAM: Basics: The remainder of the article, covering 3D DRAM, is available to SemiAnalysis subscribers; subscriptions include newsletter articles and article discussions, while model access requires contacting sales@semianalysis.com about institutional offerings.