The Memory Hierarchy
To this point in our study of systems, we have relied on a simple model of a computer system as a CPU
that executes instructions and a memory system that holds instructions and data for the CPU. In our simple
model, the memory system is a linear array of bytes, and the CPU can access each memory location in a
constant amount of time. While this is an effective model as far as it goes, it does not reflect the way that
modern systems really work.
In practice, a memory system is a hierarchy of storage devices with different capacities, costs, and access
times. CPU registers hold the most frequently used data. Small, fast cache memories near the CPU act as
staging areas for a subset of the data and instructions stored in the relatively slow main memory. The main
memory stages data stored on large, slow disks, which in turn often serve as staging areas for data stored on
the disks or tapes of other machines connected by networks.
Memory hierarchies work because well-written programs tend to access the storage at any particular level
more frequently than they access the storage at the next lower level. So the storage at the next level can be
slower, and thus larger and cheaper per bit. The overall effect is a large pool of memory that costs as much
as the cheap storage near the bottom of the hierarchy, but that serves data to programs at the rate of the fast
storage near the top of the hierarchy.
As a programmer, you need to understand the memory hierarchy because it has a big impact on the performance of your applications. If the data your program needs are stored in a CPU register, then they can be
accessed in zero cycles during the execution of the instruction. If stored in a cache, 1 to 30 cycles. If stored
in main memory, 50 to 200 cycles. And if stored on disk, tens of millions of cycles!
Here, then, is a fundamental and enduring idea in computer systems: If you understand how the system
moves data up and down the memory hierarchy, then you can write your application programs so that their
data items are stored higher in the hierarchy, where the CPU can access them more quickly.
This idea centers around a fundamental property of computer programs known as locality. Programs with
good locality tend to access the same set of data items over and over again, or they tend to access sets of
nearby data items. Programs with good locality tend to access more data items from the upper levels of the
memory hierarchy than programs with poor locality, and thus run faster. For example, the running times
of different matrix multiplication kernels that perform the same number of arithmetic operations, but have
different degrees of locality, can vary by a factor of 20!
In this chapter, we will look at the basic storage technologies — SRAM memory, DRAM memory, ROM
memory, and rotating and solid state disks — and describe how they are organized into hierarchies. In
particular, we focus on the cache memories that act as staging areas between the CPU and main memory,
because they have the most impact on application program performance. We show you how to analyze
your C programs for locality and we introduce techniques for improving the locality in your programs. You
will also learn an interesting way to characterize the performance of the memory hierarchy on a particular
machine as a “memory mountain” that shows read access times as a function of locality.
6.1 Storage Technologies
Much of the success of computer technology stems from the tremendous progress in storage technology.
Early computers had a few kilobytes of random-access memory. The earliest IBM PCs didn’t even have a
hard disk. That changed with the introduction of the IBM PC-XT in 1982, with its 10-megabyte disk. By
the year 2010, typical machines had 150,000 times as much disk storage, and the amount of storage was
increasing by a factor of 2 every couple...