Memory bandwidth

Memory bandwidth is the rate at which data can be read from or stored into a semiconductor memory by a processor. Memory bandwidth is usually expressed in units of bytes/second, though this can vary for systems with natural data sizes that are not a multiple of the commonly used 8-bit bytes.

Memory bandwidth that is advertised for a given memory or system is usually the maximum theoretical bandwidth. In practice the observed memory bandwidth will be less than (and is guaranteed not to exceed) the advertised bandwidth. A variety of computer benchmarks exist to measure sustained memory bandwidth using a variety of access patterns. These are intended to provide insight into the memory bandwidth that a system should sustain on various classes of real applications.
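
As an illustration of the general approach such benchmarks take (this is only a minimal sketch, not the STREAM benchmark itself, and it assumes a POSIX system for clock_gettime), the following C program times one large memcpy and reports the bytes copied per second:

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <time.h>

  #define N (64 * 1024 * 1024)            /* 64 Mi doubles = 512 MiB per array */

  int main(void)
  {
      double *a = malloc(N * sizeof(double));
      double *b = malloc(N * sizeof(double));
      if (!a || !b) return 1;

      memset(a, 1, N * sizeof(double));   /* touch the pages before timing */
      memset(b, 0, N * sizeof(double));

      struct timespec t0, t1;
      clock_gettime(CLOCK_MONOTONIC, &t0);
      memcpy(b, a, N * sizeof(double));   /* the timed copy */
      clock_gettime(CLOCK_MONOTONIC, &t1);

      double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
      double bytes = (double)N * sizeof(double);
      printf("copy bandwidth: %.2f GB/s\n", bytes / s / 1e9);

      free(a);
      free(b);
      return 0;
  }

A real benchmark would repeat the copy many times, report the best or median time, and use arrays much larger than the last-level cache so that the measurement reflects memory rather than cache bandwidth.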

Conventions

Perhaps surprisingly, there are at least three different conventions for counting the quantity of data transferred in the numerator of bytes/second, as discussed in more detail in [1].

  1. bcopy convention: counts the amount of data copied from one location in memory to another location per unit time. For example, copying 1 million bytes from one location in memory to another location in memory in one second would be counted as 1 million bytes per second.
  2. STREAM convention: sums the amount of data that the application code explicitly reads plus the amount of data that the application code explicitly writes. Using the previous 1 million byte copy example, the STREAM bandwidth would be counted as 1 million bytes read plus 1 million bytes written in one second, for a total of 2 million bytes per second.
  3. hardware convention: counts the actual amount of data read or written by the hardware, whether the data motion was explicitly requested by the user code or not. Using the same 1 million byte copy example, the hardware bandwidth on computer systems with a write allocate cache policy would include an additional 1 million bytes of traffic because the hardware reads the target array from memory into cache before performing the stores. This gives a total of 3 million bytes per second actually transferred by the hardware.

The bcopy convention is self-consistent, but is not easily extended to cover cases with more complex access patterns, for example three reads and one write. The STREAM convention is most directly tied to the user code, but may not count all the data traffic that the hardware is actually required to perform. The hardware convention is most directly tied to the hardware, but may not represent the minimum amount of data traffic required to implement the user's code. For example, some computer systems can avoid write allocate traffic by using special instructions, so comparisons of bandwidth figures can be misleading when they are based on different amounts of data traffic actually performed.
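
As a concrete illustration of how the counts diverge, the following short C program (purely illustrative; it assumes the 1-million-byte copy completing in one second and the write-allocate cache policy described above) prints the bandwidth each convention would report:

  #include <stdio.h>

  int main(void)
  {
      double bytes   = 1e6;   /* bytes copied from source to destination */
      double seconds = 1.0;   /* elapsed time of the copy */

      double bcopy_bw  = bytes / seconds;             /* data copied */
      double stream_bw = (bytes + bytes) / seconds;   /* explicit reads + explicit writes */
      double hw_bw     = (3.0 * bytes) / seconds;     /* plus the write-allocate read of the target */

      printf("bcopy convention:    %.0f bytes/s\n", bcopy_bw);
      printf("STREAM convention:   %.0f bytes/s\n", stream_bw);
      printf("hardware convention: %.0f bytes/s\n", hw_bw);
      return 0;
  }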

Computation

Theoretical maximum memory bandwidth is typically computed by multiplying the width of the interface by the frequency at which it transfers data. This is also referred to as the burst rate of the interface, in recognition of the possibility that this rate may not be sustainable over long periods (i.e., the throughput may be less than the theoretical maximum memory bandwidth).

The nomenclature standards often differ across memory technologies, but for commodity DDR SDRAM, DDR2 SDRAM, and DDR3 SDRAM memory the computation is:

  • Base DRAM frequency in MHz (millions of DRAM clock cycles per second).
  • Memory interface (or bus) width. Each standard DDR, DDR2, or DDR3 memory interface is 64 bits (8 bytes) wide. (The width is sometimes referred to in lines or lanes, rather than bits, though these are synonymous here.)
  • Number of interfaces. Current computers typically use two memory interfaces in dual-channel mode for an effective 128-bit width.
  • Number of bits per clock cycle per line. This is 2 for DDR, DDR2, and DDR3 dual data rate technologies.

So a recent computer system with a dual-channel configuration and two DDR2-800 modules, each running at an actual bus speed of 400 MHz (half of the nominal 800 million transfers per second and, for DDR2, twice the DRAM core clock of 200 MHz), would have a theoretical maximum memory bandwidth of:

  • (400 million hertz * (2 interfaces) * (64 lines/interface) * (2 bits/line-cycle)) = 102,400 Mbit/s, or 12,800 MB/s, or 12.8 GB/s.

The naming conventions of DDR, DDR2 and DDR3 modules cite two nominal ratings: a transfer-rate figure (e.g., DDR2-1066), which is not the bus clock or the memory clock but the number of millions of transfers possible per second, and a maximum-throughput figure (e.g., DDR2-800 modules are also sold as PC2-6400), which reflects the theoretical maximum bandwidth in megabytes per second. With this in mind, the above computation can be simplified to two PC2-6400 modules in a dual-channel, 128-bit configuration, or 2 × 6,400 MB/s = 12,800 MB/s.
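
The computation above can be expressed as a short helper function; the sketch below (the function and parameter names are illustrative, not taken from any standard) reproduces the 12.8 GB/s figure both from the four factors and from the PC2-6400 module rating:

  #include <stdio.h>

  /* Theoretical maximum bandwidth in bytes per second. */
  static double peak_bandwidth(double bus_clock_hz, int interfaces,
                               int width_bits, int transfers_per_clock)
  {
      return bus_clock_hz * interfaces * (width_bits / 8.0) * transfers_per_clock;
  }

  int main(void)
  {
      /* Dual-channel DDR2-800: 400 MHz bus clock, 64-bit interfaces, 2 transfers per clock. */
      printf("%.1f GB/s\n", peak_bandwidth(400e6, 2, 64, 2) / 1e9);   /* 12.8 GB/s */

      /* Shortcut from the module rating: two PC2-6400 modules at 6,400 MB/s each. */
      printf("%.1f GB/s\n", 2 * 6400e6 / 1e9);                        /* 12.8 GB/s */
      return 0;
  }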

The two memory interfaces in the above example are a common configuration, but single-channel configurations are common in low-end and low-power devices, and some high-performance systems use more than two channels. As of 2007, advanced personal computers and graphics cards went beyond dual-channel, combining four (e.g., Mac Pro), five (e.g., nVidia 8800 GTS), six (e.g., nVidia 8800 GTX), or more sets of 64-bit memory modules and buses to reach total memory bus widths of 256 bits, 320 bits, 384 bits or more. In this sort of multi-channel configuration, the memory must be spread out so that there is at least one 64-bit wide chip or module on each channel. For a 256-bit wide, 4 GiB configuration with DDR2 modules, one must therefore use 4×1 GiB modules (or 8×512 MiB, 16×256 MiB, etc.), since each of these standard modules provides only a 64-bit interface.
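
The module-count constraint in the example above amounts to a small calculation, sketched here with illustrative variable names:

  #include <stdio.h>

  int main(void)
  {
      int total_bus_bits     = 256;    /* desired total memory bus width */
      int module_bits        = 64;     /* width of one standard DDR2 chip or module */
      int total_capacity_mib = 4096;   /* 4 GiB in total */

      int min_modules    = total_bus_bits / module_bits;   /* at least one module per channel */
      int mib_per_module = total_capacity_mib / min_modules;

      printf("at least %d modules of %d MiB each\n", min_modules, mib_per_module);
      /* prints: at least 4 modules of 1024 MiB each */
      return 0;
  }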

Note that in systems with error-correcting memory, the additional width of the interfaces (typically 72 bits rather than 64 bits) is not counted in the bandwidth computations, as neither the extra memory nor the extra bandwidth is available for user data.

See also

Major factors in the real-world performance of random-access memory systems:

Further reading on semiconductor memory:

References

  1. ^ STREAM Benchmark FAQ: Counting Bytes and FLOPS. http://www.cs.virginia.edu/stream/ref.html#counting
