Non-Uniform Memory Access

From Academic Kids

Non-Uniform Memory Access or Non-Uniform Memory Architecture (NUMA) is a computer memory architecture, used in multiprocessors, in which memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or shared between processors).

NUMA architectures are the logical next step in scaling beyond symmetric multiprocessing (SMP) architectures.


Basic concept

Modern CPUs are considerably faster than the main memory they are attached to. While CPU performance has doubled roughly every eighteen months, following the famous Moore's Law, memory speed has improved by perhaps 15% over the same period. In the early days of high-speed computing and supercomputers the CPU was generally slower than memory, and the line was crossed in the 1970s. Since then CPUs have increasingly been starved for data, stalling while they wait for memory accesses to complete.

Key to extracting high performance from a modern computer is therefore limiting the number of memory accesses. For commodity processors, this means using an ever-increasing amount of high-speed cache memory and increasingly sophisticated algorithms to avoid cache misses. However, these improvements have generally been overwhelmed by the dramatic growth in the size of the operating systems and applications being run.

Multi-processor systems make the problem considerably worse. Now it is possible for several processors to be starved at the same time, notably because only one processor can access the shared memory at a time. Many supercomputer designs of the 1980s and 1990s focused on providing high-speed memory access, rather than faster processors, allowing them to work on large data sets at speeds other systems could not approach.

NUMA attempts to address this problem by providing separate memory for each processor, avoiding the performance hit when several processors attempt to address the same memory. For problems where the data is spread out, common for servers and similar applications, NUMA can improve the performance over a single shared memory roughly by the number of processors (or separate memory banks).

Of course, not all data ends up isolated to a single task, which means the same data may be needed by more than one processor. To handle these cases, NUMA systems include additional hardware or software to move data between memory banks. This operation slows down the processors attached to those banks, so the overall speed increase due to NUMA depends heavily on the exact nature of the tasks being run on the system.

Cache coherence and NUMA

Nearly all CPU architectures use a small amount of very fast non-shared memory known as cache to exploit locality of reference in memory accesses. With NUMA, maintaining cache coherence across shared memory has a significant overhead.

Although simpler to design and build, non-cache-coherent NUMA systems are prohibitively complex to program in the standard von Neumann programming model. As a result, all fielded NUMA designs use special-purpose hardware to maintain cache coherence, and are thus classed as "cache-coherent NUMA", or ccNUMA.

This is typically done by using inter-processor communication between cache controllers to keep a consistent memory image when the same memory location is stored in more than one cache. For this reason, ccNUMA performs poorly when multiple processors attempt to access the same memory area in rapid succession. Operating system support for NUMA attempts to reduce the frequency of this kind of access, by allocating processors and memory in NUMA-friendly ways, and by avoiding scheduling and locking algorithms that make unnecessary NUMA-unfriendly accesses.

NUMA vs. cluster computing

NUMA can be viewed as a very tightly coupled form of cluster computing. The addition of virtual memory paging to a cluster architecture can allow NUMA to be implemented entirely in software where no NUMA hardware exists. However, the inter-node latency of software-based NUMA remains several orders of magnitude greater than that of hardware NUMA.

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.


