How Is NUMA Represented in Virtual Memory

NUMA: How to check in which part of RAM a C++ array is allocated?

Your question is answered here. I would only like to add some comments.

Note that calling new[] does not actually allocate physical memory. On modern operating systems it only results in an anonymous memory mapping being made. Anonymous mappings do not correspond to files in the file system but are instead backed by swap (if any). Initially the whole area points to a read-only page in the kernel that contains all zeros. Only when you actually write to the newly allocated memory is a new physical page installed, replacing the zero page for the page that contains the accessed address. This is why we say that the zero page is copy-on-write (CoW) mapped into the virtual address space of the process.

The default policy is to try to allocate the new page on the NUMA node where the thread that touched the memory is running. This is called a "first-touch" NUMA policy. If there is not enough memory on that node, the page is allocated on some other node with enough free memory.

It is also possible for small allocations to end up inside a bigger area (called an arena) managed by the C library allocator malloc() (the C++ operator new[] calls malloc() to do the actual memory allocation). In that case the pages might already be present in physical memory even before you write to the newly allocated memory.
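To illustrate the first-touch behaviour, here is a minimal sketch (mine, not part of the linked answer) that allocates a large array with new[], touches it, and then asks the kernel where the backing pages ended up via move_pages(2) from libnuma (compile with -lnuma). The allocation is deliberately large so that glibc's malloc() is likely to serve it with a fresh mmap() rather than from an already-touched arena.

#include <cstddef>
#include <cstdio>
#include <cstring>
#include <unistd.h>
#include <numaif.h>   // move_pages(2); link with -lnuma

int main() {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t size = 32u << 20;   // 32 MiB: big enough that malloc() mmap()s it
    char *buf = new char[size];           // only an anonymous mapping so far

    std::memset(buf, 1, size);            // first touch: physical pages are now
                                          // allocated on this thread's NUMA node

    // Query the backing node of the first few pages. With nodes == nullptr,
    // move_pages() only reports placement; status[i] < 0 means "not present".
    void *pages[4];
    int status[4];
    for (int i = 0; i < 4; ++i)
        pages[i] = buf + i * page;

    if (move_pages(0 /* this process */, 4, pages, nullptr, status, 0) == 0)
        for (int i = 0; i < 4; ++i)
            std::printf("page %d -> node %d\n", i, status[i]);

    delete[] buf;
}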

Linux has the nasty habit of not preserving the NUMA association of memory areas when it swaps. That is, if a page was allocated on NUMA node 0, then swapped out and swapped back in, there is no guarantee that it won't end up on NUMA node 1. This makes the question "where is my memory allocated?" a bit tricky, since a swap-out followed by a swap-in could easily invalidate the result you got from move_pages() just a fraction of a second earlier. Therefore the question only makes sense in the following two special cases:

  • you explicitly lock the memory region: use the mlock(2) system call to tell the OS not to swap out a particular range of the process's virtual address space (a sketch follows this list);
  • your system has no active swap areas: this altogether prevents the OS from moving pages out of and back into main memory.
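As a hedged sketch of the first case (the helper name pin_range is mine): mlock(2) faults the pages in and keeps them resident, so a later swap-out/swap-in cycle cannot silently move them to a different NUMA node, and the answer from move_pages() stays meaningful. Note that locking requires CAP_IPC_LOCK or a sufficiently large RLIMIT_MEMLOCK.

#include <cstddef>
#include <cstdio>
#include <sys/mman.h>

// Pin [addr, addr + len) in RAM so the pages cannot be swapped out and
// later re-faulted onto a different NUMA node.
bool pin_range(void *addr, std::size_t len) {
    if (mlock(addr, len) != 0) {
        std::perror("mlock");   // typically EPERM or ENOMEM if RLIMIT_MEMLOCK is too small
        return false;
    }
    return true;                // call munlock(addr, len) to release the pinning
}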

How does the memory behind statically allocated huge pages get distributed across NUMA nodes?

From the kernel.org documentation on hugepages HERE:

On a NUMA platform, the kernel will attempt to distribute the huge page pool
over all the set of allowed nodes specified by the NUMA memory policy of the
task that modifies nr_hugepages. The default for the allowed nodes--when the
task has default memory policy--is all on-line nodes with memory. Allowed
nodes with insufficient available, contiguous memory for a huge page will be
silently skipped when allocating persistent huge pages. See the discussion
below of the interaction of task memory policy, cpusets and per node attributes
with the allocation and freeing of persistent huge pages.

The success or failure of huge page allocation depends on the amount of
physically contiguous memory that is present in system at the time of the
allocation attempt. If the kernel is unable to allocate huge pages from
some nodes in a NUMA system, it will attempt to make up the difference by
allocating extra pages on other nodes with sufficient available contiguous
memory, if any.

System administrators may want to put this command in one of the local rc
init files. This will enable the kernel to allocate huge pages early in
the boot process when the possibility of getting physical contiguous pages
is still very high. Administrators can verify the number of huge pages
actually allocated by checking the sysctl or meminfo. To check the per node
distribution of huge pages in a NUMA system, use:

cat /sys/devices/system/node/node*/meminfo | fgrep Huge
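Tying this back to the first question, here is a rough sketch (mine, assuming 2 MiB huge pages and a non-empty persistent pool, e.g. reserved via vm.nr_hugepages) that maps one explicit huge page, touches it so it is taken from the per-node pool, and queries its NUMA node with move_pages(2); compile with -lnuma.

#include <cstddef>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>
#include <numaif.h>   // move_pages(2); link with -lnuma

int main() {
    const std::size_t huge = 2u << 20;   // assumes a 2 MiB default huge page size
    void *p = mmap(nullptr, huge, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {               // fails if the persistent pool is empty
        std::perror("mmap(MAP_HUGETLB)");
        return 1;
    }

    std::memset(p, 0, huge);             // fault the huge page in (first touch)

    int status = -1;
    if (move_pages(0, 1, &p, nullptr, &status, 0) == 0)
        std::printf("huge page resides on node %d\n", status);

    munmap(p, huge);
    return 0;
}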

