Virtually Contiguous vs. Physically Contiguous Memory

Are Arrays Contiguous? (Virtual vs Physical)

Unless the start of the array happens to be aligned to the beginning of a memory page, it can still occupy two pages: it can start near the end of one page and end on the next. Arrays allocated on the stack will probably not be forced into a single page, because stack frames are simply laid out sequentially in stack memory, and the array sits at a fixed offset within its frame, so its page alignment depends on where the frame happens to land.

The heap memory allocator (malloc()) could try to ensure that arrays that are smaller than a page will be allocated entirely on the same page, but I'm not sure if this is actually how most allocators are implemented. Doing this might increase memory fragmentation.

Why is a physically contiguous memory region more efficient than a virtually contiguous one?

For large blocks of physically contiguous memory, the kernel can use huge pages, i.e., much fewer page table entries.

Are Arrays Contiguous in *Physical* Memory?

Each page of virtual memory is mapped identically to a page of physical memory; there is no remapping for units smaller than a page. This is inherent in the principle of paging. Assuming 4KB pages, the top 20 or 52 bits of a 32- or 64-bit address are looked up in the page tables to identify a physical page, and the low 12 bits are used as an offset into that physical page. So if you have two addresses within the same page of virtual memory (i.e. the virtual addresses differ only in their 12 low bits), then they will be located at the same relative offsets in some single page of physical memory. (Assuming the virtual page is backed by physical memory at all; it could of course be swapped out at any instant.)

For different virtual pages, there is no guarantee at all about how they are mapped to physical memory. They could easily be mapped to entirely different locations of physical memory (or of course one or both could be swapped out).

So if you allocate a very large array in virtual memory, there is no need for a sufficiently large contiguous block of physical memory to be available; the OS can simply map those pages of virtual memory to any arbitrary pages in physical memory. (Or more likely, it will initially leave the pages unmapped, then allocate physical memory for them in smaller chunks as you touch the pages and trigger page faults.)

This applies to all parts of a process's virtual memory: static code and data, stack, memory dynamically allocated with malloc/sbrk/mmap etc.

Linux does have support for huge pages, in which case the same logic applies but the pages are larger (a few MB or GB; the available sizes are fixed by hardware).

Other than very specialized applications like hardware DMA, there isn't normally any reason for an application programmer to care about how physical memory is arranged behind the scenes.

Is stack memory contiguous physically in Linux?

As far as I can see, stack memory is contiguous in the virtual address
space, but is stack memory also contiguous physically? And does this
have something to do with the stack size limit?

No, stack memory is not necessarily contiguous in the physical address space. It's not related to the stack size limit. It's related to how the OS manages memory. The OS only allocates a physical page when the corresponding virtual page is accessed for the first time (or for the first time since it got paged out to the disk). This is called demand-paging, and it helps conserve memory usage.

Why do we think that stack memory is always quicker than heap memory?
If it's not physically contiguous, how can the stack take more advantage
of the cache?

It has nothing to do with the cache. It's just faster to allocate and deallocate memory from the stack than from the heap. That's because allocating and deallocating from the stack takes only a single instruction (incrementing or decrementing the stack pointer). On the other hand, there is a lot more work involved in allocating and/or deallocating memory from the heap. See this article for more information.

Now once memory allocated (from the heap or stack), the time it takes to access that allocated memory region does not depend on whether it's stack or heap memory. It depends on the memory access behavior and whether it's friendly to the cache and memory architecture.

If we want to sort a large amount of numbers, using an array to store
the numbers is better than using a list, because every list node may be
constructed by malloc, so it may not take good advantage of the cache.
That's why I say stack memory is quicker than heap memory.

Using an array is faster not because arrays are allocated from the stack. Arrays can be allocated from any memory (stack, heap, or anywhere). It's faster because arrays are usually accessed contiguously, one element at a time. When the first element is accessed, a whole cache line that contains it and some neighboring elements is fetched from memory into the L1 cache. Accessing the other elements in that cache line can then be done very efficiently, but accessing the first element in the cache line is still slow (unless the cache line was prefetched). This is the key part: since cache lines are 64-byte aligned, and virtual and physical pages are 4KB aligned (and therefore also aligned to a multiple of 64 bytes), it's guaranteed that any cache line fully resides within a single virtual page and a single physical page. This is what makes fetching cache lines efficient. Again, all of this has nothing to do with whether the array was allocated from the stack or the heap. It holds true either way.

On the other hand, since the elements of a linked list are typically not contiguous (not even in the virtual address space), then a cache line that contains an element may not contain any other elements. So fetching every single element can be more expensive.

How does mapping a physically contiguous zone to contiguous virtual addresses improve performance?

Map the physically contiguous zone into contiguous virtual addresses. How does this improve performance?

DPDK needs both physical and virtual addresses. The virtual address is used normally, to load/store some data. The physical address is necessary for the userspace drivers to transfer data to/from devices.

For example, we allocate a few mbufs with virtual addresses 0x41000, 0x42000 and 0x43000. Then we fill them with some data and pass those virtual addresses to the PMD to transfer.

The driver has to convert those virtual addresses to physical ones. If physical pages are mapped into the virtual address space noncontiguously, then to convert virtual to physical addresses we need to search through all the mappings. For example, virtual address 0x41000 might correspond to physical 0x81000, 0x42000 to 0x16000, and 0x43000 to 0x64000.

The best case of such a search is one memory read; the worst case is a few memory reads for each buffer.

But if we are sure that both the virtual and physical addresses of a memory zone are contiguous, we simply add an offset to the virtual address to get the physical one, and vice versa. For example, virtual 0x41000 corresponds to physical 0x81000, virtual 0x42000 to physical 0x82000, and virtual 0x43000 to physical 0x83000.

We know the offset from the mapping. The worst case of such a translation is one memory read for all the buffers in a burst, which is a huge improvement for the translation.

Why is this remapping necessary?

To map a huge page into a virtual address space, the mmap system call is used. The call allows specifying a fixed virtual address at which the huge page should be mapped. This makes it possible to map huge pages one after another, creating a contiguous virtual memory zone. For example, we can mmap a huge page at the virtual address 0x200000, the next one at the virtual address 0x400000, and so on.

Unfortunately, we don't know the physical addresses of the huge pages until they are mapped. So at the virtual address 0x200000 we might map the physical address 0x800000, and at the virtual address 0x400000 the physical address 0x600000.

But once we have mapped those huge pages for the first time, we know both their physical and virtual addresses. So all we need to do is remap them in the correct order: at virtual address 0x1200000 we map physical 0x600000, and at 0x1400000 physical 0x800000.

Now we have a virtually and physically contiguous memory zone starting at the virtual address 0x1200000 and physical address 0x600000. So to convert virtual to physical addresses in this memory zone we just need to subtract the offset 0x600000 from the virtual address as described previously.

Hope this clarifies the idea of contiguous memory zones and remapping a bit.

malloc does not guarantee returning physically contiguous memory

malloc does not guarantee returning physically contiguous memory

Yes.

It guarantees returning virtually contiguous memory

Yes.

This is especially true when the size is larger than 4KB, because 4KB is
the page size (on typical Linux systems).

Contiguous memory does not imply that it will also be page aligned. The allocated memory can start at any address in the heap. So whatever page size the OS uses, it does not affect the allocation behavior of malloc.

Are std::vector elements contiguous in physical memory?

The memory used to store the data in a vector must be at contiguous addresses as those addresses are visible to the code.

In a typical case on most modern CPUs/OSes, that will mean the virtual addresses must be contiguous. If those virtual addresses cross a page boundary, then there's a good chance that the physical addresses will no longer be contiguous.

I should add that this is only rarely a major concern. Modern systems have at least some support for such fragmented memory usage right down to the hardware level in many cases. For example, many network and disk controllers include "scatter/gather" capability, where the OS uses the page tables to translate the virtual addresses for the buffer to physical addresses, then supplies a number of physical addresses directly to the controller, which then gathers the data from those addresses if it's transferring from memory to peripheral or "scatters" the data out to those addresses if it's transferring from peripheral to memory.


