Linux: How to Check The Largest Contiguous Address Range Available to a Process

Slightly nicer version of my above comment:

#!perl -T

use warnings;
use strict;

scalar(@ARGV) > 0 or die "Use: $0 <pid>";

my $pid = $ARGV[0];
$pid = oct($pid) if $pid=~/^0/;         # support hex and octal PIDs
$pid += 0; $pid = abs(int($pid));       # make sure we have a number

open(my $maps, "<", "/proc/".$pid."/maps") or
        die "can't open maps file for pid ".$pid.": $!\n";

my $max = 0;
my $end = 0;
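while (<$maps>) {
        # each line starts with "start-end" in hex; skip anything that doesn't
        /^([0-9a-f]+)-([0-9a-f]+)/ or next;
        my $gap = hex($1) - $end;       # hole between the previous mapping and this one
        $max = $gap if $max < $gap;
        $end = hex($2);
}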

close ($maps);

print "$max\n";
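
For comparison, the same scan can be sketched in Python. This is a minimal sketch, not a drop-in replacement: the name largest_gap and the optional ceiling argument are assumptions added here and are not part of the Perl script. The ceiling is there because on x86-64 the maps file often ends with a [vsyscall] line in kernel address space, which would otherwise make the hole below it look like the largest one.

```python
import re

# A maps line starts with "start-end" in lowercase hex, followed by whitespace.
MAPS_LINE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s")


def largest_gap(lines, ceiling=None):
    """Return (size, start) of the largest unmapped hole, in bytes.

    `lines` is any iterable of /proc/<pid>/maps lines (sorted by address,
    as the kernel prints them). `ceiling` is a hypothetical upper limit
    for usable addresses; holes are clipped to it when it is given.
    """
    best_size, best_start = 0, 0
    prev_end = 0
    for line in lines:
        m = MAPS_LINE.match(line)
        if m is None:
            continue  # not a mapping line
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if ceiling is not None:
            start = min(start, ceiling)
        if start - prev_end > best_size:
            best_size, best_start = start - prev_end, prev_end
        prev_end = max(prev_end, end)
    # Unlike the Perl version, also count the hole above the last mapping
    # when a ceiling is known.
    if ceiling is not None and ceiling - prev_end > best_size:
        best_size, best_start = ceiling - prev_end, prev_end
    return best_size, best_start
```

On Linux this could be called as largest_gap(open("/proc/self/maps")). Passing something like ceiling=1 << 47 assumes the usual 47-bit user address space on x86-64; adjust it for other setups.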

linux command to get the largest contiguous physical address space available

I don't know of such a command, and I don't think one exists.

If you only need this for testing, you can limit the amount of RAM available to the kernel with the mem= kernel boot parameter. The rest of the memory is then physically contiguous and available exclusively to you.

How do I find the range(s) of MMAPable virtual addresses in a program?

After a lot of testing I've found the solution I proposed in the question does work. I've been using cat /proc/<pid>/maps to check my custom allocator, and it's behaving as I expected. To reiterate the solution:

To find the lower bound use sbrk(0), make sure the ptr is page aligned, and then ensure that brk and sbrk are never called again.
To safely approximate the upper bound find the stack size with getrlimit, subtract that from a ptr into the stack, page align the ptr, and then never change the stack size with setrlimit.

If you might need to touch brk, sbrk, or setrlimit, then you can also add some padding to the lower bound and subtract some padding from the upper bound. You can dynamically compute a safe amount of padding by finding how much memory the system has with /proc/meminfo, or if you don't need a general solution you can just over-approximate how much you'll need based on what you're doing.
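
The arithmetic behind those two bounds can be sketched like this. It is only a sketch of the rounding: the names mmap_window, align_up and align_down are made up for illustration, the 4096-byte default page size is an assumption, and the inputs are assumed to come from sbrk(0), the address of a local variable, and the soft limit reported by getrlimit for the stack.

```python
def align_up(addr, page):
    """Round addr up to the next multiple of page."""
    return (addr + page - 1) // page * page


def align_down(addr, page):
    """Round addr down to the previous multiple of page."""
    return addr // page * page


def mmap_window(brk, stack_addr, stack_limit, page=4096, padding=0):
    """Return (lower, upper) bounds of the range considered safe to mmap.

    brk         -- current program break, as returned by sbrk(0)
    stack_addr  -- any address inside the current stack
    stack_limit -- maximum stack size in bytes (soft RLIMIT_STACK)
    padding     -- optional safety margin applied to both ends
    """
    lower = align_up(brk + padding, page)
    # The stack grows downward, so it may reach stack_limit bytes below
    # the pointer we took into it.
    upper = align_down(stack_addr - stack_limit - padding, page)
    if upper <= lower:
        raise ValueError("no room between the break and the stack")
    return lower, upper
```

From Python the limit itself can be read with resource.getrlimit(resource.RLIMIT_STACK). If the soft limit is unlimited there is no finite value to subtract, so this approach needs an explicit limit to be set.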

Is contiguous memory easier to get in a 64-bit address space? If so why?

That's very true. A process must allocate memory from its virtual address space, which holds both code and data and whose size is limited by the addressing capability of the architecture. A 32-bit process can never address more than 2^32 bytes (4 gigabytes), not counting bank-switching tricks. The operating system typically takes a big chunk out of that as well; on 32-bit Windows, for example, that cuts the addressable VM size down to 2 gigabytes.

Ideally, allocations are made so that they fit snugly together. That very rarely works out in practice. Shared libraries or DLLs in particular need to pick a preferred load address and that has to be guessed up front when the library is built.

So in practice, allocations are made from the holes between existing ones, and the largest contiguous allocation you can get is limited by the size of the largest hole. That is usually much smaller than the addressable VM size; on Windows it is typically around 650 megabytes. It tends to go downhill from there as the available address space gets fragmented by allocations, particularly by native code that can't afford to have its allocations moved by a compacting garbage collector. If you use Windows, you can get insight into the VM allocations with the SysInternals VMMap utility.

This problem completely disappears in a 64-bit process. The theoretical addressable virtual memory size is 2^64 bytes, an enormous number, so large that current processors don't implement it; they can go up to 2^48. It is further restricted by the operating system version you have and its willingness to keep page mapping tables for that much VM; eight terabytes is a typical limit. By implication, the holes between allocations are huge. Your program will keel over from paging file thrashing before it dies from OOM.
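
To put those powers of two side by side, here is a small sketch that prints an address-space size in binary units. The helper name describe is made up for illustration.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def describe(bits):
    """Size of a 2**bits-byte address space as a string in binary units."""
    size = 2 ** bits
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


for bits in (32, 43, 48, 64):
    print(f"2^{bits} bytes = {describe(bits)}")
```

Even the eight terabyte limit (2^43 bytes) is 2048 times the whole 32-bit address space, which is why the holes stop being a practical concern.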

Is stack memory contiguous physically in Linux?

As far as I can see, stack memory is contiguous in virtual memory
address, but stack memory is also contiguous physically? And does this
have something to do with the stack size limit?

No, stack memory is not necessarily contiguous in the physical address space, and this is not related to the stack size limit. It comes from how the OS manages memory: the OS only allocates a physical page when the corresponding virtual page is accessed for the first time (or for the first time since it was paged out to disk). This is called demand paging, and it helps conserve memory.
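
Demand paging can be observed from user space. The following is a Linux-only sketch that assumes /proc/self/statm is readable; the function names are made up for illustration. It maps an anonymous region, then compares the resident set size before and after writing to each page. It uses an mmap region rather than the stack, but the same mechanism applies to stack pages.

```python
import mmap


def resident_bytes():
    """Resident set size of this process, from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # second field: resident pages
    return resident_pages * mmap.PAGESIZE


def demand_paging_demo(size=32 * 1024 * 1024):
    """Return resident sizes (before, after mapping, after touching)."""
    before = resident_bytes()
    buf = mmap.mmap(-1, size)       # anonymous mapping: reserves addresses only
    mapped = resident_bytes()
    for offset in range(0, size, mmap.PAGESIZE):
        buf[offset] = 1             # first write to a page makes the kernel back it
    touched = resident_bytes()
    buf.close()
    return before, mapped, touched
```

On a typical Linux system the second value stays close to the first, and only the third grows by roughly the size of the mapping.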

why do we think that stack memory is always quicker
than heap memory? If it's not physically contiguous, how can stack
take more advantage of cache?

It has nothing to do with the cache. It's just faster to allocate and deallocate memory on the stack than on the heap, because doing so typically takes a single instruction (incrementing or decrementing the stack pointer). Allocating or deallocating memory on the heap involves a lot more work. See this article for more information.
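
The difference in bookkeeping can be illustrated with two toy models. This is a sketch only: BumpStack and FirstFitHeap are made-up names, and real heap allocators such as malloc are far more elaborate than a first-fit free list.

```python
class BumpStack:
    """Toy downward-growing stack: allocating is one pointer adjustment."""

    def __init__(self, top):
        self.sp = top

    def push(self, size):
        self.sp -= size
        return self.sp

    def pop(self, size):
        self.sp += size


class FirstFitHeap:
    """Toy first-fit heap: allocating has to search a list of free blocks."""

    def __init__(self, base, size):
        self.free = [(base, size)]  # (address, length), kept sorted by address

    def alloc(self, size):
        for i, (addr, length) in enumerate(self.free):
            if length >= size:
                if length == size:
                    del self.free[i]
                else:
                    self.free[i] = (addr + size, length - size)
                return addr
        raise MemoryError("no free block is large enough")

    def release(self, addr, size):
        self.free.append((addr, size))
        self.free.sort()
        merged = [self.free[0]]
        for a, n in self.free[1:]:  # merge blocks that touch each other
            last_a, last_n = merged[-1]
            if last_a + last_n == a:
                merged[-1] = (last_a, last_n + n)
            else:
                merged.append((a, n))
        self.free = merged
```

Every push or pop is a single addition or subtraction, while alloc and release have to walk and maintain the free list.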

Once memory is allocated (from the heap or the stack), the time it takes to access it does not depend on whether it is stack or heap memory. It depends on the memory access pattern and whether that pattern is friendly to the cache and memory architecture.

if we want to sort a large amount of numbers, using array to store the
numbers is better than using a list, because every list node may be
constructed by malloc, so it may not take good advantage of cache,
that's why I say stack memory is quicker than heap memory.

Using an array is faster, but not because arrays are allocated from the stack. Arrays can be allocated from any memory (stack, heap, or anywhere else). It's faster because arrays are usually accessed contiguously, one element at a time. When the first element is accessed, the whole cache line that contains it and its neighbors is fetched from memory into the L1 cache. Accessing the other elements in that cache line can then be done very efficiently, although accessing the first element in the cache line is still slow (unless the cache line was prefetched). This is the key part: since cache lines are 64 bytes long and 64-byte aligned (on typical x86 processors), and both virtual and physical pages are aligned to the page size, which is a multiple of 64 bytes, it's guaranteed that any cache line resides fully within a single virtual page and a single physical page. This is what makes fetching cache lines efficient. Again, none of this has anything to do with whether the array was allocated from the stack or the heap. It holds true either way.

On the other hand, since the elements of a linked list are typically not contiguous (not even in the virtual address space), then a cache line that contains an element may not contain any other elements. So fetching every single element can be more expensive.
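
The effect is easy to count. The sketch below assumes 64-byte cache lines and 4 KiB pages, which is typical for x86-64 but not universal, and the function names and addresses are made up for illustration.

```python
CACHE_LINE = 64   # bytes; an assumption, typical for x86-64
PAGE = 4096       # bytes; an assumption, the common small page size


def cache_lines_touched(addresses, size):
    """Number of distinct cache lines covered by objects of `size` bytes."""
    lines = set()
    for addr in addresses:
        first = addr // CACHE_LINE
        last = (addr + size - 1) // CACHE_LINE
        lines.update(range(first, last + 1))
    return len(lines)


def line_fits_in_one_page(line_index):
    """True if the whole cache line lies inside a single page."""
    start = line_index * CACHE_LINE
    end = start + CACHE_LINE - 1
    return start // PAGE == end // PAGE


# Sixteen 8-byte integers stored back to back in an array.
array = [0x1000 + 8 * i for i in range(16)]
# Sixteen 8-byte list nodes scattered 256 bytes apart.
nodes = [0x1000 + 256 * i for i in range(16)]

print(cache_lines_touched(array, 8))
print(cache_lines_touched(nodes, 8))
```

The array covers 128 bytes and so needs 2 cache lines, while the scattered nodes need 16, one per element.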

why cannot access to contiguous memory addresses in physical memory

When a process is first loaded into memory, the OS may be able to place its pages contiguously in physical memory. They can't always stay contiguous, though, because other processes also occupy memory and pages get swapped in and out. If some of the process's pages become less used, they are swapped out to disk, and when they are needed again they are not guaranteed to be loaded into the same spot as before, because another process's page may now be there. You should read about virtual memory to gain a good understanding of all of this.
