ASLR Bits of Entropy of mmap()

ASLR and memory layout on 64 bits: Is it limited to the canonical part (128 TiB)?

Obviously Linux won't give your process unusable addresses; that would make it raise a #GP(0) exception (and thus segfault) when it tries to execute code from _start (or, if the mapping is close to the cutoff, when it tries to load or store .data or .bss).

That would actually happen on the instruction that tried to set RIP to a non-canonical value in the first place, likely an iret or sysret (see footnote 1).
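As a quick demonstration of that refusal (a minimal sketch; the probe address is an arbitrary non-canonical value chosen for illustration), asking mmap for a fixed mapping at a non-canonical address fails outright (with ENOMEM, on the kernels I'm aware of) instead of creating something the CPU could never use:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Bit 59 set, top bits clear: non-canonical under both 48-bit
         * and 57-bit virtual addressing (arbitrary demo value). */
        void *hint = (void *)0x0800000000000000UL;
        void *p = mmap(hint, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (p == MAP_FAILED)
            printf("mmap(MAP_FIXED, non-canonical) -> %s\n", strerror(errno));
        else
            printf("mapped at %p (not expected on Linux)\n", p);
        return 0;
    }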


On systems with 48-bit virtual addresses, zero to 0000_7fff_ffff_ffff is the full lower half of the virtual address space when represented as a sign-extended 64-bit value.

On systems with PML5 supported (and used by the kernel), virtual addresses are 57 bits wide, so zero to 00ff_ffff_ffff_ffff is the low-half canonical range.

See https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt - the first row is the user-space range. (It talks about "56 bit" virtual addresses; that's incorrect or misleading. PML5 is 57-bit: an extra full level of page tables with 9 bits per level, so the low half is 56 bits with a 0 in the 57th bit, and the high half is 56 bits with a 1 in the 57th.)

========================================================================================================================
    Start addr    |   Offset   |     End addr     |  Size   | VM area description
========================================================================================================================
                  |            |                  |         |
 0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
__________________|____________|__________________|_________|___________________________________________________________
                  |            |                  |         |
 0000800000000000 |  +128 TB   | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
                  |            |                  |         |     virtual memory addresses up to the -128 TB
                  |            |                  |         |     starting offset of kernel mappings.
__________________|____________|__________________|_________|___________________________________________________________
                                                            |
                                                            | Kernel-space virtual memory, shared between all processes:
...

Or for PML5:

 0000000000000000 |    0       | 00ffffffffffffff |   64 PB | user-space virtual memory, different per mm
__________________|____________|__________________|_________|___________________________________________________________
                  |            |                  |         |
 0100000000000000 |   +64 PB   | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
                  |            |                  |         |     virtual memory addresses up to the -64 PB
                  |            |                  |         |     starting offset of kernel mappings.
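To make the sign-extension rule concrete, here is a minimal sketch (my own helper, not kernel code; it assumes the compiler implements arithmetic right shift for signed values, as gcc and clang do) that checks whether a 64-bit value is canonical for a given virtual-address width:

    #include <stdbool.h>
    #include <stdint.h>

    /* An address is canonical iff it equals the sign-extension of its
     * low va_bits bits, i.e. bits [63:va_bits-1] are all copies of bit
     * va_bits-1.  va_bits is 48 for 4-level paging, 57 for PML5. */
    static bool is_canonical(uint64_t addr, int va_bits)
    {
        int shift = 64 - va_bits;
        return (uint64_t)(((int64_t)(addr << shift)) >> shift) == addr;
    }

For example, is_canonical(0x00007fffffffffff, 48) is true while is_canonical(0x0000800000000000, 48) is false; with va_bits = 57, the low-half boundary moves up to 00ff_ffff_ffff_ffff, matching the tables above.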

Footnote 1:

As prl points out, this design allows an implementation to literally store only 48 actual bits for RIP values anywhere in the pipeline, apart from checking jump targets and detecting signed overflow in case execution runs off the end into non-canonical territory. (Maybe saving transistors in every place that has to store a uop, which needs to know its own address.) This wouldn't work if you could jump / iret to an arbitrary RIP: the #GP(0) exception would then have to push the correct 64-bit non-canonical address, which would mean the CPU would have to remember it temporarily.

It's also more useful for debugging to see where you jumped from, so it makes sense to design the rule this way; there's no use case for jumping to a non-canonical address on purpose. (Unlike jumping to an unmapped page, where the #PF exception handler can repair the situation, e.g. by demand paging, so for that you want the fault address to be the new RIP.)

Fun fact: using sysret with a non-canonical RIP on Intel CPUs will #GP(0) in ring 0 (CPL=0), so RSP isn't switched and still points to the user stack. If any other threads existed, this would let them mess with memory the kernel was using as a stack. This is a design flaw in IA-32e, Intel's implementation of x86-64. That's why Linux uses iret to return to user space from the syscall entry point if ptrace has been used on the process during that time. The kernel knows a fresh process will have a safe RIP, so it might actually use sysret to jump to user space faster.

Is mmap deterministic if ASLR is disabled?


If Address Space Layout Randomization (ASLR) is disabled, would we have a deterministic mmap?

If your application has exactly the same memory layout at the moment of the i-th mmap (in terms of which pages of virtual address space are mapped and which are not), then mmap should be deterministic in the Linux kernel.

Some strange situations can change the memory layout, though. For example, additional command-line arguments can shift the stack to a lower address. The C runtime mmaps a lot of files (e.g. locales), and if some of them changed size since the previous start, the memory layout will change too. Even stack consumption may affect it.

If your application's memory allocation via malloc changes (in either the sizes or the order of allocations), mmap will not be deterministic. So if your application is threaded, it should fix the order of malloc calls or limit all mallocs to the main thread.
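As a quick experiment (a sketch, assuming ASLR is disabled for the run, e.g. via setarch -R), the same sequence of hint-less mmaps should then print the same addresses on every run, as long as nothing else perturbed the layout:

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* With ASLR off and an identical prior layout, these hint-less
         * anonymous mappings should land at the same addresses on
         * every run of the program. */
        for (int i = 0; i < 4; i++) {
            void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            printf("mmap %d -> %p\n", i, p);
        }
        return 0;
    }

Running it as setarch $(uname -m) -R ./a.out disables randomization for just that invocation, so you can compare runs without changing /proc/sys/kernel/randomize_va_space system-wide.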

mm/mmap.c: arch_get_unmapped_area, the default non-MAP_FIXED mmap address resolver, is deterministic iff the VMA tree is the same AND the history of previous mmaps is the same (there is a cache, mm->free_area_cache, which lives between calls to mmap).

ASLR implementation

The implementation can naturally be found in the Linux kernel source tree. Even just grepping for randomize_va_space will turn up enough results to start on.

In the ELF loader: #ifdef arch_randomize_brk -> arch_randomize_brk() -> randomize_range()
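To inspect the relevant knobs at runtime, here is a minimal sketch (assuming the usual procfs paths; /proc/sys/vm/mmap_rnd_bits only exists on kernels new enough to expose it, roughly 4.5 and later):

    #include <stdio.h>

    /* Print an ASLR-related sysctl, skipping it quietly if this kernel
     * doesn't expose the file. */
    static void show(const char *path)
    {
        char buf[32];
        FILE *f = fopen(path, "r");
        if (f) {
            if (fgets(buf, sizeof buf, f))
                printf("%-40s %s", path, buf);
            fclose(f);
        }
    }

    int main(void)
    {
        show("/proc/sys/kernel/randomize_va_space"); /* 0 = off, 2 = full */
        show("/proc/sys/vm/mmap_rnd_bits");          /* entropy bits for mmap base */
        return 0;
    }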


