Exploiting a String-Based Overflow on x86-64 with NX (DEP) and ASLR Enabled

DEP and ASLR and how to use it?

This is simply a matter of using the right linker option so it flips a bit in the executable header. The Microsoft linker options are /NXCOMPAT (DEP) and /DYNAMICBASE (ASLR). I don't know your tools well enough to know if they have similar options. Editbin.exe supports these options too, you can always run it in a post-build event.
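To make the "bit in the executable header" concrete, here is a minimal sketch that reads the DllCharacteristics field of a PE image and reports whether the two flags are set. The flag values and field offsets come from the PE/COFF format; the function names are made up for this example, and the code only does the bare minimum of validation.

```python
import struct

# Flag values from the PE/COFF DllCharacteristics field.
DYNAMIC_BASE = 0x0040  # set by /DYNAMICBASE (ASLR)
NX_COMPAT = 0x0100     # set by /NXCOMPAT (DEP)


def dll_characteristics(image: bytes) -> int:
    """Return the DllCharacteristics field of a PE image given as bytes."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The signature (4 bytes) and COFF file header (20 bytes) precede the
    # optional header; DllCharacteristics is at offset 70 in PE32 and PE32+.
    optional_header = pe_offset + 4 + 20
    (flags,) = struct.unpack_from("<H", image, optional_header + 70)
    return flags


def mitigation_flags(image: bytes) -> dict:
    """Hypothetical helper: map the two flag bits to readable names."""
    flags = dll_characteristics(image)
    return {"DEP": bool(flags & NX_COMPAT), "ASLR": bool(flags & DYNAMIC_BASE)}
```

Calling mitigation_flags(open(path, "rb").read()) on a linked binary should show whether the linker or Editbin.exe actually set the bits.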

Exploiting strcpy() in C with Buffer overflow

Any overflow (buffer, stack, heap, ...) needs a way to run attacker-chosen code before it becomes an exploit; classically that means shellcode.

DEP marks data regions such as the stack and heap as non-executable, and ASLR places regions such as the stack, heap, and libc at a random offset in memory on each run; cf. https://security.stackexchange.com/questions/18556/how-do-aslr-and-dep-work

On Linux you can see how ASLR works by running cat /proc/self/maps a few times and comparing the addresses (see also: With ASLR turned on, are all sections of an image loaded at the same offsets relative to the image base address every time?)
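The same check can be scripted. The sketch below parses the text of /proc/self/maps and prints the base of a few regions; running it twice with ASLR on should normally show different values. The function names parse_maps and base_of are made up for this example, and the final loop only runs where /proc/self/maps exists.

```python
import os


def parse_maps(text: str) -> list:
    """Parse /proc/<pid>/maps text into (start, end, perms, path) tuples."""
    regions = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) == 6 else ""
        regions.append((start, end, fields[1], path))
    return regions


def base_of(regions: list, name: str):
    """Lowest start address of any mapping whose path contains `name`."""
    starts = [start for start, _end, _perms, path in regions if name in path]
    return min(starts) if starts else None


if os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as f:
        regions = parse_maps(f.read())
    # With ASLR enabled these values normally change from run to run.
    for name in ("[stack]", "[heap]", "libc"):
        base = base_of(regions, name)
        print(name, hex(base) if base is not None else "not mapped")
```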

If this were not done and the modules sat at static positions in memory (as in the old days), an attacker would know fixed addresses for specific functions and could use them as entry points for shellcode execution. A classic overflow exploit aims to place shellcode in memory and then redirect execution to it through an overwritten pointer.

I will not go into grey-area techniques here, but have a look at return-oriented programming, a variant of the overflow technique that is still effective (see: Exploiting a string-based overflow on x86-64 with NX (DEP) and ASLR enabled).

ASLR and memory layout on 64 bits: Is it limited to the canonical part (128 TiB)?

Obviously Linux won't give your process unusable addresses, that would make it raise a #GP(0) exception (and thus segfault) when it tries to execute code from _start. (Or if close to the cutoff, when it tries to load or store .data or .bss)

That would actually happen on the instruction that tried to set RIP to a non-canonical value in the first place, likely an iret or sysret¹.

On systems with 48-bit virtual addresses, zero to 0000_7fff_ffff_ffff is the full lower half of virtual address space when represented as a sign-extended 64-bit value.

On systems with PML5 supported (and used by the kernel), virtual addresses are 57 bits wide, so zero to 00ff_ffff_ffff_ffff is the low-half canonical range.

See https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt - the first row is the user-space range. (It talks about "56 bit" virtual addresses. That's incorrect or misleading, PML5 is 57-bit, an extra full level of page tables with 9 bits per level. So the low half is 56 bits with a 0 in the 57th and the high half is 56 bits with a 1 in the 57th.)

========================================================================================================================
    Start addr    |   Offset   |     End addr     |  Size   | VM area description
========================================================================================================================
                  |            |                  |         |
 0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
__________________|____________|__________________|_________|___________________________________________________________
                  |            |                  |         |
 0000800000000000 | +128    TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
                  |            |                  |         |     virtual memory addresses up to the -128 TB
                  |            |                  |         |     starting offset of kernel mappings.
__________________|____________|__________________|_________|___________________________________________________________
                                                            |
                                                            | Kernel-space virtual memory, shared between all processes:
...

Or for PML5:

 0000000000000000 |    0       | 00ffffffffffffff |   64 PB | user-space virtual memory, different per mm
__________________|____________|__________________|_________|___________________________________________________________
                  |            |                  |         |
 0100000000000000 |  +64    PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
                  |            |                  |         |     virtual memory addresses up to the -64 PB
                  |            |                  |         |     starting offset of kernel mappings.
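
The user-space ranges in these tables follow directly from the sign-extension rule. As a small sketch (the function names are made up for this example), an address is canonical when bit va_bits-1 and every bit above it are all equal:

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if the 64-bit value `addr` is a sign-extended `va_bits`-bit address."""
    if not 0 <= addr < 1 << 64:
        raise ValueError("addr must fit in 64 bits")
    # Keep bit va_bits-1 and everything above it; they must be all 0 or all 1.
    upper = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


def user_space_end(va_bits: int = 48) -> int:
    """Highest address of the low (user-space) canonical half."""
    return (1 << (va_bits - 1)) - 1


# 48-bit virtual addresses (4-level paging) and 57-bit (PML5).
print(hex(user_space_end(48)), hex(user_space_end(57)))
```

Passing va_bits=57 models PML5; the default of 48 models 4-level paging.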

Footnote 1:

As prl points out, this design allows an implementation to literally only have 48 actual bits to store RIP values anywhere in the pipeline, except jumps and detecting signed overflow in case execution runs off the end into non-canonical territory. (Maybe saving transistors in every place that has to store a uop, which needs to know its own address.) Unlike if you could jump / iret to an arbitrary RIP, and then the #GP(0) exception would have to push the correct 64-bit non-canonical address, which would mean the CPU would have to remember it temporarily.

It's also more useful for debugging to see where you jumped from, so it makes sense to design the rule this way because there's no use-case for jumping to a non-canonical address on purpose. (Unlike jumping to an unmapped page, where the #PF exception handler can repair the situation, e.g. by demand paging, so for that you want the fault address to be new RIP.)

Fun fact: using sysret with a non-canonical RIP on Intel CPUs will #GP(0) in ring 0 (CPL=0), so RSP isn't switched and is still = user stack. If any other threads existed, this would let them mess with memory the kernel was using as a stack. This is a design flaw in IA-32e, Intel's implementation of x86-64. That's why Linux uses iret to return to user space from the syscall entry point if ptrace has been used on this process during that time. The kernel knows a fresh process will have a safe RIP so it might actually use sysret to jump to user-space faster.

How to find the address of a not imported libc function when ASLR is on?

Answering my own question: measuring the distance between functions in your local libc does not guarantee that the remote libc has the same offsets. You have to find the remote libc version somehow; then you can get the address difference like so:

readelf -s /lib32/libc-2.19.so | grep printf

Possible ways to find the libc version if you know two addresses:

Libc binary collection
libcdb.com
pwnlib
... or you have access to the shell on the remote machine and can peek into the library with readelf yourself
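
Once the version is known, the arithmetic is short. The sketch below assumes you have one leaked runtime address and the symbol offsets printed by readelf -s for the matching build; the function names are made up for this example. The page-alignment check is a cheap sanity test, because ASLR moves the library by whole pages and so leaves the low 12 bits of every symbol address unchanged.

```python
PAGE_SIZE = 0x1000


def libc_base(leaked_addr: int, leaked_offset: int) -> int:
    """Runtime load base of libc from one leaked symbol address and its offset."""
    base = leaked_addr - leaked_offset
    if base % PAGE_SIZE != 0:
        # A misaligned base usually means the offsets come from the wrong build.
        raise ValueError("base is not page-aligned; wrong libc version?")
    return base


def resolve(leaked_addr: int, leaked_offset: int, target_offset: int) -> int:
    """Runtime address of another symbol in the same libc."""
    return libc_base(leaked_addr, leaked_offset) + target_offset


def matches_build(leaked: dict, offsets: dict) -> bool:
    """Check whether leaked addresses are consistent with one build's offsets."""
    bases = {leaked[name] - offsets[name] for name in leaked}
    return len(bases) == 1 and next(iter(bases)) % PAGE_SIZE == 0
```

With two leaked addresses, matches_build can be run against each candidate build's offsets to narrow down which libc the remote side uses.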
