/proc/$pid/maps Shows Pages with No rwx Permissions on x86_64 Linux

Linux default behavior of executable .data section changed between 5.4 and 5.9?

This is only a guess: I think the culprit is the READ_IMPLIES_EXEC personality that was being set automatically in the absence of a PT_GNU_STACK segment.

In the 5.4 kernel source we can find this piece of code:

SET_PERSONALITY2(loc->elf_ex, &arch_state);
if (elf_read_implies_exec(loc->elf_ex, executable_stack))
    current->personality |= READ_IMPLIES_EXEC;

That's the only thing that can transform an RW section into an RWX one. As far as I can tell, no other use of PROT_EXEC was changed or is relevant to this question.

The executable_stack is set here:

for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++)
    switch (elf_ppnt->p_type) {
    case PT_GNU_STACK:
        if (elf_ppnt->p_flags & PF_X)
            executable_stack = EXSTACK_ENABLE_X;
        else
            executable_stack = EXSTACK_DISABLE_X;
        break;

But if the PT_GNU_STACK segment is not present, that variable retains its default value:

int executable_stack = EXSTACK_DEFAULT;

This workflow is identical in 5.4 and in the latest kernel source; what changed is the definition of elf_read_implies_exec:

Linux 5.4:

/*
 * An executable for which elf_read_implies_exec() returns TRUE will
 * have the READ_IMPLIES_EXEC personality flag set automatically.
 */
#define elf_read_implies_exec(ex, executable_stack) \
    (executable_stack != EXSTACK_DISABLE_X)

Latest Linux:

/*
 * An executable for which elf_read_implies_exec() returns TRUE will
 * have the READ_IMPLIES_EXEC personality flag set automatically.
 *
 * The decision process for determining the results are:
 *
 *                 CPU: | lacks NX*  | has NX, ia32     | has NX, x86_64 |
 * ELF:                 |            |                  |                |
 * ---------------------|------------|------------------|----------------|
 * missing PT_GNU_STACK | exec-all   | exec-all         | exec-none      |
 * PT_GNU_STACK == RWX  | exec-stack | exec-stack       | exec-stack     |
 * PT_GNU_STACK == RW   | exec-none  | exec-none        | exec-none      |
 *
 *  exec-all  : all PROT_READ user mappings are executable, except when
 *              backed by files on a noexec-filesystem.
 *  exec-none : only PROT_EXEC user mappings are executable.
 *  exec-stack: only the stack and PROT_EXEC user mappings are executable.
 *
 *  *this column has no architectural effect: NX markings are ignored by
 *   hardware, but may have behavioral effects when "wants X" collides with
 *   "cannot be X" constraints in memory permission flags, as in
 *   https://lkml.kernel.org/r/20190418055759.GA3155@mellanox.com
 *
 */
#define elf_read_implies_exec(ex, executable_stack) \
    (mmap_is_ia32() && executable_stack == EXSTACK_DEFAULT)

Note how in the 5.4 version elf_read_implies_exec returned true unless the stack was explicitly marked as not executable (via the PT_GNU_STACK segment).

In the latest source, the check is more defensive: elf_read_implies_exec is true only for 32-bit executables in which no PT_GNU_STACK segment was found.

I assembled your program, linked it, and found no PT_GNU_STACK segment, so this may be the reason.
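
If you want to repeat that check yourself, readelf -l lists the program headers, or you can read them directly. Here is a minimal sketch that only handles little-endian ELF64 files; the function name gnu_stack_state is my own, not something from the kernel or binutils:

```python
import struct

PT_GNU_STACK = 0x6474E551  # p_type value of the GNU stack program header
PF_X = 0x1                 # "executable" bit in p_flags

def gnu_stack_state(elf_bytes):
    """Return 'missing', 'RW' or 'RWX' for a little-endian ELF64 image."""
    if elf_bytes[:4] != b"\x7fELF" or elf_bytes[4] != 2 or elf_bytes[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # ELF64 header: e_phoff is at offset 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", elf_bytes, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf_bytes, 0x36)
    for i in range(e_phnum):
        # Each ELF64 program header starts with p_type and p_flags (4 bytes each).
        p_type, p_flags = struct.unpack_from("<II", elf_bytes, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            return "RWX" if p_flags & PF_X else "RW"
    return "missing"
```

You would call it as gnu_stack_state(open(path, "rb").read()) on your linked binary; a result of "missing" corresponds to the EXSTACK_DEFAULT case above.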

If this is indeed the issue and I followed the code correctly, then once you mark the stack as not executable in the binary, its data section should no longer be mapped executable (not even on Linux 5.4).
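
To make the difference concrete, here is a small model of the two macro definitions quoted above. It is only a sketch: the ExStack enum and the is_ia32 parameter are my stand-ins for the kernel's EXSTACK_* constants and mmap_is_ia32().

```python
from enum import Enum

class ExStack(Enum):
    DEFAULT = 0    # no PT_GNU_STACK header found
    DISABLE_X = 1  # PT_GNU_STACK present without PF_X
    ENABLE_X = 2   # PT_GNU_STACK present with PF_X

def read_implies_exec_5_4(executable_stack):
    """Linux 5.4 macro: true unless the stack is explicitly non-executable."""
    return executable_stack != ExStack.DISABLE_X

def read_implies_exec_latest(executable_stack, is_ia32):
    """Later macro: true only for ia32 processes with no PT_GNU_STACK."""
    return is_ia32 and executable_stack == ExStack.DEFAULT

# Compare both versions for a 64-bit process.
for state in ExStack:
    print(state.name,
          read_implies_exec_5_4(state),
          read_implies_exec_latest(state, is_ia32=False))
```

For a 64-bit binary without PT_GNU_STACK (ExStack.DEFAULT), the 5.4 model returns True and the later one returns False, which matches the behavior change in the question. With ExStack.DISABLE_X both return False.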

understanding pmap output

First of all, a single process can show up as more than one entry in memory-usage listings. I don't know if this is what you are asking about. For example, when running a browser on Linux with just one tab open, top shows more than four entries for it in the memory usage list, together covering more than 10 MB of memory. I think this is fine: it is because the same process runs a larger number of threads.

This link may be useful: if you look at its usage example, the mapping output of the -x option also shows multiple entries for a single process.

http://www.cyberciti.biz/tips/howto-find-memory-used-by-program.html

Why does Linux favor 0x7f mappings?

First and foremost, assuming that you are talking about x86-64, we can see that the virtual memory map for x86-64 is:

========================================================================================================================
    Start addr    |   Offset   |     End addr     |  Size   | VM area description
========================================================================================================================
                  |            |                  |         |
 0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
__________________|____________|__________________|_________|___________________________________________________________
 ...              |    ...     | ...              |  ...

Userspace addresses are always in the canonical form in x86-64, using only the lower 48 bits with 4-level page tables or 57 bits with 5-level page tables (note that the highest bit is sign extended and only set to 1 for the kernel, therefore in reality you only see at most 47 or 56 bits set in userspace with the most significant always set to 0).

See:

x86-64 canonical address?
Address canonical form and pointer arithmetic

This puts the end of user-space virtual memory at 0x7fffffffffff. This is where the stack of new programs starts: that is, 0x7ffffffff000 (minus some random offset due to ASLR) and growing to lower addresses.
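
The canonical-form rule and the resulting end of user space are easy to check with a little arithmetic. This is just an illustrative sketch; is_canonical and user_space_end are names I made up, and va_bits is 48 for 4-level or 57 for 5-level page tables:

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63 down to va_bits-1 of a 64-bit address are all equal."""
    top = addr >> (va_bits - 1)               # the sign bit and everything above it
    all_ones = (1 << (64 - va_bits + 1)) - 1  # what those bits look like when all set
    return top == 0 or top == all_ones

def user_space_end(va_bits=48):
    """Highest user-space address: the sign bit (bit va_bits-1) stays 0."""
    return (1 << (va_bits - 1)) - 1

print(hex(user_space_end(48)))               # 0x7fffffffffff
print(is_canonical(0x00007fffffffffff))      # True  (last user-space address)
print(is_canonical(0x0000800000000000))      # False (non-canonical hole)
print(is_canonical(0xffff800000000000))      # True  (kernel half)
```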

Let me address the simple question first:

Will there be a problem if I manually mmap pages outside of these prefixes?

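Not at all. The mmap syscall always checks the requested address. Without MAP_FIXED the address is only a hint, so the kernel simply picks another range if the requested one overlaps an already mapped memory area or is completely invalid (e.g. addr < mmap_min_addr or addr > 0x7ffffffff000). With MAP_FIXED an invalid address makes the call fail, but an overlapping request silently replaces the existing mapping, so use that flag with care.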

Now... diving straight into the Linux kernel code, precisely in the kernel ELF loader (fs/binfmt_elf.c:960), we can see a pretty long and explanatory comment:

/*
 * This logic is run once for the first LOAD Program
 * Header for ET_DYN binaries to calculate the
 * randomization (load_bias) for all the LOAD
 * Program Headers, and to calculate the entire
 * size of the ELF mapping (total_size). (Note that
 * load_addr_set is set to true later once the
 * initial mapping is performed.)
 *
 * There are effectively two types of ET_DYN
 * binaries: programs (i.e. PIE: ET_DYN with INTERP)
 * and loaders (ET_DYN without INTERP, since they
 * _are_ the ELF interpreter). The loaders must
 * be loaded away from programs since the program
 * may otherwise collide with the loader (especially
 * for ET_EXEC which does not have a randomized
 * position). For example to handle invocations of
 * "./ld.so someprog" to test out a new version of
 * the loader, the subsequent program that the
 * loader loads must avoid the loader itself, so
 * they cannot share the same load range. Sufficient
 * room for the brk must be allocated with the
 * loader as well, since brk must be available with
 * the loader.
 *
 * Therefore, programs are loaded offset from
 * ELF_ET_DYN_BASE and loaders are loaded into the
 * independently randomized mmap region (0 load_bias
 * without MAP_FIXED).
 */
if (interpreter) {
    load_bias = ELF_ET_DYN_BASE;
    if (current->flags & PF_RANDOMIZE)
        load_bias += arch_mmap_rnd();
    elf_flags |= MAP_FIXED;
} else
    load_bias = 0;

In short, there are two types of ELF Position Independent Executables:

Normal programs: they require a loader in order to run. This represents basically 99.9% of the ELF programs on a normal Linux system. The path of the loader is specified in the ELF program headers, with a program header of type PT_INTERP.
Loaders: a loader is an ELF that does not specify a PT_INTERP program header, and that is responsible for loading and starting normal programs. It also does a bunch of fancy stuff behind the scenes (resolve relocations, load needed libraries, etc.) before actually starting the program that is being loaded.

When the kernel executes a new ELF through an execve syscall, it needs to map into memory the program itself and the loader. Control will then be passed to the loader that will resolve and map all needed shared libraries and finally pass control to the program. Since both the program and its loader need to be mapped, the kernel needs to make sure that those mappings don't overlap (and also that future mapping requests by the loader will not overlap).

In order to do this, the loader is mapped near the stack (at a lower address than the stack, but with some tolerance, since the stack is allowed to grow by adding more pages if needed), leaving the duty of applying ASLR to mmap itself. The program is then mapped using a load_bias (as seen in the above snippet) to put it far enough from the loader (at a much lower address).

If we take a look at ELF_ET_DYN_BASE, we see that it is architecture dependent and on x86-64 it evaluates to:

((1ULL << 47) - (1 << 12)) / 3 * 2 == 0x555555554aaa

Basically around 2/3 of TASK_SIZE. That load_bias is then adjusted adding arch_mmap_rnd() bytes if ASLR is enabled, and finally page-aligned. At the end of the day, this is the reason why we usually see addresses starting with 0x55 for programs.
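
The arithmetic can be reproduced outside the kernel. A quick sketch (the helper names are mine; the 4 KiB page size is the usual x86-64 value):

```python
PAGE_SIZE = 1 << 12  # 4 KiB pages

def elf_et_dyn_base():
    # Same integer arithmetic as the expression above (C integer division truncates).
    return ((1 << 47) - PAGE_SIZE) // 3 * 2

def page_align_down(addr):
    # Clear the low 12 bits to get the start of the containing page.
    return addr & ~(PAGE_SIZE - 1)

base = elf_et_dyn_base()
print(hex(base))                   # 0x555555554aaa
print(hex(page_align_down(base)))  # 0x555555554000
```

The page-aligned value, 0x555555554000, is what you get when no random offset is added, for example when a debugger such as GDB disables ASLR for the program it starts.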

When control is passed to the loader, the virtual memory area for the process has already been defined, and successive mmap syscalls that do not specify an address will return decreasing addresses starting near the loader. As we just saw the loader is mapped near the stack, and the stack is at the very end of the user address space: this is the reason why we usually see addresses starting with 0x7f for libraries.

There is a common exception to the above. In the case the loader is invoked directly, like for example:

/lib/x86_64-linux-gnu/ld-2.24.so ./myprog

The kernel will not map ./myprog in this case and will leave that to the loader. As a consequence, ./myprog will be mapped at some 0x7f... address by the loader.

You may be wondering: why doesn't the kernel always let the loader map the program then, or why isn't the program just mapped right before/after the loader? I don't have a 100% definitive answer for this, but a few reasons come to mind:

Consistency: making the kernel itself load the ELF into memory without depending on the loader avoids trouble. If this wasn't the case, the kernel would fully depend on the userspace loader, which is not advisable at all (this may also partially be a security concern).
Efficiency: we know that at least the executable and its loader need to be mapped (regardless of any linked libraries), so the kernel might as well save precious time and do it right away rather than wait for another syscall with its associated context switch.
Security: in the default scenario, mapping the program at a different randomized address than the loader and other libraries provides a sort of "isolation" between the program itself and the loaded libraries. In other words, "leaking" any library address won't reveal the program position in memory, and vice-versa. Mapping the program at a predefined offset from the loader and other libraries would instead partially defeat the purpose of ASLR.
In an ideal security-driven scenario, every single mmap (i.e. any needed library) would also be placed at a randomized address independent of previous mappings, but this would hurt performance significantly. Keeping allocations grouped results in faster page table lookups: see Understanding The Linux Kernel (3rd edition), page 606: Table 15-3. Highest index and maximum file size for each radix tree height. It would also cause much greater virtual memory fragmentation, becoming a real problem for programs that need to map large files to memory. The substantial part of isolation between program code and library code is already done, going further has more cons than pros.
Ease of debugging: seeing RIP=0x55... vs RIP=0x7f... instantly helps figuring out where to look (program itself or library code).

Can you change the page permissions of a running process?

Yes, you can, given the appropriate privileges. Depending on the situation, you may need the CAP_SYS_PTRACE capability. See this documentation page about /proc/sys/kernel/yama/ptrace_scope to know more.

Easy-peasy solution using GDB

The simplest way is to use a debugger, for example GDB:

Run the target process and get its PID. You can find the PID of the process with ps, htop, pidof and similar commands.
Open a terminal and attach GDB to the process with gdb --pid PID.

At the GDB prompt, check the memory mappings:

(gdb) info inferiors
  Num  Description       Connection           Executable
* 1    process 19433     2 (native)           /usr/bin/ls
(gdb) !cat /proc/19433/maps
555555554000-555555558000 r--p 00000000 00:18 5154601                    /usr/bin/ls
555555558000-55555556d000 r-xp 00004000 00:18 5154601                    /usr/bin/ls
55555556d000-555555576000 r--p 00019000 00:18 5154601                    /usr/bin/ls
555555577000-555555579000 rw-p 00022000 00:18 5154601                    /usr/bin/ls
555555579000-55555557a000 rw-p 00000000 00:00 0                          [heap]
7ffff7fcc000-7ffff7fd0000 r--p 00000000 00:00 0                          [vvar]
...

Note: I use !cat /proc/PID/maps instead of info proc mappings here because the latter does not show the permissions unfortunately.
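
If you would rather pick the target page programmatically than by eye, each line of that file is easy to split. A minimal sketch, assuming the field layout shown above (the Mapping type and parse_maps_line are my own names):

```python
from typing import NamedTuple

class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str

def parse_maps_line(line):
    """Split one /proc/PID/maps line into address range, permissions and path."""
    # The path is the optional 6th field and may itself contain spaces.
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], path)

m = parse_maps_line(
    "555555577000-555555579000 rw-p 00022000 00:18 5154601                    /usr/bin/ls"
)
print(hex(m.start), m.perms, m.path)  # 0x555555577000 rw-p /usr/bin/ls
```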

Call the mprotect() libc function to change the permissions of the pages you want. For example here I change a single page to RWX:
```
(gdb) call (long)mprotect(0x555555577000, 0x1000, 0x7)
$1 = 0  <-- return value, 0 == success
```
Now detach and let the process run with the modified permissions.
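
In case the 0x7 passed to mprotect above looks like a magic number: it is PROT_READ | PROT_WRITE | PROT_EXEC. The sketch below hardcodes the Linux values of those constants (that they match your system's headers is an assumption) and renders a prot value the way /proc/PID/maps prints it; prot_to_perms is a name I made up:

```python
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4  # Linux values from <sys/mman.h>

def prot_to_perms(prot):
    """Render an mprotect() prot value as the first three maps permission characters."""
    return "".join(
        ch if prot & bit else "-"
        for ch, bit in (("r", PROT_READ), ("w", PROT_WRITE), ("x", PROT_EXEC))
    )

print(prot_to_perms(0x7))                    # rwx
print(prot_to_perms(PROT_READ | PROT_EXEC))  # r-x
```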

You can also write a GDB script to do this for you automatically. For example, if you know that at some point RDI will contain an address in the page you want to touch, you could do something like this:

file path/to/your/elf
# or alternatively `attach PID`

# Set a breakpoint to some known address (assuming your ELF is not position independent).
# You could also do `break some_symbol+offset` if there are symbols.

break *0x123450
commands 1
    set $page = $rdi & ~0xfff
    call (long)mprotect($page, 0x1000, 0x7)
    detach
end

run

Then in your terminal:

$ gdb -x yourscript.txt

Advanced manual solution using `ptrace`

Alternatively, you can leverage the same low level tool that GDB uses under the hood to write a program that does this for you automatically: the ptrace syscall. You can write a program which does the following:

Spawn the target process through fork + execve (or just run it in another terminal).
PTRACE_ATTACH to it.
Peek around with PTRACE_GETREGS (save the initial register state for later), PTRACE_PEEKDATA (inspect memory), etc.
Manipulate the tracee's memory to write a simple stub that calls mprotect for you (PTRACE_POKEDATA).
Force the tracee to execute that little stub.
Reset everything back to normal (restore regs, possibly also remove the stub, etc).
Detach from the tracee and let it continue executing.

This is of course a lot harder than the debugger based approach, but it can be fun to experiment with. You are basically writing your own very specialized debugger.

See this answer of mine on "Writing the simplest assembly debugger" for more information and a code example of what it could look like.

Here's also a more complex example on GitHub: in this case the programmer wants to close an arbitrary file descriptor of an already running process through ptrace.

Can I execute code that resides in the data segment (ELF binary)?

I receive a segmentation fault.

This is hardware-enforced data execution prevention (https://en.wikipedia.org/wiki/Executable_space_protection#Linux): you can't just jump to a data page if the 'x' (execute) bit is not set for it in the page tables. Memory mappings and their permission bits are listed in the /proc/$pid/maps and /proc/$pid/smaps files: 'rwx' for writable code, 'rw-' for data without execution, 'r--' for read-only data, 'r-x' for normal code.

If you want to execute data, you should call the mprotect syscall with the PROT_EXEC flag on the part of your data that is meant to run as code.
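
One detail to keep in mind: mprotect works on whole pages, and the address you pass must be page-aligned. Here is a small sketch of how the range could be computed for a buffer that sits somewhere inside a data page. The function name and the 4096-byte page size are assumptions of mine; on a real system you can read the page size from mmap.PAGESIZE.

```python
PAGE_SIZE = 4096  # assumed page size; see mmap.PAGESIZE on a real system

def mprotect_range(addr, length, page_size=PAGE_SIZE):
    """Return (start, size) of the whole pages covering [addr, addr + length)."""
    start = addr & ~(page_size - 1)                             # round down
    end = (addr + length + page_size - 1) & ~(page_size - 1)    # round up
    return start, end - start

start, size = mprotect_range(0x601040, 32)
print(hex(start), hex(size))  # 0x601000 0x1000
```

The resulting start and size are what you would pass to mprotect together with the desired PROT_* flags. Note that everything else living in those pages gets the new permissions too.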

In the x86 world this was fully implemented as the "NX bit" / "XD bit" feature in Pentium 4 (Prescott) and newer (Core, Core2, Core i*, Core m) and in Athlon 64 / Opteron and newer. If the OS runs in 32-bit mode, it must turn on PAE to have this bit in the page tables. In x86_64 (64-bit) mode the NX/XD bit is always supported.

The first variants of support were added to Linux around 2004: http://linuxgazette.net/107/pramode.html

In 2007 you may have outdated hardware, old kernel or 32-bit mode kernel without PAE.

Info about NX/XD bits: https://en.wikipedia.org/wiki/NX_bit

Sometimes 'rwx' mode may be prohibited, check https://en.wikipedia.org/wiki/W^X.

For pre-NX systems there were solutions based on the x86 segment registers that prevented part of the memory space from being executed.

Can I execute the program above without having a segmentation fault?

You can:

make the data page executable by calling mprotect on it with PROT_READ|PROT_EXEC
mark the data segment of the ELF file as executable (this needs deep hacking inside ld scripts; the default one is printed by ld --verbose)
make all pages, including .data and the heap (not just the stack), executable by linking with ld or gcc -z execstack
move the shellcode into the text section of the ELF file
try to disable the NX/XD bit in the kernel (hard; recompilation may be needed)
use a 32-bit OS (kernel) built without the PAE option (a build-time option)
use an older CPU without NX/XD

Why does the linux kernel map my RW segment as RWX?

The answer here is two-fold, one part being contributed by the dynamic linker, the other by the kernel. To see this, let us look at the memory map right after entering the dynamic linker (e.g. by setting a breakpoint in _dl_start). We see:

003ff000-00400000 rwxp 00000000 00:28 1456774                            plt.out
00400000-00401000 r-xp 00001000 00:28 1456774                            plt.out
00600000-00602000 rwxp 00001000 00:28 1456774                            plt.out

which is at least closer to what we wanted (it has the correct segments, in the right places). Now, the reason the last segment gets split up is because of the GNU_RELRO program header, which says to the dynamic linker "Hey, I'm not gonna need to write to this anymore after you're done with your initial relocations", so the dynamic linker faithfully tries to set that region of memory to PROT_READ (note it ignores the actual permission flags set in the program header, though they appear to be conventionally set to PF_R).

That's only half the mystery though. We still have those pesky PROT_EXEC bits left that we didn't order. Those turn out to come down to a feature of the Linux kernel called the READ_IMPLIES_EXEC personality, causing all maps with PROT_READ permission to also have PROT_EXEC permission (see the man page for personality(2)). It turns out that for compatibility reasons, Linux automatically sets this personality unless a PT_GNU_STACK program header tells it not to. The linker automatically creates this program header, if all input objects have an (empty) .note.GNU-stack section. See here for more information on that mechanism.
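
As a rough model of what that personality does to a mapping request (a sketch of the behavior described above, not the kernel's actual code; effective_prot and the noexec_mount parameter are names I invented, while the constant values are the Linux ones):

```python
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4  # Linux values from <sys/mman.h>
READ_IMPLIES_EXEC = 0x0400000  # personality flag from <linux/personality.h>

def effective_prot(prot, personality, noexec_mount=False):
    """Model how a requested prot is widened when READ_IMPLIES_EXEC is set."""
    # Readable mappings also become executable, unless the backing file
    # lives on a filesystem mounted noexec.
    if prot & PROT_READ and personality & READ_IMPLIES_EXEC and not noexec_mount:
        prot |= PROT_EXEC
    return prot

rw = PROT_READ | PROT_WRITE
print(effective_prot(rw, READ_IMPLIES_EXEC))  # 7, i.e. rwx
print(effective_prot(rw, 0))                  # 3, i.e. rw-
```

This is why an RW segment shows up as rwxp in the map above: the program asked for read and write, and the personality added execute on top.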

How to find the PID of a C code during compilation time?

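Why not use the getpid() function declared in unistd.h? A process only gets its PID when it is started, so the value has to be read at run time rather than at compile time.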

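#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    pid_t pid = getpid();          /* PID of this process, known only at run time */
    pid_t parent_pid = getppid();  /* PID of the parent process */
    printf("pid=%d parent=%d\n", (int)pid, (int)parent_pid);
    return 0;
}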
