Understanding Linux /proc/pid/maps or /proc/self/maps

Each row in /proc/$PID/maps describes a region of contiguous virtual memory in a process or thread. Each row has the following fields:

address           perms offset  dev   inode   pathname
08048000-08056000 r-xp 00000000 03:0c 64593 /usr/sbin/gpm
  • address - This is the starting and ending address of the region in the process's address space.
  • permissions - This describes how pages in the region can be accessed. There are four different permissions: read, write, execute, and shared. If read/write/execute are disabled, a - will appear instead of the r/w/x. If a region is not shared, it is private, so a p will appear instead of an s. If the process attempts to access memory in a way that is not permitted, a segmentation fault is generated. Permissions can be changed using the mprotect system call.
  • offset - If the region was mapped from a file (using mmap), this is the offset in the file where the mapping begins. If the memory was not mapped from a file, it's just 0.
  • device - If the region was mapped from a file, this is the major and minor device number (in hex) where the file lives.
  • inode - If the region was mapped from a file, this is the file number.
  • pathname - If the region was mapped from a file, this is the name of the file. This field is blank for anonymous mapped regions. There are also special regions with names like [heap], [stack], or [vdso]. [vdso] stands for virtual dynamic shared object; it lets certain frequent system calls run without the overhead of switching into kernel mode. Here's a good article about it: "What is linux-gate.so.1?"

You might notice a lot of anonymous regions. These are usually created by mmap but are not attached to any file. They are used for a lot of miscellaneous things like shared memory or buffers not allocated on the heap. For instance, I think the pthread library uses anonymous mapped regions as stacks for new threads.

Reading /proc/PID/maps of a short-lived process

Your solution cat /proc/$(<pipeline>)/maps won't run cat until the whole <pipeline> has finished (even if you use & within it), so you will never get your maps.

On the other hand,

<pipeline> & cat /proc/${!}/maps

will return immediately. Of course, as your binary exits immediately, cat may still run too late to capture /proc/${!}/maps.

But you could try:

while true; do ./binary [input] & cat /proc/${!}/maps; done

This restarts the race between your binary and cat over and over, and sometimes cat wins (it does, in my case, with ls <nonexisting file> in place of ./binary, about 1 time in 30).

This prints a lot of garbage on your terminal, but you can collect the successful maps by redirecting cat's output:

while true; do ./binary [input] & cat /proc/${!}/maps >> mymaps; done

Elegant? No. Effective? I hope so!

What is a proc map?

The file /proc/[pid]/maps is a way to see the memory regions mapped by a process. Read about /proc for more info on other useful stuff you can find there.

How does the Linux kernel create the /proc/$pid/maps file?

You did something weird with the link.

Clicking through a few definitions reveals the file is generated on demand here:
https://github.com/torvalds/linux/blob/bcf876870b95592b52519ed4aafcf9d95999bc9c/fs/proc/task_mmu.c#L271

(at least for the common mmu case)

the usual question: why are you asking?

Why I can see the several same segments in the /proc/pid/maps output?

Please mind the values in column 3 (starting offset) and column 2 (permissions). You really do have the same part mapped twice: in lines 1 and 2 it's the same region of your binary file, but in line 3 the offset is different. It's permitted to map the same file separately multiple times, and the kernel may skip merging these into one VM map entry, so the output can reflect mapping history rather than a merged current state.

If you look at the library mappings, you can easily spot the pattern: every library is mapped separately:

  • With permission to read and execute: the main code, which shouldn't be changed.
  • With permission to read only: a constant data area, with no code execution allowed.
  • With permission to read and write: the non-constant data area combined with the relocation tables of shared objects.

Having the same starting 4K area of the binary file mapped twice can be explained by RTLD (runtime loader) logic, which differs from the handling of an ordinary library due to bootstrapping needs. I don't consider it that important, especially since it can easily differ with platform specifics.

Scattered maps found in /proc/PID/maps

Do you know what can cause this effect, or under what conditions this can happen?

An executable can trivially mmap (parts of) itself. This could be done to e.g. examine its own symbol table (necessary to print crash stack trace), or to extract some embedded resource.

The maps for the executable (python3.9) appear first and the map for a shared library that is opened appear after the ones in the executable.

This is only true by accident, and only for non-PIE executables.

Non-PIE executables on x86_64 are traditionally linked to load at address 0x400000, and the shared libraries are normally loaded starting from below the main stack.

If you link a non-PIE executable to load at e.g. 0x7ff000000000, then it will likely appear in the /proc/$pid/maps after shared libraries.

Update:

the python binary here is certainly not mmapping itself, so that explanation doesn't apply

  1. You can't know that -- you almost certainly haven't read all the code in Python 3.9 and every module which you load.
  2. There is no need to guess where these mmaped regions are coming from, you can just look.

To look, run your program under GDB and use catch syscall mmap followed by where. This will allow you to see where each and every mapping came from.


