About the Memory Layout of Programs in Linux

About the memory layout of programs in Linux

I'm assuming you're building this with gcc -m32 -nostartfiles segment-bounds.S or similar, so you have a 32-bit dynamic binary. (You don't need -m32 if you're actually using a 32-bit system, but most people that want to test this will have 64-bit systems.)

My 64-bit Ubuntu 15.10 system gives slightly different numbers from your program for a few things, but the overall pattern of behaviour is the same. (Different kernel, or just ASLR, explains this. The brk address varies wildly, for example, with values like 0x9354001 or 0x82a8001)

1) Why is my program starting at address 0x8048190 instead of 0x8048000?

If you build a static binary, your _start will be at 0x8048000.

We can see from readelf -a a.out that 0x8048190 is the start of the .text section. But it isn't at the start of the text segment that's mapped to a page. (pages are 4096B, and Linux requires mappings to be aligned on 4096B boundaries of file position, so with the file laid out this way, it wouldn't be possible for execve to map _start to the start of a page. I think the Off column is position within the file.)

Presumably the other sections in the text segment before the .text section are read-only data that's needed by the dynamic linker, so it makes sense to have it mapped into memory in the same page.

## part of readelf -a output
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048114 000114 000013 00   A  0   0  1
  [ 2] .note.gnu.build-i NOTE            08048128 000128 000024 00   A  0   0  4
  [ 3] .gnu.hash         GNU_HASH        0804814c 00014c 000018 04   A  4   0  4
  [ 4] .dynsym           DYNSYM          08048164 000164 000020 10   A  5   1  4
  [ 5] .dynstr           STRTAB          08048184 000184 00001c 00   A  0   0  1
  [ 6] .gnu.version      VERSYM          080481a0 0001a0 000004 02   A  4   0  2
  [ 7] .gnu.version_r    VERNEED         080481a4 0001a4 000020 00   A  5   1  4
  [ 8] .rel.plt          REL             080481c4 0001c4 000008 08  AI  4   9  4
  [ 9] .plt              PROGBITS        080481d0 0001d0 000020 04  AX  0   0 16
  [10] .text             PROGBITS        080481f0 0001f0 0000ad 00  AX  0   0  1         ########## The .text section
  [11] .eh_frame         PROGBITS        080482a0 0002a0 000000 00   A  0   0  4
  [12] .dynamic          DYNAMIC         08049f60 000f60 0000a0 08  WA  5   0  4
  [13] .got.plt          PROGBITS        0804a000 001000 000010 04  WA  0   0  4
  [14] .data             PROGBITS        0804a010 001010 0000d4 00  WA  0   0  1
  [15] .bss              NOBITS          0804a0e8 0010e4 0002f4 00  WA  0   0  8
  [16] .shstrtab         STRTAB          00000000 0010e4 0000a2 00      0   0  1
  [17] .symtab           SYMTAB          00000000 001188 0002b0 10     18  38  4
  [18] .strtab           STRTAB          00000000 001438 000123 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

2) Why is there a gap between the end of the text section and the start of the data section?

Why not? They have to be in different segments of the executable, so mapped to different pages. (Text is read-only and executable, and can be MAP_SHARED. Data is read-write and has to be MAP_PRIVATE. BTW, in Linux the default is for data to also be executable.)

Leaving a gap makes room for the dynamic linker to map the text segment of shared libraries next to the text of the executable. It also means an out-of-bounds array index into the data section is more likely to segfault. (Earlier and noisier failure is always easier to debug).

3) The bss start and end addresses are the same. I assume that the two buffers are stored somewhere else, is this correct?

That's interesting. They're in the bss, but IDK why the current position isn't affected by .lcomm labels. Probably they go in a different subsection before linking, since you used .lcomm instead of .comm. If I use use .skip or .zero to reserve space, I get the results you expected:

.section .bss
start_bss:
#.lcomm buffer, 500
#.lcomm buffer2, 250
buffer:  .skip 500
buffer2: .skip 250
end_bss:

.lcomm puts things in the BSS even if you don't switch to that section. i.e. it doesn't care what the current section is, and maybe doesn't care about or affect what the current position in the .bss section is. TL:DR: when you switch to the .bss manually, use .zero or .skip, not .comm or .lcomm.

4) If the system break point is at 0x83b4001, why I get the segmentation fault earlier at 0x804a000?

That tells us that there are unmapped pages between the text segment and the brk. (Your loop starts with ebx = $start_text, so it faults at the on the first unmapped page after the text segment). Besides the hole in virtual address space between text and data, there's probably also other holes beyond the data segment.

Memory protection has page granularity (4096B), so the first address to fault will always be the first byte of a page.

C memory layout of a program

Let's compile your program with gcc -O0 program.c -o program then disassemble it with objdump -D program. For convenience, I went ahead and did this (AT&T syntax, Intel syntax). You can see that foo is in the .text section, and it's been replaced by a stub that essentially does nothing and just returns out of the function. Since point was defined after the declaration, it's equivalent to having been defined right at the declaration, and is also in the .text section with an actual implementation. You can see that you are correct, and that data1 is in .bss and that data2 is in .data. As for your {'a', 'b', 'c'} array in main, you can see it's a bit weird.

 6c1:   c6 45 f5 61             mov    BYTE PTR [rbp-0xb],0x61
 6c5:   c6 45 f6 62             mov    BYTE PTR [rbp-0xa],0x62
 6c9:   c6 45 f7 63             mov    BYTE PTR [rbp-0x9],0x63

The values are actually being loaded one by one into the array, so I guess you could say that it is being stored in the .text section. You may notice that the "word" string is not actually in the disassembly. However, if you do readelf -x .rodata program, you'll find it sitting in the .rodata section.

Hex dump of section '.rodata':
  0x000007d0 01000200 776f7264 00256c64 0a00     ....word.%ld..

You can also see that while the variables aren't referenced by name, they are located on the function's stack frame, given by an offset of the base pointer rbp. For 64 bit binaries, an address is 8 bytes, and for 32 bit binaries, an address is 4 bytes.

Program memory layout in linux

if I have an array allocated on the stack a pointer to the first element will also be lower > in value than a pointer to the second element ?

It is not important "how" you allocate the array, you can increase or decrease the stack pointer, but as result you have address space reserved for the array.

You can work with them in normal way, since the lowest adress is reserved for element 0.

so my question is what is the correct memory layout for a process in Linux ?

You can check it yourself. Insert somewhere into you program something
like std::cin.get() to pause your program.

Then run in a separate shell:

ps aux | grep your_program_name
cat /proc/<pid show by grep>/maps

This prints the memory mappings of your process, where you can see where the heap, the stack and other things are placed in memory.

About the stack: let's assume that you have ordinary machine with Linux and Intel or AMD 64 bit CPU. Then write the following code:

extern void f(int);

void g(int param)
{
    f(param);
}

compile it and disassemble:

g++ -ggdb -c test_my_stack.cc  && objdump -S test_my_stack.o

you can see (unimportant details removed):

 void g(int param)
 {
 0:   55                      push   %rbp
 1:   48 89 e5                mov    %rsp,%rbp
 4:   48 83 ec 10             sub    $0x10,%rsp
 8:   89 7d fc                mov    %edi,-0x4(%rbp)
    f(param);
 b:   8b 45 fc                mov    -0x4(%rbp),%eax

as you can see in sub $0x10,%rsp we reserved space in the stack by decreasing (moving down) the stack pointer.

How is memory layout shared with other processes/threads?

The heap is a per-process memory: each process has its own heap, which is shared only within the same process space (like between the process threads, as you said). Why should you free it? Not properly to give space to other processes (at least in modern OS where the process memory is reclaimed by the OS when the process dies), but to prevent heap exhaustion within your process memory: in C, if you don't deallocate the heap memory regions you used, they will be always considered as busy even when they are not used anymore. Thus, to prevent undesired errors, it's a good practice to free the memory in the heap as soon as you don't need it anymore.
In a C program the command line variables are stored in the stack as function variables of the main. What happens is that usually the stack is allocated in the highest portion of a process memory, which is mapped to the high addresses (this is probably the reason why some sources point out what you wrote). But, generally speaking, there isn't any sixth memory area.
As said by the others, the text area can be shared by processes. This area usually contains the binary code, which would be the same for different processes which share the same binary. For performance reasons, the OS can allow to share such memory area, (think for example when you fork a child process).

more info on Memory layout of an executable program (process)

How things are loaded depends very strongly on the OS and on the binary format used, and the details can get nasty. There are standards for how binary files are laid out, but it's really up to the OS how a process's memory is laid out. This is probably why the documentation is hard to find.

To answer your questions:

Books:
- If you're interested in how processes lay out their memory, look at Understanding the Linux Kernel. Chapter 3 talks about process descriptors, creating processes, and destroying processes.
- The only book I know of that covers linking and loading in any detail is Linkers and Loaders by John Levine. There's an online and a print version, so check that out.
Executable code is created by the compiler and the linker, but it's the linker that puts things in the binary format the OS needs. On Linux, this format is typically ELF, on Windows and older Unixes it's COFF, and on Mac OS X it's Mach-O. This isn't a fixed list, though. Some OS's can and do support multiple binary formats. Linkers need to know the output format to create executable files.
The process's memory layout is pretty similar to the binary format, because a lot of binary formats are designed to be mmap'd so that the loader's task is easier.
It's not quite that simple though. Some parts of the binary format (like static data) are not stored directly in the binary file. Instead, the binary just contains the size of these sections. When the process is loaded into memory, the loader knows to allocate the right amount of memory, but the binary file doesn't need to contain large empty sections.
Also, the process's memory layout includes some space for the stack and the heap, where a process's call frames and dynamically allocated memory go. These generally live at opposite ends of a large address space.

This really just scratches the surface of how binaries get loaded, and it doesn't cover anything about dynamic libraries. For a really detailed treatment of how dynamic linking and loading work, read How to Write Shared Libraries.

How to see memory layout of my program in C during run-time?

In Linux, for process PID, look at /proc/PID/maps and /proc/PID/smaps pseudofiles. (The process itself can use /proc/self/maps and /proc/self/smaps.)

Their contents are documented in man 5 proc.

Here's an example of how you might read the contents into a linked list of address range structures.

mem-stats.h:

#ifndef   MEM_STATS_H
#define   MEM_STATS_H
#include <stdlib.h>
#include <sys/types.h>

#define PERMS_READ               1U
#define PERMS_WRITE              2U
#define PERMS_EXEC               4U
#define PERMS_SHARED             8U
#define PERMS_PRIVATE           16U

typedef struct address_range address_range;
struct address_range {
    struct address_range    *next;
    void                    *start;
    size_t                   length;
    unsigned long            offset;
    dev_t                    device;
    ino_t                    inode;
    unsigned char            perms;
    char                     name[];
};

address_range *mem_stats(pid_t);
void free_mem_stats(address_range *);

#endif /* MEM_STATS_H */

mem-stats.c:

#define _POSIX_C_SOURCE 200809L
#define _BSD_SOURCE
#include <stdlib.h>
#include <sys/types.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include "mem-stats.h"

void free_mem_stats(address_range *list)
{
    while (list) {
        address_range *curr = list;

        list = list->next;

        curr->next = NULL;
        curr->length = 0;
        curr->perms = 0U;
        curr->name[0] = '\0';

        free(curr);
    }
}

address_range *mem_stats(pid_t pid)
{
    address_range *list = NULL;
    char          *line = NULL;
    size_t         size = 0;
    FILE          *maps;

    if (pid > 0) {
        char namebuf[128];
        int  namelen;

        namelen = snprintf(namebuf, sizeof namebuf, "/proc/%ld/maps", (long)pid);
        if (namelen < 12) {
            errno = EINVAL;
            return NULL;
        }

        maps = fopen(namebuf, "r");
    } else
        maps = fopen("/proc/self/maps", "r");

    if (!maps)
        return NULL;

    while (getline(&line, &size, maps) > 0) {
        address_range *curr;
        char           perms[8];
        unsigned int   devmajor, devminor;
        unsigned long  addr_start, addr_end, offset, inode;
        int            name_start = 0;
        int            name_end = 0;

        if (sscanf(line, "%lx-%lx %7s %lx %u:%u %lu %n%*[^\n]%n",
                         &addr_start, &addr_end, perms, &offset,
                         &devmajor, &devminor, &inode,
                         &name_start, &name_end) < 7) {
            fclose(maps);
            free(line);
            free_mem_stats(list);
            errno = EIO;
            return NULL;
        }

        if (name_end <= name_start)
            name_start = name_end = 0;

        curr = malloc(sizeof (address_range) + (size_t)(name_end - name_start) + 1);
        if (!curr) {
            fclose(maps);
            free(line);
            free_mem_stats(list);
            errno = ENOMEM;
            return NULL;
        }

        if (name_end > name_start)
            memcpy(curr->name, line + name_start, name_end - name_start);
        curr->name[name_end - name_start] = '\0';

        curr->start = (void *)addr_start;
        curr->length = addr_end - addr_start;
        curr->offset = offset;
        curr->device = makedev(devmajor, devminor);
        curr->inode = (ino_t)inode;

        curr->perms = 0U;
        if (strchr(perms, 'r'))
            curr->perms |= PERMS_READ;
        if (strchr(perms, 'w'))
            curr->perms |= PERMS_WRITE;
        if (strchr(perms, 'x'))
            curr->perms |= PERMS_EXEC;
        if (strchr(perms, 's'))
            curr->perms |= PERMS_SHARED;
        if (strchr(perms, 'p'))
            curr->perms |= PERMS_PRIVATE;

        curr->next = list;
        list = curr;
    }

    free(line);

    if (!feof(maps) || ferror(maps)) {
        fclose(maps);
        free_mem_stats(list);
        errno = EIO;
        return NULL;
    }
    if (fclose(maps)) {
        free_mem_stats(list);
        errno = EIO;
        return NULL;
    }

    errno = 0;
    return list;
}

An example program to use the above, example.c:

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include "mem-stats.h"

int main(int argc, char *argv[])
{
    int  arg, pid;
    char dummy;

    if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
        fprintf(stderr, "       %s PID\n", argv[0]);
        fprintf(stderr, "\n");
        fprintf(stderr, "You can use PID 0 as an alias for the command itself.\n");
        fprintf(stderr, "\n");
        return EXIT_SUCCESS;
    }

    for (arg = 1; arg < argc; arg++)
        if (sscanf(argv[arg], " %i %c", &pid, &dummy) == 1) {
            address_range *list, *curr;

            if (!pid)
                pid = getpid();

            list = mem_stats((pid_t)pid);
            if (!list) {
                fprintf(stderr, "Cannot obtain memory usage of process %d: %s.\n", pid, strerror(errno));
                return EXIT_FAILURE;
            }

            printf("Process %d:\n", pid);
            for (curr = list; curr != NULL; curr = curr->next)
                printf("\t%p .. %p: %s\n", curr->start, (void *)((char *)curr->start + curr->length), curr->name);
            printf("\n");
            fflush(stdout);

            free_mem_stats(list);

        } else {
            fprintf(stderr, "%s: Invalid PID.\n", argv[arg]);
            return EXIT_FAILURE;
        }

    return EXIT_SUCCESS;
}

and a Makefile to make building it, simple:

CC      := gcc
CFLAGS  := -Wall -Wextra -O2 -fomit-frame-pointer
LDFLAGS := 
PROGS   := example

.PHONY: all clean

all: clean $(PROGS)

clean:
    rm -f *.o $(PROGS)

%.o: %.c
    $(CC) $(CFLAGS) -c $^

example: mem-stats.o example.o
    $(CC) $(CFLAGS) $^ $(LDFLAGS) -o $@

Note that the three indented lines in the Makefile above must use tab characters, not spaces. It seems that the editor here converts tabs to spaces, so you need to fix that, for example by using

sed -e 's|^  *|\t|' -i Makefile

If you don't fix the indentation, and use spaces in a Makefile, you'll see an error message similar to *** missing separator. Stop.

Some editors automatically convert a tab keypress into a number of spaces, so you may need to delve into the editor settings of whatever editor you use. Often, editors keep a pasted tab character intact, so you can always try pasting a tab from another program.

To compile and run, save the above files and run:

make
./example 0

to print the memory ranges used by the example program itself. If you want to see, say, the memory ranges used by your PulseAudio daemon, run:

./example $(ps -o pid= -C pulseaudio)

Note that standard access restrictions apply. A normal user can only see the memory ranges of the processes that run as that user; otherwise you need superuser privileges (sudo or similar).

About the Memory Layout of Programs in Linux