When We Run an Executable, Do All the Sections Get Loaded into Memory at Once

When we run an executable, do all the sections get loaded into memory at once?

With ELF binaries, sections do not determine how the binary is loaded into memory. They are just useful metadata for debuggers and other tools; there need not be any correspondence between segments and sections, and often multiple sections are subsumed under one segment. A binary can have no section header table at all and still load fine.

What actually determines what gets loaded, and where, are the program headers. Each program header describes one memory segment and contains the following information:

  • the field p_type tells you what kind of information the program header contains. This is typically just PT_LOAD to mean “loadable segment.”
  • the field p_offset tells you the offset from the beginning of the file where the segment resides. Note that in rare cases, this can lie beyond the end of the file.
  • the field p_vaddr tells you the virtual address at which the segment is mapped. There is also p_paddr to specify a physical address, but it's generally unused.
  • the field p_filesz tells you how long the segment is in the file.
  • the field p_memsz tells you how long the segment is in memory. If this is more than the segment length in the file, the remainder is filled with zeroes.
  • the field p_flags tells you if the segment is readable (PF_R), writable (PF_W), executable (PF_X) or some combination of these three. When loading a segment, the operating system uses these flags to set up write and executable protection.
  • the field p_align tells you how the segment is aligned. This is not really important here.
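The loop a loader runs over the program header table can be sketched in a few lines. This is a minimal illustration, assuming a 64-bit little-endian ELF (as on x86-64 Linux) with the ELF64 field offsets hard-coded; the function name load_segments is made up for the example:

```python
import struct, sys

# Minimal sketch: parse the program header table of a 64-bit
# little-endian ELF file by hand, using only the struct module.
# Field offsets follow the ELF64 layout.
def load_segments(path):
    with open(path, "rb") as f:
        data = f.read()
    assert data[:4] == b"\x7fELF", "not an ELF file"
    assert data[4] == 2, "this sketch only handles ELF64"
    # e_phoff, e_phentsize, e_phnum from the ELF header
    e_phoff = struct.unpack_from("<Q", data, 0x20)[0]
    e_phentsize = struct.unpack_from("<H", data, 0x36)[0]
    e_phnum = struct.unpack_from("<H", data, 0x38)[0]
    segments = []
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type, p_flags = struct.unpack_from("<II", data, off)
        p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align = \
            struct.unpack_from("<6Q", data, off + 8)
        if p_type == 1:  # PT_LOAD: the only kind the loader maps
            segments.append((p_offset, p_vaddr, p_filesz, p_memsz, p_flags))
    return segments

# The Python interpreter itself is an ELF binary on Linux:
for seg in load_segments(sys.executable):
    print("PT_LOAD offset=%#x vaddr=%#x filesz=%#x memsz=%#x flags=%d" % seg)
```

A real loader would then mmap each tuple at p_vaddr and zero-fill the p_memsz - p_filesz tail; the sketch only shows where that information comes from.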

When the operating system loads your binary or when the runtime link editor loads a shared object, it reads the program headers of your binary and loads or maps each segment in the order they appear. Once this is done, your program is executed.

You can get information about the program headers of a binary by running readelf -l binary.

Is an entire static program loaded into memory when launched?

Under OS X, Windows, Linux, and iOS, executables are not loaded into RAM all at once when executed. Instead, the executable is mapped into the virtual address space of the process. When the process accesses a mapped page of the executable that hasn't been loaded into RAM yet, the CPU generates a page fault, which the OS handles by reading the page into RAM.

So if you put a huge image file in the data section of your executable, it won't be loaded into RAM until your program first accesses it. A huge image file takes up multiple pages of memory (which are generally 4 KiB in size), so if your program only accesses part of the image, only that part will be loaded into RAM.
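On Linux you can observe this lazy behavior by counting the process's minor page faults around a single access to a large mapping. A rough sketch, assuming a POSIX system where resource.getrusage reports fault counts (the exact numbers vary, since the interpreter also faults in pages of its own):

```python
import mmap, resource, tempfile

# Sketch: map a 1024-page file and touch only its first page.
# Demand paging means this causes on the order of one fault,
# not one fault per page of the file.
PAGE = mmap.PAGESIZE

with tempfile.NamedTemporaryFile() as f:
    f.truncate(1024 * PAGE)          # a 1024-page file (4 MiB with 4K pages)
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    _ = m[0]                          # touch only the first page
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    print("minor faults caused by one access:", after - before)
    m.close()
```

The printed count stays far below 1024, showing that the untouched pages were never brought in.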

Note that under Windows, and maybe other operating systems, there's a significant exception to this. Under Windows, an operating system service called the prefetcher will start preloading into memory the parts of any file that it predicts the program will access during startup. It makes these predictions based on the recorded startup access patterns of previous runs of the program. Since "any file" includes the executable itself, along with any DLLs or data files it uses, this means parts of the executable will be preloaded into RAM when the process starts. It also means that if the program usually displays a large image at startup (e.g. a splash screen), the prefetcher will load the image into RAM whether it's stored as part of the executable or as a separate data file.

Does the .data section get loaded into memory?

The paging model on most systems causes the pages comprising the sections of the binary that don't require some kind of dynamic linking to be loaded only when they are accessed; Windows is no exception. So the .data section is memory-mapped from the binary file into your process's memory space, but is not actually paged in until you need it. The process monitor only reports the memory actually paged in by default, although you can configure the columns to show all of the memory in the image as well. There may also be compiler options you can use to change the paging behavior, and you can always remap the memory manually (perhaps locking it in) if you need to.

How is the .data segment loaded into a separate memory area from the .code segment in x86?

It's up to the bootloader to handle this, so it ultimately depends on what the bootloader does; but generally bootloaders load standard executable images in standard formats like PE/COFF or ELF. They load the kernel much like operating systems load program executables.

Sections exist so that the contents of sections with the same name will all be grouped together contiguously in the executable. The linker will take the contents of all the .text sections in all the input object files and combine them into one .text section in the output executable. Similarly it will do the same for .data and other named sections.

The linker takes all these combined sections and places them one after another in the executable, creating one contiguous image that can be loaded into memory. Under an operating system, the executable would be loaded into memory in one single contiguous chunk. If necessary, relocations would be applied to account for the executable being loaded at a different address than where it was intended to be loaded, and uninitialized data segments (.bss) would be initialized with zeroes. Finally, the permissions of each page of the executable in memory would be adjusted according to the segment it belongs to. For example, pages in .text sections would be marked read-only and executable, while pages in .data sections would be marked read/write and not executable.

(Note that this simplifies and glosses over many details of how linkers and operating systems create and load executables. It's possible for sections to be merged, renamed, discarded, etc. Padding space may be inserted between segments so that they're page-aligned. Under ELF, named sections in object files are actually converted into unnamed program segments in executables, and it's these program segments that determine page permissions.)

A bootloader loads a kernel executable much like an operating system does, but it may not support relocations, and it won't change page permissions because those need an operating system to work. The kernel itself is responsible for setting up its own page permissions.

So the kernel's code and data get loaded into one single contiguous area of memory, with that area subdivided into separate areas for the code, the data, and any other sections the kernel uses.
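The same subdivision is visible for an ordinary user-space process. On Linux, /proc/self/maps lists every mapping in the process, and filtering it for the interpreter's own binary shows one executable image split into adjacent regions with different permissions, one per loadable segment. A Linux-specific sketch:

```python
import os, sys

# Sketch (Linux-specific): /proc/self/maps shows every mapping of this
# process. The lines belonging to the interpreter's own binary reveal
# the single executable image split into regions with different
# permissions (r--, r-x, rw-).
exe = os.path.realpath(sys.executable)
with open("/proc/self/maps") as f:
    rows = [line.split() for line in f if exe in line]
for addr, perms, *_ in rows:
    print(perms, addr)
```

Typically you'll see at least one executable (r-x) region for the code next to writable (rw-) regions for the data.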

Does a C program load everything to memory?

C is a compiled language, not an interpreted one.

This means that the original *.c source file is never loaded at execution time. Instead, the compiler will process it once, to produce an executable file containing machine language.

Therefore, the size of the source file doesn't directly matter. It may well be very large yet produce a tiny executable, for example if it covers a lot of different cases and only the applicable one is picked at compilation time. Most of the time the executable's size remains correlated with that of its source, but a large source doesn't necessarily end up as something huge.

Also, the *.h header files included at the top of C source files are not actually "importing" a dependency (as use, require, or import would in other languages). The #include statement is only there to insert the content of a file at a given point, and these files usually contain only function prototypes, variable declarations and some preprocessor #define clauses, which form the API of an external resource that is linked to your program later.

These external resources are typically other object modules (when you have multiple *.c files within the same project and don't need to recompile them all from scratch each time), static libraries, or dynamic libraries. The latter are DLL files under Windows and *.so files under Unix. In that case, the operating system will automatically load the required libraries when you run your program.

Loading an executable into current process's memory, then executing it

Your code is a good start, but you are missing a few things.

First is, as you mentioned, resolving imports. What you describe looks right, but I've never done this manually myself, so I don't know the details. It would be possible for a program to work without resolving imports, but only if it doesn't use any imported function. Here your code fails because it tries to access an import that hasn't been resolved; the function pointer contains 0x4242 instead of the resolved address.

The second thing is relocation. To keep it simple: PE executables are position-independent (they can work at any base address), even if the code isn't. To make this work, the file contains a relocation table that is used to adjust all the data that depend on the image's location. This step is optional if you can load at the preferred address (pINH->OptionalHeader.ImageBase), but if you use the relocation table, you can load your image anywhere, and you can omit the first parameter of VirtualAlloc (and remove the related checks).
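The arithmetic behind base relocation is simple: every absolute address stored in the image is shifted by the difference between the actual load address and the preferred one. A toy sketch (the function name and the addresses are made up for illustration):

```python
# Sketch of the arithmetic a relocation pass applies to each entry in
# the relocation table: shift every stored absolute address by the
# delta between the actual and the preferred base address.
def relocate(value, preferred_base, actual_base):
    return value + (actual_base - preferred_base)

# An absolute pointer that assumed a preferred base of 0x400000, when
# the image is actually mapped at 0x10000000:
print(hex(relocate(0x401234, 0x400000, 0x10000000)))  # 0x10001234
```

The relocation table merely lists where in the image such absolute values live; the loader applies this delta at each listed spot.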

You can find more info on import resolving and relocation in this article, if you haven't found it already. There are plenty of other resources you can find.

Also, as mentioned in marom's answer, your program does basically what LoadLibrary does, so in a more practical context you would use that function instead.

How many copies of a program/class get loaded into memory when multiple users access it at the same time

In general, a single copy of a program (i.e. its text segment) is loaded into RAM and shared by all instances, so they use the exact same read-only, memory-mapped physical pages (possibly, even probably, mapped at different addresses in different address spaces, but it's still the same memory). Data is usually private to each process, i.e. each program's data lives in separate pages of RAM (though it can be shared).

BUT

The problem is that the actual program here is only the Java runtime interpreter, or the JIT compiler. Eclipse, like all Java programs, is data rather than a program (data which is, however, interpreted as a program). That data is either loaded into the private address space and interpreted by the JVM, or turned into an executable by the JIT compiler, resulting in a (temporary) executable binary that is launched. This means that, in principle, each Java program runs as a separate copy, using separate RAM.

Now, you might of course be lucky, and the JVM might load the data as a shared mapping, in which case the bytecode would occupy the same RAM in all instances. However, whether that's the case is something only the author of the JVM could tell, and it's not something you can rely on in general.

Also, depending on how clever the JIT is, it might cache that binary for some time and reuse it for identical Java programs, which would be very advantageous, not only because it saves the compilation. All instances launched from the same executable image share the same memory, so this would be just what you want.

It is even likely that this is done, at least to some extent, by your JIT compiler, because compiling is rather expensive and such caching is a common optimization.

Any hands-on exercise to understand how a program is loaded into memory and gets executed

  • The ld.so man page documents several environment variables that may be set to either tweak the dynamic linking process or provide additional details, e.g.:

    LD_DEBUG=all cat </dev/null

  • You can easily obtain the source code for each and every piece involved: the Linux kernel, the dynamic linker, the C library, and the startup code (crt0.o or similar). You could start by studying the code and making experimental modifications.

How is a program loaded by the OS

When a binary file is run, does it first pass through the CPU, where the logical addresses are generated, or is it directly copied into physical memory?

Typically some code somewhere loads the executable file's headers into memory, and then uses information from the headers to figure out where various pieces of the file (sections - e.g. .text, .data, etc) should be in virtual memory and what each virtual page's virtual permissions should be (if writes are allowed, if execution is allowed).

After this, areas of the virtual address space are set up. Often this is done by memory-mapping the relevant parts of the file into the virtual address space, without actually loading them into physical memory. In this case, each page's actual permissions don't initially reflect its virtual permissions. For example, a "read/write" page might be "not present" at first; when software tries to read from it you get a page fault, and the page fault handler might fetch the page from disk and mark it "present, read-only". Later, when software tries to write to the page, you might get a second page fault, and the handler might do a "copy on write" (so that anything else using the same physical page isn't affected) and then make the new copy "read/write" so that it matches the original virtual permissions.
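Copy-on-write is easy to demonstrate from user space with a MAP_PRIVATE mapping: writes go to a private copy of the page, and the underlying file is never modified. A minimal sketch for a POSIX system:

```python
import mmap, tempfile

# Sketch: a MAP_PRIVATE mapping gives copy-on-write semantics. Writing
# through the mapping modifies a private copy of the page; the file on
# disk stays untouched.
with tempfile.NamedTemporaryFile() as f:
    f.write(b"original")
    f.flush()
    m = mmap.mmap(f.fileno(), 8, flags=mmap.MAP_PRIVATE,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    m[:8] = b"modified"          # triggers copy-on-write in the kernel
    print(bytes(m[:8]))           # the private copy: b'modified'
    f.seek(0)
    print(f.read())               # the file itself: b'original'
    m.close()
```

The OS does the same trick with the pages of an executable so that one process patching its image (or its data) never disturbs other processes sharing the same physical pages.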

While this is happening, the OS could (depending on the amount of free physical RAM and whether storage devices have more important data to transfer) be prefetching the file's data from disk (e.g. into the VFS cache), and could be "opportunistically" updating the process's page tables to avoid the overhead of page faults for pages that have already been prefetched.

However, if the OS knows that the file is on unreliable and/or removable media, it may decide that using memory-mapped files is a bad idea and actually load the needed executable sections into memory before executing it. An OS could also have other features that cause the file to be loaded into RAM before it's executed. For example, if the OS checks that an executable file's digital signature is correct before allowing it to be executed, the entire file probably needs to be loaded into memory so the signature can be checked, and in that case the entire file is likely to still be in memory when the virtual address space is set up.


