Executable Object Files and Virtual Memory

In general (not specifically for Linux)...

When an executable file is started, the OS (kernel) creates a new process with an (initially empty) virtual address space and examines the executable file's header. The header describes "sections" (e.g. .text, .rodata, .data, .bss, etc.), where each section has different attributes: whether the section's contents should be placed in the virtual address space at all (e.g. a symbol table isn't used at run time), whether the contents are stored in the file (e.g. .bss contents are not), and whether the area should be executable, read-only, or read/write.

Typically, (used parts of) the executable file are cached by the virtual file system; and pieces of the file that are already in the VFS cache can be mapped (as "copy on write") into the new process' virtual address space. For parts that aren't already in the VFS cache, those pieces of the file can be mapped as "need fetching" into the new process' virtual address space.

Then the process is started (given CPU time).

If the process reads data from a page that hasn't been loaded yet, the OS (kernel) pauses the process, fetches the page from the file on disk into the VFS cache, maps the page as "copy on write" into the process, then allows the process to continue (lets it retry the read, which will work now that the page is loaded).

If the process writes to a page that is still "copy on write", the OS (kernel) pauses the process, allocates a new page and copies the original page's data into it, replaces the original page with the process's own copy, then allows the process to continue (lets it retry the write, which will work now that the process has its own copy).

If the process writes to a page that hasn't been loaded yet, the OS (kernel) combines both of the previous steps (fetches the original page from disk into the VFS cache, creates a copy, and maps the process's copy into the process's virtual address space).

If the OS starts to run out of free RAM; then:

  • pages of file data that are in the VFS cache but aren't shared as "copy on write" with any process can be freed in the VFS without doing anything else. Next time the file is used those pages will be fetched from the file on disk into the VFS cache.

  • pages of file data that are in the VFS cache and are also shared as "copy on write" with any process can be freed in the VFS and the copies in any/all processes marked as "not fetched yet". Next time the file is used (including when a process accesses the "not fetched yet" page/s) those pages will be fetched from the file on disk into the VFS cache and then mapped as "copy on write" in the process/es.

  • pages of data that have been modified (either because they were originally "copy on write" but got copied, or because they weren't part of the executable file at all - e.g. .bss section, the executable's heap space, etc) can be saved to swap space and then freed. When the process accesses the page/s again they will be fetched from swap space.
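The three cases above can be sketched as a toy eviction routine. This is pure simulation: the dictionaries and state names are invented for illustration, and a real kernel tracks this state in page tables and the page cache instead.

```python
# Toy simulation of the three eviction cases described above.
# "clean" file-backed pages can simply be dropped (re-fetch later);
# "cow-shared" pages are dropped and every mapping marked not-fetched;
# "dirty" (modified or anonymous) pages must be saved to swap first.

def evict(page, swap):
    if page["state"] == "clean":
        page["resident"] = False               # re-fetch from file later
    elif page["state"] == "cow-shared":
        page["resident"] = False
        for mapping in page["mappings"]:
            mapping["status"] = "not fetched"  # re-fetch and re-map later
    elif page["state"] == "dirty":
        swap[page["id"]] = page["data"]        # data must survive in swap
        page["resident"] = False

swap = {}
dirty = {"id": 7, "state": "dirty", "data": b"2019", "resident": True}
evict(dirty, swap)
print(swap[7])   # b'2019' -- the modified data is preserved in swap
```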

Note: If the executable file is stored on unreliable media (e.g. a potentially scratched CD), a "smarter than average" OS may load the entire executable file into the VFS cache and/or swap space up front. There's no sane way to handle a read error from a memory-mapped file while the process is using it, other than crashing the process (e.g. SIGSEGV) and making the executable look buggy when it was not; loading everything up front improves reliability because you're depending on more reliable swap rather than on a less reliable scratched CD. Also, if the OS guards against file corruption or malware (e.g. has a CRC or digital signature built into executable files), then the OS may (and should) load everything into memory (the VFS cache) to check the CRC or digital signature before allowing the executable to be executed. For secure systems, in case the file on disk is modified while the executable is running, the OS may, when freeing RAM, store even unmodified pages in "more trusted" swap space (the same as it would if the page was modified) to avoid fetching the data from the original "less trusted" file, partly because you don't want to repeat the whole digital signature check every time a page is loaded from the file.

My question is: suppose the program changes the value of a global variable from 2018 to 2019 at run time, and the virtual page containing that global variable is eventually paged out to disk. Does that mean the .data section now holds 2019, i.e. have we modified the executable object file, which isn't supposed to change?

The page containing 2018 will begin as "not fetched", then (when it's accessed) be loaded into the VFS cache and mapped into the process as "copy on write". At either of these points the OS may free the memory and fetch the (unchanged) data from the executable file on disk if it's needed again.

When the process modifies the global variable (changes it to contain 2019) the OS creates a copy of the page for the process. After this point, if the OS wants to free the memory it must save the page's data in swap space, and load it back from swap space if it's accessed again. The executable file is not modified, and (for that page, for that process) the executable file isn't used again.
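This behaviour can be observed from user space: a private (copy-on-write) file mapping takes writes without touching the file, which is the same mechanism that protects the executable's .data section. A small sketch using Python's mmap module with ACCESS_COPY, which requests exactly such a private mapping (the temporary file here stands in for the executable on disk):

```python
import mmap
import os
import tempfile

# A private (copy-on-write) mapping takes writes without modifying
# the underlying file -- the file stands in for .data on disk.

fd, path = tempfile.mkstemp()
os.write(fd, b"2018")                  # the "global variable" in the file
os.close(fd)

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 4, access=mmap.ACCESS_COPY)  # COW mapping
    m[:] = b"2019"                     # write: process gets its own copy
    in_memory = m[:]
    m.close()

on_disk = open(path, "rb").read()
os.remove(path)

print(in_memory)   # b'2019' -- the process's private copy
print(on_disk)     # b'2018' -- the file on disk is unchanged
```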

How does an OS such as Linux load executables into virtual memory?

Shouldn't the binary be loaded into memory to begin with, before the OS can examine the executable file's header?

Well, only the header of the binary has to be loaded into memory for this step. The kernel loads the header and inspects it to see how to set up mappings for the various sections of the binary. The header might say, for instance, "map bytes 4096-65535 of the binary into memory at address 0x12345000, read-only and executable"; "map 16384 bytes of zero-initialized memory at address 0xdeadf000, read-write", and so on. After these mappings are set up, the kernel doesn't need to keep the binary's header in memory anymore, and can free that space.
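A rough sketch of what such a mapping directive looks like in practice, using the ELF64 program-header layout (type, flags, offset, vaddr, paddr, filesz, memsz, align). The sample header bytes below are hand-built for illustration rather than read from a real binary:

```python
import struct

# One ELF64 program header (a PT_LOAD "mapping directive").
PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4

phdr = struct.pack("<IIQQQQQQ",
                   PT_LOAD, PF_R | PF_X,  # loadable, read + execute
                   4096,                  # offset of the data in the file
                   0x12345000,            # virtual address to map at
                   0,                     # physical address (unused here)
                   61440, 61440,          # size in file / size in memory
                   4096)                  # alignment

p_type, p_flags, p_off, p_vaddr, _, p_filesz, p_memsz, _ = \
    struct.unpack("<IIQQQQQQ", phdr)

perms = "".join(c if p_flags & b else "-"
                for c, b in (("r", PF_R), ("w", PF_W), ("x", PF_X)))
print(f"map bytes {p_off}-{p_off + p_filesz - 1} of the binary "
      f"at {hex(p_vaddr)}, {perms}")
```

This prints exactly the kind of directive described above: "map bytes 4096-65535 of the binary at 0x12345000, r-x".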

Also in case the binary is loaded by the OS, does it get loaded entirely?

No.

or it does lazy loading and loads pages as needed afterwards.

Yes.

How much would it load initially?

Potentially none at all. It can rely instead on the page fault handler to do it when the process actually accesses the memory. In that case, the sysret, or whatever instruction the kernel uses to transfer control to the program's entry point, would itself cause a page fault, at which point the page containing the first instruction at the entry point would be loaded from the binary as specified by the mapping for that address. When the fault handler returned, that first instruction would be in memory and would be executed. As the process executes more instructions touching more memory, more and more of its pages will be loaded.

The kernel could, as an optimization, prefault some of these pages into memory, based on guesses as to which ones are likely to be accessed in the near future. I don't know exactly to what extent this is done.
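The lazy-loading behaviour can be modelled with a toy "page fault handler" that copies a page in from the backing file only when an address in it is first touched. This is a simulation only; the class, page size, and backing bytes are illustrative:

```python
# Toy model of demand paging: memory starts empty, and a "page fault"
# handler loads 4 KiB pages from the backing file only when touched.

PAGE = 4096

class LazyImage:
    def __init__(self, file_bytes):
        self.backing = file_bytes
        self.resident = {}            # page number -> page contents

    def read(self, addr):
        page = addr // PAGE
        if page not in self.resident:          # page fault!
            start = page * PAGE
            self.resident[page] = self.backing[start:start + PAGE]
        return self.resident[page][addr % PAGE]

img = LazyImage(bytes(range(256)) * 64)   # a 16 KiB "binary" (4 pages)
img.read(0)                   # first access: only page 0 is loaded
print(sorted(img.resident))   # [0]
img.read(2 * PAGE + 5)        # touching page 2 faults it in too
print(sorted(img.resident))   # [0, 2]
```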

How does OS execute binary files in virtual memory?

This question can't be answered in general because it's entirely hardware- and OS-dependent. However, a typical answer is that the initially loaded program can be compiled as you say: because the VM hardware gives each program its own address space, all addresses can be fixed when the program is linked, and no recalculation of addresses at load time is needed.

Things get much more interesting with dynamically loaded libraries, because two libraries used by the same program might be linked to the same base address, so their address spaces would overlap.

One approach to this problem is to require Position Independent Code (PIC) in DLLs. In such code all addresses are relative to the code itself: jumps are usually PC-relative (though a code segment register can also be used), and data are addressed relative to some data segment or base register. To choose the runtime location, the PIC code itself needs no change; only the segment or base register(s) need to be set, in the prelude of every DLL routine.

PIC tends to be a bit slower than position dependent code because there's additional address arithmetic and the PC and/or base registers can bottleneck the processor's instruction pipeline.

So the other approach is for the loader to rebase the DLL code when necessary to eliminate address-space overlaps. For this the DLL must include a table of all the absolute addresses in the code. The loader computes the offset between the assumed and actual code and data base addresses, then traverses the table, adding the offset to each absolute address as the program is copied into VM.

DLLs also have a table of entry points so that the calling program knows where the library procedures start. These must be adjusted as well.
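The rebasing arithmetic can be sketched as follows. The image bytes, relocation table, and base addresses below are made up for illustration:

```python
# Rebasing: add (actual_base - assumed_base) to every absolute address
# listed in the DLL's relocation table as the image is loaded.

def rebase(image, reloc_table, assumed_base, actual_base, width=4):
    delta = actual_base - assumed_base
    image = bytearray(image)
    for off in reloc_table:   # each table entry is the offset of a slot
        addr = int.from_bytes(image[off:off + width], "little")
        image[off:off + width] = (addr + delta).to_bytes(width, "little")
    return bytes(image)

# A fake 16-byte image with one absolute address stored at offset 8,
# linked on the assumption the image loads at base 0x10000000.
img = bytearray(16)
img[8:12] = (0x10001234).to_bytes(4, "little")
out = rebase(img, [8], assumed_base=0x10000000, actual_base=0x20000000)
print(hex(int.from_bytes(out[8:12], "little")))   # 0x20001234
```

Note that the patched image now differs per load address, which is exactly why rebasing defeats sharing of DLL code between processes.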

Rebasing is not great for performance either. It slows down loading. Moreover, it defeats sharing of DLL code. You need at least one copy per rebase offset.

For these reasons, DLLs that are part of Windows are deliberately compiled with non-overlapping VM address spaces. This speeds loading and allows sharing. If you ever notice that a 3rd party DLL crunches the disk and loads slowly, while MS DLLs like the C runtime library load quickly, you are seeing the effects of rebasing in Windows.

You can infer more about this topic by reading about object file formats (ELF and PE, for example).

virtual memory concepts

This is a very open-ended question that has many confused uses of different terms. I'll try to address as much of your question as I can, and provide some other useful information that may help.

  1. "I have heard that every process has the address space of 4gb in a 32 bit system." Not precisely true. Every process has a 4GB virtual address space on a 32-bit system, but only part of it (commonly 2-3GB, depending on the OS) is addressable by user code; the rest is reserved for the kernel. That doesn't mean that this memory is ever allocated, and it certainly isn't allocated as soon as a process launches.
    "Is this the virtual memory we talk about?" No. Virtual memory has nothing directly to do with the addressable space of a process. More on this later.

  2. This question doesn't really make sense, for reasons I will explain below. It's worth noting, though, that multiple processes clearly do fit in memory at one time, because processes don't automatically allocate their full potentially-available memory. (If a text editor allocated 4GB of memory as soon as it was opened, it would not be a popular text editor!)

  3. I'm no expert, but I highly doubt that every program has its own copy of kernel code at runtime. The security and performance issues alone make this a very unlikely solution.

So now, some definitions that may help you.

  • Physical memory is (typically!) the RAM in your PC. It is fast, physical memory that your CPU works directly with when running any program. When you specify a physical memory address you are specifying an exact position in memory according to the memory hardware itself.
  • Virtual memory is (typically!) stored on slower media like your Hard Disk Drive (in what's often called a paging file). When your computer is running low on memory for running processes, it will copy some of the current physical memory contents to the page file, typically from an idle or background application. This makes room in physical memory so that an active process can run. If a program that is no longer in physical memory needs to process data, its data must be reloaded from the page file into physical memory, which may in turn require another program to be paged out of physical memory to make room. The term "virtual" versus "physical" memory is used to highlight that this memory doesn't really exist as RAM, but is nonetheless available to the computer. (Strictly speaking, this describes the paging/swap file; "virtual memory" more precisely refers to the whole address-translation mechanism, of which paging to disk is only one part.) Virtual memory use is very costly in terms of performance, but it can support much larger sizes: indeed, it is possible to have an arbitrarily large amount of virtual memory available, but the performance hit prevents this being a practical solution beyond certain limits.
  • Logical memory addresses are those used by a single process and allow a process to address its own memory without having to care about where in physical memory the process has been loaded. Your 00000000 to ffffffff range is the logical range available to the process, and this is the address that will be used within the process to reference memory. The kernel will translate this to a physical address that is used by the CPU when actually executing code, based on the physical offset (and segmentation) of the process' memory. This physical location could be located anywhere in the available memory space and, if the application is paged out and in, the physical location may change during the lifetime of the application. However, the application itself need only ever refer to its own logical address space. The term "logical" versus "physical" address is used to highlight that an address is not the real address, but is the address relative to the relevant subset of memory - that is, to the process' own memory space.
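The logical-to-physical translation described in the last bullet can be sketched as a toy page-table lookup. The 4 KiB page size and table contents are assumed for illustration; real hardware does this in the MMU with multi-level tables:

```python
# Toy address translation: split a logical address into page number
# and offset, look the page up in the process's page table, and
# combine the resulting physical frame with the offset.

PAGE = 4096

def translate(logical, page_table):
    page, offset = divmod(logical, PAGE)
    frame = page_table[page]     # a missing entry would be a page fault
    return frame * PAGE + offset

page_table = {0: 7, 1: 3}        # process pages -> physical frames
print(hex(translate(0x1042, page_table)))   # page 1 -> frame 3: 0x3042
```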

I'm no expert on this, but I hope this helps clarify some of your questions.

If a virtual memory page is executable, does it imply that it is readable?

Assuming IA-32e mode: yes. A page-table entry has no bit that inhibits reading, only writing (bit 1, R/W). Pages are always readable, assuming bit 2 (U/S, User/Supervisor) allows the access. Bit 63 is XD, the execute-disable bit. That's it for the protection flags. See the paging chapter of the Intel Software Developer's Manual, Volume 3A.
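A quick sketch of testing those bits in a page-table-entry value. The bit positions follow the x86-64 paging format; the sample PTE values are made up:

```python
# x86-64 page-table-entry protection bits discussed above:
# bit 1 = R/W (write allowed), bit 2 = U/S (user access allowed),
# bit 63 = XD (execute disable). There is no "read disable" bit.

PTE_RW = 1 << 1
PTE_US = 1 << 2
PTE_XD = 1 << 63

def describe(pte):
    return {
        "writable":   bool(pte & PTE_RW),
        "user":       bool(pte & PTE_US),
        "executable": not (pte & PTE_XD),
        "readable":   True,   # always, if the U/S check permits access
    }

# A user page that is executable but read-only (e.g. mapped .text):
print(describe(PTE_US))
```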

How does compiler lay out code in memory

A comprehensive explanation is probably beyond the scope of this forum. Entire texts are devoted to the subject. However, at a simplistic level you can look at it this way.

The compiler does not lay out the code in memory. It does assume it has the entire memory region to itself. The compiler generates object files where the symbols in the object files typically begin at offset 0.

The linker is responsible for pulling the object files together, linking symbols to their new offset location within the linked object and generating the executable file format.

The linker doesn't lay out code in memory either. It packages code and data into sections, typically labeled .text for the executable code instructions and .data for things like global variables (there are other sections as well for different purposes, e.g. .rodata for read-only data such as string constants). The linker may provide a hint to the operating system loader about where to relocate symbols, but the loader doesn't have to oblige.

It is the operating system loader that parses the executable file and decides where code and data are laid out in memory. The exact layout depends entirely on the operating system. Typically the stack is located in a higher memory region than the program instructions and data, and grows downward.

Each program is compiled/linked with the assumption it has the entire address space to itself. This is where virtual memory comes in. It is completely transparent to the program and managed entirely by the operating system.

Virtual memory typically ranges from address 0 and up to the max address supported by the platform (not infinity). This virtual address space is partitioned off by the operating system into kernel addressable space and user addressable space. Say on a hypothetical 32-bit OS, the addresses above 0x80000000 are reserved for the operating system and the addresses below are for use by the program. If the program tries to access memory above this partition it will be aborted.
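The hypothetical split above amounts to a one-line check (in reality the MMU enforces this via the U/S page-table bit and a fault, not an explicit comparison in software):

```python
# Hypothetical 32-bit split from the text: addresses at or above
# 0x80000000 belong to the kernel; user-mode access there is rejected.

KERNEL_BASE = 0x80000000

def user_access_ok(addr):
    return 0 <= addr < KERNEL_BASE

print(user_access_ok(0x0040_1000))   # True  -- typical program code
print(user_access_ok(0xC000_0000))   # False -- kernel territory
```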

The operating system may decide the stack starts at the highest addressable user memory and grows down with the program code located at a much lower address.

The location of the heap is typically managed by the run-time library against which you've built your program. It could live beginning with the next available address after your program code and data.

How to link a lot of C++ object files without using too much memory?

Compile your code into a static library. Then compile against the library which should include only what you need in the final executable.

If you are working with GCC, take a look at the ar tool and its options. A static library is an archive from which you can combine and extract members as needed, and the linker pulls in only the members the executable actually references.

Does the Linker OR the Loader make the necessary relocations of a program?

There is no relocation data in the ELF header. Linkable ELF object files store relocation data in subservient sections named .rela.text, .rela.data, etc.
The static linker on Linux chooses the starting address where the executable image will be loaded (traditionally 0x08048000 for 32-bit x86) and then uses the relocations to update instructions and data in the code and data sections. Once .rela.text and .rela.data have been handled, those subservient .rela sections are no longer needed and may be stripped from the final ELF executable file.

When the time comes to load the linked executable file into memory, the loader creates a new process in protected mode. The entire virtual address space is assigned to the process and is initially unoccupied. Other programs may be loaded on the same computer, but they each run happily in their own private address space.

The scenario you're afraid of sometimes happens on Windows, when different dynamic libraries were linked to start at conflicting virtual addresses. That's why the Portable Executable format (PE/DLL) keeps relocation records in a subservient section, .reloc, and yes, the loader must then relocate all addresses mentioned in that section.

A similar situation exists on DOS in real mode, where there is only one 1 MiB address space common to all processes. MZ executables are linked to virtual address 0, and all addresses that require relocation are kept in a relocation pointer table following the MZ EXE header; the loader is responsible for updating the segment addresses mentioned in this pointer table.

Answer 1: Relocation is necessary only if the executable image is loaded at a different address than the one it was linked for, and if it was not linked as a Position-Independent Executable.

Answer 2: Relocation does not concern the addresses of all CPU instructions, only those fields in an instruction body (a displacement or immediate address) which refer to an address. Such places must be explicitly specified in relocation records. If the relocation information was stripped from the file and relocation turns out to be needed, the loader should refuse execution.

Good source of information: the "Linkers" blog series by Ian Lance Taylor.


