Under Linux, How C++ Source Code Becomes Executable Files or Static/Dynamic Libraries, and How a Program Gets Loaded into Memory When It Runs

You could read:

  • the Assembly HowTo
  • From Powerup to Bash prompt
  • Wikipedia about system calls, Linux kernel, Virtual memory, address space, Process, Compiler, Linker, Assembly language, GCC, ELF
  • Levine's book on Linkers and Loaders
  • x86-64 material, notably the x86-64 ABI specification
  • the Advanced Linux Programming book
  • several syscalls(2) man pages, notably intro(2), execve(2), mmap(2), fork(2)
  • ELF virus writing howto
  • GCC documentation (notably internals)
  • Binutils documentation
  • Program Library Howto
  • Drepper's paper: how to write shared libraries

and good books about Linux kernel & application programming.

Minimum 504 KB Memory Usage

When you compile a C program it is linked into an executable. Even though your program is very small, it links against the C runtime, which brings in some additional code. For example, there may be error-handling paths that write to the console, and that code may pull in sprintf, which adds to your application's footprint. You can ask the linker to produce a map of the code in your executable to see what is actually included.
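As a rough illustration (assuming GCC and GNU ld; the file names are made up), a link map can be requested like this:

    // main.cpp - a deliberately tiny program
    #include <cstdio>

    int main() {
        std::printf("hello\n");   // even this pulls in part of the stdio machinery
        return 0;
    }

    // Build and ask GNU ld for a map of what ended up in the executable:
    //   g++ -o hello main.cpp -Wl,-Map=hello.map
    // hello.map lists every object file and library member the linker pulled in,
    // which is usually far more than just your own code.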

Also, an executable file contains more than machine code. There will be various tables for data and dynamic linking which will increase the size of the executable and there may also be some wasted space because the various parts are stored in blocks.

The C runtime will initialize before main is called, and this results both in some code being loaded (e.g. by dynamically linking to various operating system features) and in memory being allocated for a heap, a stack for each thread, and probably also some static data. Not all of this data may show as "real memory" - the default stack size on OS X appears to be 8 MB and your application is still using much less than this.
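If you want to check the stack limit your process starts with, one small sketch (assuming a POSIX system such as Linux or OS X) is to ask via getrlimit:

    // stacklimit.cpp - print the soft limit on the main thread's stack
    #include <cstdio>
    #include <sys/resource.h>

    int main() {
        rlimit rl{};
        if (getrlimit(RLIMIT_STACK, &rl) == 0) {
            std::printf("stack soft limit: %llu bytes\n",
                        (unsigned long long)rl.rlim_cur);   // typically 8 MB by default
        }
        return 0;
    }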

Why should the source code of dynamic libraries be compiled as position-independent code?

-fPIC is by no means the only solution to the shared library problem. Prior to ELF, Linux used the a.out executable format. Under a.out, every shared library used a unique address in a global address space, so it was always loaded at the same fixed address by all processes. This proved extremely hard to manage: all distro packages had to agree among themselves which address range was reserved for which library, and to constantly revise this agreement as libraries evolved over time.

-fPIC got us out of this mess.

With your suggestion - a global, dynamic reservation of address ranges across all processes - once some process mapped a library into a memory area, no other process would be able to reuse that area even if it never actually loaded the library. For 32-bit systems with 4G of address space (or even 2G, if the upper 2G are reserved for the kernel), this could quickly exhaust the virtual address space. Another problem comes from the fact that the size of the main executable differs across programs, so there is no global start address from which libraries could safely be loaded.
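To make the -fPIC side of this concrete, here is a minimal sketch (names are made up) of building a shared library as position-independent code with GCC:

    // mylib.cpp - a tiny library; the global variable forces GOT-relative access under -fPIC
    extern "C" int counter = 0;

    extern "C" int bump() {
        return ++counter;   // with -fPIC this access typically goes through the GOT
    }

    // Build (GCC on Linux):
    //   g++ -fPIC -c mylib.cpp
    //   g++ -shared -o libmylib.so mylib.o
    // The resulting libmylib.so can be mapped at a different address in every process;
    // references are resolved through the GOT/PLT instead of a fixed load address.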

Catching calls from a program at runtime and mapping them to other calls

On Linux you're looking for the LD_PRELOAD environment variable. This will load your libraries before any requested by the program. If you provide a function definition that matches one loaded by the target program then your version will be called instead.

You can't really detect which functions a program is calling, however. What you can do is get all the functions exported by a shared library and implement all of those. You aren't really catching the functions; you are simply reimplementing them.
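For illustration, a preloaded shim usually looks something like this sketch (wrapping puts here is just an example; any exported function works the same way):

    // shim.cpp - intercept puts() via LD_PRELOAD and forward to the real one
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE 1      // needed for RTLD_NEXT
    #endif
    #include <dlfcn.h>
    #include <cstdio>

    extern "C" int puts(const char* s) {
        // Find the next definition of puts() in the search order (the real libc one).
        using puts_fn = int (*)(const char*);
        static puts_fn real_puts = reinterpret_cast<puts_fn>(dlsym(RTLD_NEXT, "puts"));
        std::fprintf(stderr, "[shim] puts(\"%s\")\n", s);
        return real_puts(s);
    }

    // Build and use (GCC on Linux):
    //   g++ -shared -fPIC -o libshim.so shim.cpp -ldl
    //   LD_PRELOAD=./libshim.so ./some_program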

Projects like Wine do this in some cases, but not in all. They also rewrite some of the dynamic libraries, so when a Win32 program loads some DLL it is actually loading the Wine version and not the native version. This is essentially the same concept of replacing the functions with your own.

Look up LD_PRELOAD for more information.

When to use dynamic vs. static libraries

Static libraries increase the size of the code in your binary. They're always loaded and whatever version of the code you compiled with is the version of the code that will run.

Dynamic libraries are stored and versioned separately. It's possible for a version of the dynamic library to be loaded that wasn't the original one that shipped with your code if the update is considered binary compatible with the original version.

Additionally dynamic libraries aren't necessarily loaded -- they're usually loaded when first called -- and can be shared among components that use the same library (multiple data loads, one code load).

Dynamic libraries were considered to be the better approach most of the time, but originally they had a major flaw (google DLL hell), which has all but been eliminated by more recent Windows OSes (Windows XP in particular).
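As a quick sketch of the mechanics on Linux (file names are illustrative, assuming GCC and binutils), the same source can be packaged either way:

    // util.cpp - code to be packaged as either a static or a dynamic library
    extern "C" int add(int a, int b) { return a + b; }

    // Static library: the object code is copied into every executable that links against it.
    //   g++ -c util.cpp
    //   ar rcs libutil.a util.o
    //   g++ -o app_static main.cpp libutil.a
    //
    // Dynamic library: the executable only records a dependency; the code is mapped
    // (and shared between processes) at run time.
    //   g++ -fPIC -shared -o libutil.so util.cpp
    //   g++ -o app_dynamic main.cpp -L. -lutil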

Does a C program load everything to memory?

C is not an interpreted but a compiled language.

This means that the original *.c source file is never loaded at execution time. Instead, the compiler will process it once, to produce an executable file containing machine language.

Therefore, the size of the source file doesn't directly matter. It may well be very large, covering a lot of different use cases, and still produce a tiny executable, because only the applicable code is picked at compilation time. Most of the time the executable size remains correlated with the size of its source, but that doesn't necessarily mean it will end up being huge.

Also, the *.h header files included at the top of C source files are not actually "importing" a dependency (as use, require, or import would in other languages). An #include statement is only there to insert the content of a file at a given point; those files usually contain only function prototypes, variable declarations and some preprocessor #define clauses, which form the API of an external resource that is linked to your program later.

These external resources are typically other object modules (when you have multiple *.c files within the same project and don't need to recompile them all from scratch each time), static libraries, or dynamic libraries. The latter are DLL files under Windows and *.so files under Unix. In that case, the operating system will automatically load the required libraries when you run your program.
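A minimal sketch of that separation (file names are made up; shown with C++ sources, but the mechanism is identical for plain C): the header only declares the API, and the implementation is compiled and linked separately.

    // greet.h - only a prototype; nothing is "imported" by including this file
    #ifndef GREET_H
    #define GREET_H
    void greet(const char* name);
    #endif

    // greet.cpp - the actual implementation, compiled once into greet.o
    #include <cstdio>
    #include "greet.h"
    void greet(const char* name) { std::printf("hello, %s\n", name); }

    // main.cpp - the #include only pastes in the prototype; the code is resolved at link time
    #include "greet.h"
    int main() { greet("world"); return 0; }

    // Build (GCC):
    //   g++ -c greet.cpp
    //   g++ -c main.cpp
    //   g++ -o hello main.o greet.o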

Extend C++ application (executable) with dynamically linked library code?

What you are describing is a plugin model where users or customers can provide their own compiled extensions to "handle" functionality on behalf of an application.

On Windows, you specify that customers build a DLL and export a function that matches your expected function signature. On Linux, you ask them to build a shared library (.so) - which is nearly the same thing as a DLL.

In your application, you will invoke LoadLibrary and GetProcAddress to load the customer's DLL from file and to get a function pointer to their implementation of handleGetResource:

    typedef void (__stdcall *HANDLE_GET_RESOURCE)(Request* request, Response* response);
    HMODULE hMod = LoadLibrary("C:\\Path\\To\\Customers\\Dll\\TheirHandler.dll");
    HANDLE_GET_RESOURCE handler = (HANDLE_GET_RESOURCE)GetProcAddress(hMod, "handleGetResource");

Then to invoke:

    handler(&request, &response);

Then your customer can build a DLL with a definition like the following:

extern "C"  __declspec(dllexport) void __stdcall handleGetResource(Request* req, Response* res)
{
// their code goes here
}

You can do the same thing on Linux with dlopen and dlsym in place of LoadLibrary and GetProcAddress.
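A hedged sketch of the Linux side (the library path is made up, and Request/Response stand for the application's own types, mirroring the Windows snippet above):

    #include <dlfcn.h>
    #include <cstdio>

    typedef void (*HANDLE_GET_RESOURCE)(Request* request, Response* response);

    HANDLE_GET_RESOURCE load_handler(const char* path)  // path chosen by your registration mechanism
    {
        void* hMod = dlopen(path, RTLD_NOW);             // counterpart of LoadLibrary
        if (!hMod) {
            std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return nullptr;
        }
        // counterpart of GetProcAddress
        return reinterpret_cast<HANDLE_GET_RESOURCE>(dlsym(hMod, "handleGetResource"));
    }

The customer's definition is then the same extern "C" function as above, minus the Windows-specific __declspec(dllexport) and __stdcall decorations, built with something like g++ -shared -fPIC -o libTheirHandler.so handler.cpp.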

For Windows, I would also make these typedefs and function definitions explicit with regard to __cdecl or __stdcall.

As to how the customer registers the full path to their compiled binary - that's a mechanism that you can decide for yourself. (e.g. command line, registry, expected file name in current directory, etc...)

When a binary file runs, does it copy its entire binary data into memory at once? Could I change that?

The theoretical model for an application-level programmer makes it appear that this is so. In point of fact, the normal startup process (at least in Linux 1.x, I believe 2.x and 3.x are optimized but similar) is:

  • The kernel creates a process context (more-or-less, virtual machine)
  • Into that process context, it defines a virtual memory mapping that maps
    addresses in the process's address space to the contents of your executable file
  • Assuming that you're dynamically linked (the default/usual), the ld.so program
    (e.g. /lib/ld-linux.so.2) defined in your program's headers sets up memory mapping for shared libraries
  • The kernel does a jmp into the startup routine of your program (for a C program, that's
    something like _start from crt1.o, which eventually calls main). Since it has only set up the mapping, and not actually loaded any pages(*), this causes a Page Fault from the CPU's Memory Management Unit, which is an interrupt (exception, signal) to the kernel.
  • The kernel's Page Fault handler loads some section of your program, including the part
    that caused the page fault, into RAM.
  • As your program runs, if it accesses a virtual address that doesn't have RAM backing
    it up right now, Page Faults will occur and cause the kernel to suspend the program
    briefly, load the page from disc, and then return control to the program. This all
    happens "between instructions" and is normally undetectable.
  • As you use malloc/new, the kernel creates read-write pages of RAM (with no backing file on disc) and adds them to your virtual address space.
  • If you throw a Page Fault by trying to access a memory location that isn't set up in the virtual memory mappings, you get a Segmentation Violation Signal (SIGSEGV), which is normally fatal.
  • As the system runs out of physical RAM, pages of RAM get removed; if they are read-only copies of something already on disc (like an executable, or a shared object file), they just get de-allocated and are reloaded from their source; if they're read-write (like memory you "created" using malloc), they get written out to the page file (= swap file = swap partition = on-disc virtual memory). Accessing these "freed" pages causes another Page Fault, and they're re-loaded.

Generally, though, until your process is bigger than available RAM — and data is almost always significantly larger than the executable — you can safely pretend that you're alone in the world and none of this demand paging stuff is happening.

So: effectively, the kernel already is running your program while it's being loaded (and might never even load some pages, if you never jump into that code / refer to that data).
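If you want to watch this happen, one small sketch (assuming Linux/glibc; the allocation size is arbitrary) is to look at the minor page-fault counter before and after touching freshly allocated memory:

    // pagefaults.cpp - pages only materialize when they are first touched
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <sys/resource.h>

    static long minor_faults() {
        rusage ru{};
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_minflt;
    }

    int main() {
        const std::size_t size = 64 * 1024 * 1024;        // 64 MB of address space
        char* p = static_cast<char*>(std::malloc(size));  // mapping set up, pages not yet resident
        std::printf("after malloc:   %ld minor faults\n", minor_faults());

        std::memset(p, 1, size);                          // touching the pages faults them in
        std::printf("after touching: %ld minor faults\n", minor_faults());

        std::free(p);
        return 0;
    }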

If your startup is particularly sluggish, you could look at the prelink system to optimize shared library loads. This reduces the amount of work that ld.so has to do at startup (between the exec of your program and main getting called, as well as when you first call library routines).

Sometimes, linking statically can improve performance of a program, but at a major expense of RAM — since your libraries aren't shared, you're duplicating "your libc" in addition to the shared libc that every other program is using, for example. That's generally only useful in embedded systems where your program is running more-or-less alone on the machine.

(*) In point of fact, the kernel is a bit smarter, and will generally preload some pages to reduce the number of page faults, but the theory is the same regardless of the optimizations.


