How Shared Libraries Are Shared Between Different Processes

How can two processes share the same shared library?

Presumably you understand page tables and copy-on-write semantics.

Suppose you run an executable a.out, which initializes some global data, and then fork()s. You should have little trouble understanding that all read-only (e.g. code) pages of a.out are now shared between the two processes (the exact same pages of physical memory are mapped into both virtual memory spaces).

Now suppose that a.out also used libc.so.6 before forking. You should have no trouble understanding that the read-only pages belonging to libc.so.6 are also shared between processes in exactly the same fashion.

Now suppose that you have two separate executables, a.out and b.out, both using libc.so.6. Suppose a.out runs first. The dynamic loader will perform a read-only mapping of libc.so.6 into a.out's virtual memory space, and now some of its pages are in physical memory. At that point, b.out starts, and the dynamic loader mmaps the same libc.so.6 pages into its virtual memory. Since the kernel already holds these pages of the file in physical memory, there is no reason for it to allocate new physical pages for the second mapping -- it can re-use the previously mapped physical pages. The end result is the same as for the forked binary -- the same physical pages are shared between multiple virtual memory spaces (and multiple processes).

So how can two processes have different copies of a global variable?

Very simple: the read-write mappings (which are required for writable data) are private to each process rather than shared, so one process can write to the variable and that write will not be visible to the other process.

How are shared libraries addressed in each process's memory?

In Linux, references to shared library functions are resolved by default when the function is first called in your code. This is called lazy binding. As a consequence, most binaries are not run by the processor directly: a dynamically linked ELF file names an interpreter, the dynamic loader (see /lib64/ld-linux-*.so), which is loaded first and resolves these references.

To do this, the ELF binary contains two specific tables:

  • the Procedure linkage table (PLT)
  • the Global offset table (GOT)

The code you're executing references the PLT, which performs the redirection. On the first call, the GOT entry holds the address of a stub that jumps into the loader, which resolves the function's address in the dynamic library and stores it in the GOT so that later calls go straight to the function. The library is mapped into the virtual memory of your program, even though it is only present once in physical memory.

Because you're using virtual memory, the addresses seen by your processes will likely differ, hence the use of one GOT per process. As for the use of two tables: it's principally for security reasons, so that you never execute instructions from a writable page.

You can disable lazy binding if you wish by setting the LD_BIND_NOW environment variable to a non-empty value.

Linux shared library loading and sharing the code with other processes

To be precise, it's not ld.so's job to reserve physical memory or to manage or choose the mapping between virtual and physical memory, it's the kernel's job. When ld.so loads a shared library, it does so through the mmap syscall, and the kernel allocates the needed physical memory(1) and creates a virtual mapping between the library file and the physical memory. What is then returned by mmap is the virtual base address of the mapped library, which will then be used by the dynamic loader as a base to service calls to functions of that library.

Is ld.so going to identify that this shared library is already loaded to the physical memory? How does it work to understand that?

It's not ld.so, but the kernel that is going to identify this. It's a complicated process, but to make it simple, the kernel keeps track of which file is mapped where, and can detect when a request is made to map an already mapped file again, avoiding physical memory allocation if possible.

If the same file (i.e. the same file on disk, typically reached through the same path) is mapped multiple times, the kernel will look at the existing mappings, and if possible it will reuse the same physical pages to avoid wasting memory. So ideally, if a shared library is loaded multiple times, it could be physically allocated only once.

In practice it's not that simple though. Since memory can also be written to, this "sharing" of physical pages can obviously only occur if the page that needs to be shared is unchanged from the original content of the file (otherwise different processes mapping the same file or library would interfere with each other). This is basically always true for code sections (.text) since they are usually read-only, and for other similar sections (like read-only data). It can also happen for RW sections if they are not modified(2). So in short, the .text segments of already loaded libraries are usually only allocated into physical memory once.


(1) Actually, the kernel creates the mapping first, and then only allocates physical memory if the process tries to read or write to it through the mapping. This prevents wasting memory when it's not needed.

(2) This technique of sharing physical memory is managed through a copy-on-write mechanism where the kernel initially maps "clean" pages and marks them as "dirty" when they are written to, duplicating them as needed.

Shared library used by two processes

val=10. Indeed, every process has its own address space. The library has no data space by itself.
