Why 2 Linux Processes of Same File Cannot Share Text Segment

Why can't two Linux processes running the same file share the text segment?

In modern OSes like Linux and Windows, processes are by definition walled off in their own sandboxes. Shared libraries (.so/.dll) are deduplicated by the OS, but only their code is shared, not any data memory whatsoever. When a process tries to access memory outside its own address space, the MMU (Memory Management Unit) of the CPU raises a hardware fault, which terminates the process unless it is caught or handled explicitly. In Linux this is called a 'segmentation fault' or segfault (delivered as the SIGSEGV signal); in Windows it's an 'access violation'.

To establish inter-process communication (IPC), a program has to initiate it actively, through synchronization objects and mechanisms such as anonymous or named pipes, memory-mapped files, signals, semaphores and many more, depending on the OS.

When running a program in parallel on multiple processors in a shared-memory space, does each processor have its own .text segment?

There seems to be a misunderstanding here, partly because the term "shared" is used in different senses: sharing of the text segment has little in common with the OpenMP memory model.

On the technical side, it's easy to look at the actual state of text-page sharing under Linux. The /proc/[pid]/smaps files are useful here: for each mapping, the Linux kernel reports how much of it is in the Shared_Clean state (and in Private_Clean, etc.; see man 5 proc for more details).

For example, when I run a single less /proc/self/smaps, for the first mapping I see

Shared_Clean:          0 kB
Private_Clean:       108 kB

But after running a second less /proc/self/smaps in another terminal on the same machine, I got

Shared_Clean:        108 kB
Private_Clean:         0 kB

So, we see the code for /usr/bin/less is actually shared between different processes.

To me, the answer about "sandboxes" in the first reference is not valid: we can see that sharing does happen. You can try your particular workload and see what happens with real code.

Can I create a shared data section among a Linux process and its forked sub-processes?

I know mmap can map 'shared' pages into my process

Yes, do use mmap. Other possible answers: POSIX shared memory (shm_open and friends) and SysV IPC (shmat and friends).

but for some reason, I cannot use mmap or similar functions.

You had better figure out what "some reason" is, because you've just ruled out solutions that can work.

What I can do is use a big char array. So far, I don't know whether this is mission impossible.

That will not work. The "big char array" is only shared as of the moment you fork(): the child gets a copy of its contents at that point. Once you have two separate processes, no further communication through this array is possible.

Linux shared library loading and sharing the code with other process

To be precise, it's not ld.so's job to reserve physical memory or to manage or choose the mapping between virtual and physical memory, it's the kernel's job. When ld.so loads a shared library, it does so through the mmap syscall, and the kernel allocates the needed physical memory⁽¹⁾ and creates a virtual mapping between the library file and the physical memory. What is then returned by mmap is the virtual base address of the mapped library, which will then be used by the dynamic loader as a base to service calls to functions of that library.

Is ld.so going to identify that this shared library is already loaded into physical memory? How does it determine that?

It's not ld.so, but the kernel that is going to identify this. It's a complicated process, but to make it simple, the kernel keeps track of which file is mapped where, and can detect when a request is made to map an already mapped file again, avoiding physical memory allocation if possible.

If the same file (i.e. the same inode on the same filesystem, not merely a file with the same path) is mapped multiple times, the kernel serves the mappings from the same cached copy of the file's pages (the page cache) where possible, reusing the same physical pages to avoid wasting memory. So ideally, if a shared library is loaded multiple times, it could be physically allocated only once.

In practice it's not that simple though. Since memory can also be written to, this "sharing" of physical pages can obviously only occur if the page that needs to be shared is unchanged from the original content of the file (otherwise different processes mapping the same file or library would interfere with each other). This is basically always true for code sections (.text) since they are usually read-only, and for other similar sections (like read-only data). It can also happen for RW sections if they are not modified⁽²⁾. So in short, the .text segments of already loaded libraries are usually only allocated into physical memory once.

(1) Actually, the kernel creates the mapping first, and then only allocates physical memory if the process tries to read or write to it through the mapping. This prevents wasting memory when it's not needed.

(2) This technique of sharing physical memory is managed through a copy-on-write mechanism: the kernel initially maps the shared "clean" pages read-only, and when a process writes to one of them through a private mapping, the kernel gives that process its own copy of the page, which is then "dirty".
