Why can't two Linux processes running the same file share a text segment?
In modern OSes like Linux and Windows, each process is walled off in its own address space by design. There are shared libraries (.so/.dll) that the OS deduplicates, but only their code is shared, never their writable data. When a process tries to access memory outside its own address space, the MMU (Memory Management Unit) part of the CPU raises a fault, which the OS delivers to the process and which terminates it if not caught or handled explicitly. In Linux this is called a 'segmentation fault' or segfault, in Windows it's an 'access violation'.
To establish inter-process communication (IPC), a program has to set it up explicitly through OS-provided mechanisms such as anonymous/named pipes, memory-mapped files, signals, semaphores and a whole lot more depending on the OS.
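As a rough illustration of IPC that has to be set up explicitly, here is a minimal sketch of an anonymous pipe between a parent and a forked child. The function name ask_child is made up for this example, and it assumes a Unix-like system where os.fork is available.

```python
import os

def ask_child(message: bytes) -> bytes:
    """Fork a child that sends `message` back to the parent through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: close the unused read end, write the message, and exit
        # immediately without running the parent's cleanup handlers.
        os.close(read_fd)
        os.write(write_fd, message)
        os.close(write_fd)
        os._exit(0)
    # Parent: close the write end so that read() returns EOF
    # once the child has closed its copy.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

The two processes exchange data only because both ends of the pipe were created before the fork and inherited by the child.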
When running a program in parallel on multiple processors in shared-memory space, does each processor have its own .TEXT segment?
There seems to be a misunderstanding here, partly because the term "shared" is used in different senses. Sharing of the text segment doesn't have much in common with the OpenMP memory model.
On the technical side, it's easy to check the state of text page sharing under Linux. There are useful /proc//smaps files, in which the Linux kernel reports, for each mapping, the size in the Shared_Clean state (see man 5 proc for more details).
For example, when I run a single less /proc/self/smaps, for the 1st mapping I see
Shared_Clean: 0 kB
Private_Clean: 108 kB
But after running a 2nd less /proc/self/smaps in another terminal on the same machine, I got
Shared_Clean: 108 kB
Private_Clean: 0 kB
So, we see the code for /usr/bin/less is actually shared between different processes.
To me, the answer about "sandboxes" from the 1st reference is not valid: we can see that sharing does happen. You can try your particular workload and see what happens with real code.
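To repeat the check above without reading the output by eye, a small parser can pull the counters of the first mapping out of smaps text. This is only a sketch: the function name first_mapping_counters is hypothetical, and the address range, device and inode in the sample are made up. Only the Shared_Clean and Private_Clean values come from the output quoted above.

```python
import re

def first_mapping_counters(smaps_text: str) -> dict:
    """Return the 'Name: N kB' counters of the first mapping in smaps text."""
    counters = {}
    header_seen = False
    for line in smaps_text.splitlines():
        # A mapping header starts with an address range like "00400000-0041b000".
        if re.match(r"^[0-9a-f]+-[0-9a-f]+\s", line):
            if header_seen:
                break  # the second mapping starts here, so stop
            header_seen = True
            continue
        match = re.match(r"^(\w+):\s+(\d+) kB$", line)
        if header_seen and match:
            counters[match.group(1)] = int(match.group(2))
    return counters

# Illustrative input only: the header line is invented for this example.
sample = (
    "00400000-0041b000 r-xp 00000000 08:01 1234 /usr/bin/less\n"
    "Shared_Clean:        108 kB\n"
    "Private_Clean:         0 kB\n"
)
print(first_mapping_counters(sample))
```

For live data, the same function can be given the contents of /proc/self/smaps read from within the process of interest.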
Can I create a shared data section among a Linux process and its forked sub-processes?
I know mmap can map 'shared' pages into my process
Yes, do use mmap. Other possible answers: POSIX shared memory (shm_open and friends) and SysV IPC (shmat and friends).
but for some reason, I cannot use mmap or similar functions.
You better figure out what "some reason" is, because you've just disallowed solutions which can work.
What I can do is to use a big char array. So far, I don't know if this is a mission impossible.
That will not work. The "big char array" is shared only at the moment you fork(): the child gets its own copy of it. Once you have two separate processes, no further communication through this array will be possible.
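The difference between an ordinary buffer and a shared mapping can be seen in a short experiment. The sketch below assumes a Unix-like system. The function name fork_and_write is made up, and a Python bytearray stands in for the "big char array".

```python
import mmap
import os

def fork_and_write() -> tuple:
    """Let a forked child write into a plain buffer and into a shared mapping."""
    # Ordinary process memory: the child works on its own copy after fork().
    private_buf = bytearray(b"parent")
    # Anonymous mapping; on Unix, Python's mmap uses MAP_SHARED by default.
    shared_buf = mmap.mmap(-1, mmap.PAGESIZE)
    shared_buf[:6] = b"parent"

    pid = os.fork()
    if pid == 0:
        # Child: overwrite both buffers, then exit immediately.
        private_buf[:6] = b"child!"
        shared_buf[:6] = b"child!"
        os._exit(0)
    os.waitpid(pid, 0)

    # Parent: only the write to the shared mapping is visible here.
    result = (bytes(private_buf), bytes(shared_buf[:6]))
    shared_buf.close()
    return result

print(fork_and_write())  # (b'parent', b'child!')
```

The parent still sees its own contents in the plain buffer, while the child's write to the mmap region shows up, which is why the answer points to mmap or the shm_* family.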
Linux shared library loading and sharing the code with other process
To be precise, it's not ld.so's job to reserve physical memory or to manage or choose the mapping between virtual and physical memory, it's the kernel's job. When ld.so loads a shared library, it does so through the mmap syscall, and the kernel allocates the needed physical memory(1) and creates a virtual mapping between the library file and the physical memory. What is then returned by mmap is the virtual base address of the mapped library, which will then be used by the dynamic loader as a base to service calls to functions of that library.
Is ld.so going to identify that this shared library is already loaded to the physical memory? How does it work to understand that?
It's not ld.so, but the kernel that is going to identify this. It's a complicated process, but to make it simple, the kernel keeps track of which file is mapped where, and can detect when a request is made to map an already mapped file again, avoiding physical memory allocation if possible.
If the same file (i.e. a file with the same path) is mapped multiple times, the kernel will look at the existing mappings, and if possible it will reuse the same physical pages to avoid wasting memory. So ideally, if a shared library is loaded multiple times, it could be physically allocated only once.
In practice it's not that simple though. Since memory can also be written to, this "sharing" of physical pages can obviously only occur if the page that needs to be shared is unchanged from the original content of the file (otherwise different processes mapping the same file or library would interfere with each other). This is basically always true for code sections (.text) since they are usually read-only, and for other similar sections (like read-only data). It can also happen for RW sections if they are not modified(2). So in short, the .text segments of already loaded libraries are usually only allocated into physical memory once.
(1) Actually, the kernel creates the mapping first, and then only allocates physical memory if the process tries to read or write to it through the mapping. This prevents wasting memory when it's not needed.
(2) This technique of sharing physical memory is managed through a copy-on-write mechanism where the kernel initially maps "clean" pages and marks them as "dirty" when they are written to, duplicating them as needed.
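The copy-on-write behaviour from note (2) can be observed from user space with a private file mapping. The following is a minimal sketch, assuming a Unix-like system. The function name and the temporary file are invented for the example, and it maps an ordinary file rather than a real shared library.

```python
import mmap
import tempfile

def private_write_leaves_file_intact() -> tuple:
    """Write through a MAP_PRIVATE mapping and compare with a second mapping."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"original")
        f.flush()
        # Private, writable mapping: written pages are duplicated for this
        # mapping only and are never written back to the file.
        private = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_PRIVATE,
                            prot=mmap.PROT_READ | mmap.PROT_WRITE)
        # Second, read-only mapping of the same file.
        other = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_SHARED,
                          prot=mmap.PROT_READ)
        private[:8] = b"modified"
        result = (bytes(private[:8]), bytes(other[:8]))
        private.close()
        other.close()
        return result

print(private_write_leaves_file_intact())  # (b'modified', b'original')
```

The write is visible only through the private mapping. The second mapping keeps showing the file's original content, which is what lets unmodified pages stay shared between processes.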