How Is Stack Size of Linux Process Related to Pthread, Fork and Exec

Difference between pthread and fork on GNU/Linux

In C, there are some differences:

fork()

Its purpose is to create a new process, which becomes the child process of the caller.
Both processes continue executing at the instruction following the fork() call.
The parent and the child each get their own copy of the address space, including code, data, and stack (on Linux the pages are shared copy-on-write until one side writes to them).

Think of your program (process) as a person: forking creates a clone of it, and the clone keeps running the code it copied.

pthread_create()

Its purpose is to create a new thread that runs inside the same process as the caller.
Threads within the same process can communicate through shared memory. (Be careful: concurrent access must be synchronized.)
The new thread shares data, open files, signal handlers and signal dispositions, the current working directory, and user and group IDs with the other threads. It gets its own stack, thread ID, and registers, though.

Continuing the analogy: when your program (process) creates a new thread, it grows a second arm connected to the same brain.

C++: fork/exec or pthread?

When you call fork(), all file descriptors of your process are duplicated in the new one. When you call exec*(), all file descriptors are also kept, unless they are marked with the FD_CLOEXEC flag.

My guess is that some fd used by some library (Xlib, probably) is inherited by the new process, and that the duplication is causing chaos in your program.

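In these cases the BSD function closefrom() (closefrom(3)) is useful if you want to keep the standard I/O descriptors open. Unfortunately, Linux had no such function at the time (glibc 2.34 later added closefrom(), and Linux 5.9 added the close_range() system call), so you have to write a close-all loop or similar cruft:

#include <unistd.h>
/* Close every descriptor above stderr; 0, 1 and 2 stay open. */
long open_max = sysconf(_SC_OPEN_MAX);
if (open_max < 0) open_max = 1024; /* limit indeterminate: assumed fallback */
for (long i = 3; i < open_max; i++)
    close((int)i);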


Virtual memory consumption of pthreads

You can try using a drop-in garbage-collecting replacement for malloc() and see whether that solves your problem. If it does, find the leaks and fix them, then get rid of the garbage collector.

It's "interesting" to chase these kinds of problems on platforms that most heap analyzers and profilers (e.g. Valgrind) support only partially, if at all.

On another note, given the constraints, I assume you have decreased the default thread stack size? I think the default is 8 MB (with glibc it is taken from the stack size limit shown by ulimit -s), and you probably don't need that much. See pthread_attr_setstacksize() if you haven't adjusted it.

Edit:

You can check the default stack size with pthread_attr_getstacksize(). If it is 8 MB, your 10 threads (as you mentioned) already reserve about 80 MB of virtual address space for their stacks, so you've blown your ceiling during thread creation.

Mystery pthread problem with fork()

I came to the conclusion that it was probably this phenomenon:

http://kerneltrap.org/mailarchive/linux-kernel/2008/8/15/2950234/thread

This seems likely because the bug is difficult to trigger on our development systems but is generally reported by users running on large shared machines; also, the forked application starts a JVM, which itself allocates a lot of threads. The problem is also associated with the machine being heavily loaded and with extensive memory usage (we have a machine with 128 GB of RAM, and processes may be 10-100 GB in size).

I've been reading the O'Reilly pthreads book, which explains pthread_atfork() and suggests using a "surrogate parent" process, forked from the main process at startup, from which subprocesses are run. It also suggests using a pre-created thread pool. Both of these seem like good ideas, so I'm going to implement at least one of them.

Forking vs Threading

The main difference between forking and threading approaches is one of operating system architecture. When Unix was designed, forking was an easy, simple mechanism that best answered mainframe and server requirements, and as such it became popular on Unix systems. When Microsoft designed the NT kernel from scratch, it focused more on the threading model. As a result, there is still a notable difference today: Unix systems are efficient at forking, and Windows is more efficient with threads. You can see this most notably in Apache, which has traditionally used the prefork strategy on Unix and thread pooling on Windows.

Specifically to your questions:

When should you prefer fork() over threading and vice versa?

Prefer fork() on a Unix system when you're doing a far more complex task than just instantiating a worker, or when you want the implicit security sandboxing of separate processes.

If I want to call an external application as a child, then should I use fork() or threads to do it?

If the child will do an identical task to the parent, with identical code, use fork. For smaller subtasks, use threads. For separate external programs, use neither; start them with the proper process-creation API calls instead.

While doing a Google search, I found people saying it is a bad thing to call fork() inside a thread. Why do people want to call fork() inside a thread when they do similar things?

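Not entirely sure, but cost is only part of it. The child of fork() contains a copy of just the thread that called fork(), so any lock held by another thread at that moment (for example, a lock inside malloc()) stays locked forever in the child. A child forked from a multithreaded process should therefore call only async-signal-safe functions until it calls exec*().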

Is it true that fork() cannot take advantage of a multiprocessor system because the parent and child processes don't run simultaneously?

This is false. fork() creates a new process, which the OS scheduler can run on any available CPU, concurrently with the parent.

How to mmap the stack for the clone() system call on linux?

Joseph, in answer to your last question:

When a user creates a "normal" new process, that's done by fork(). In this case, the kernel doesn't have to worry about creating a new stack at all, because the new process is a complete duplicate of the old one, right down to the stack.

If the user replaces the currently running process using exec(), then the kernel does need to create a new stack - but in this case that's easy, because it gets to start from a blank slate. exec() wipes out the memory space of the process and reinitialises it, so the kernel gets to say "after exec(), the stack always lives HERE".

If, however, we use clone(), then we can say that the new process will share a memory space with the old process (CLONE_VM). In this situation, the kernel can't leave the stack as it was in the calling process (like fork() does), because then our two processes would be stomping on each other's stack. The kernel also can't just put it in a default location (like exec() does), because that location is already taken in this memory space. The only solution is to let the calling process find a place for it, which is what clone() does: the caller passes in a pointer to memory it has set aside for the new stack.