When Is Clone() and Fork Better Than Pthreads

When is clone() and fork better than pthreads?

The strength and weakness of fork (and company) is that they create a new process that's a clone of the existing process.

This is a weakness because, as you pointed out, creating a new process has a fair amount of overhead. It also means communication between the processes has to be done via some "approved" channel (pipes, sockets, files, shared-memory regions, etc.).

This is a strength because it provides (much) greater isolation between the parent and the child. If, for example, a child process crashes, you can kill it and start another fairly easily. By contrast, if a child thread dies, killing it is problematic at best -- it's impossible to be certain what resources that thread held exclusively, so you can't clean up after it. Likewise, since all the threads in a process share a common address space, one thread that ran into a problem could overwrite data being used by all the other threads, so just killing that one thread wouldn't necessarily be enough to clean up the mess.

In other words, using threads is a little bit of a gamble. As long as your code is all clean, you can gain some efficiency by using multiple threads in a single process. Using multiple processes adds a bit of overhead, but can make your code quite a bit more robust, because it limits the damage a single problem can cause, and makes it much easier to shut down and replace a process if it does run into a major problem.
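To make the "kill it and start another" point concrete, here is a minimal sketch, assuming a hypothetical flaky_worker() that crashes on its first attempt. The names flaky_worker and run_with_restarts are made up for illustration; only fork(), waitpid() and the status macros are real API.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker: crashes on the first attempt, succeeds afterwards. */
static void flaky_worker(int attempt)
{
    if (attempt == 0)
        abort();              /* simulate a crash (child dies from SIGABRT) */
    _exit(0);                 /* normal completion */
}

/* Runs the worker in a child process and restarts it after a crash.
 * Returns the number of restarts that were needed, or -1 on failure. */
int run_with_restarts(int max_restarts)
{
    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0)
            flaky_worker(attempt);          /* never returns */

        int status;
        if (waitpid(pid, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return attempt;                 /* worker finished cleanly */
        /* Otherwise the child crashed or failed. The parent's memory is
         * untouched, so it can simply loop and start a fresh child. */
    }
    return -1;                              /* gave up */
}
```

Calling run_with_restarts(3) should survive the first crash and report one restart. Doing the same with a crashed thread is not possible, since the fault would take the whole process down.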

As far as concrete examples go, Apache might be a pretty good one. It will use multiple threads per process, but to limit the damage in case of problems (among other things), it limits the number of threads per process, and can/will spawn several separate processes running concurrently as well. On a decent server you might have, for example, 8 processes with 8 threads each. The large number of threads helps it service a large number of clients in a mostly I/O-bound task, and breaking it up into processes means that if a problem does arise, it doesn't suddenly become completely unresponsive, and can shut down and restart a process without losing a lot.

Difference between pthread and fork on gnu/Linux

In C there are some differences however:

fork()

Purpose is to create a new process, which becomes the child process of the caller
Both processes will execute the next instruction following the fork() system call
Two identical copies of the address space, code, and stack are created, one for the parent and one for the child.

Thinking of the process as a person: forking creates a clone of your program (process) that runs the code it copied.
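A small sketch of the "two copies" point, assuming a made-up global named counter and a made-up helper fork_copy_demo(): the child changes its copy and reports it through its exit status, while the parent's copy stays as it was.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 10;      /* illustrative global, not from the question */

/* Forks a child that adds 5 to its own copy of counter.
 * Stores the child's value in *child_value and returns the parent's value,
 * or -1 on error. */
int fork_copy_demo(int *child_value)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        counter += 5;         /* modifies only the child's private copy */
        _exit(counter);       /* report the child's value via exit status */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    *child_value = WEXITSTATUS(status);
    return counter;           /* the parent's copy was never touched */
}
```

The child sees 15, the parent still sees 10.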

pthread_create()

Purpose is to create a new thread in the program, which runs in the same process as the caller
Threads within the same process can communicate using shared memory. (Be careful!)
The second thread will share data, open files, signal handlers and signal dispositions, current working directory, and user and group IDs. The new thread will get its own stack, thread ID, and registers though.

Continuing the analogy; your program (process) grows a second arm when it creates a new thread, connected to the same brain.
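For contrast, a sketch of the same experiment with a thread, again with made-up names (shared_counter, add_five, thread_share_demo). Here the change made by the new thread is visible to the caller, because both run in one address space. Depending on the toolchain you may need to compile with -pthread.

```c
#include <pthread.h>
#include <stddef.h>

static int shared_counter;    /* illustrative global shared by all threads */

/* Thread body: modifies the one and only copy of shared_counter. */
static void *add_five(void *arg)
{
    (void)arg;
    shared_counter += 5;
    return NULL;
}

/* Starts a thread that adds 5 to shared_counter and returns the value the
 * calling thread sees afterwards, or -1 on error. */
int thread_share_demo(void)
{
    pthread_t tid;

    shared_counter = 10;
    if (pthread_create(&tid, NULL, add_five, NULL) != 0)
        return -1;
    /* pthread_join() also guarantees the write above is visible here. */
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return shared_counter;
}
```

This is the "Be careful!" part: with more than one writer and no join in between, access to shared_counter would need a mutex or atomics.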

Use fork (process) instead of pthread to achieve the same

That really depends on what you're doing. In general terms you could do something like:

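for (size_t i = 0; i < commands_num; i++)
{
    pid_t pid = fork();
    if (pid == 0)
    {
        /* Child: do the work the thread function would have done, then exit
           so it never falls through into the parent's loop. */
        parseInput(&args[i]);
        exit(0);
    }
    else if (pid == -1)
        printError();            /* fork failed; no child was created */
    else
        args[i].thread = pid;    /* Parent: keep the child's PID for a later waitpid() */
}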

The child processes work independently of the parent and go about completing their tasks, so there may be no need for waitpid(), the process equivalent of pthread_join(), unless the parent has to wait for their results before doing something with them. Keep in mind that a child that has exited stays around as a zombie until the parent reaps it with waitpid() or exits itself.
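If the parent does need to wait, the pthread_join() counterpart could look roughly like this. It is only a sketch: spawn_and_reap and MAX_WORKERS are invented names, and each child just exits with its index in place of real work.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 16        /* arbitrary limit for this sketch */

/* Forks n children, each exiting with its own index as a stand-in result,
 * then waits for every one of them. Returns the sum of the exit codes,
 * or -1 if anything went wrong. */
int spawn_and_reap(int n)
{
    pid_t pids[MAX_WORKERS];
    int started = 0;
    int sum = 0;

    if (n < 0 || n > MAX_WORKERS)
        return -1;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;                  /* stop spawning, still reap what started */
        if (pid == 0)
            _exit(i);               /* child: its "result" is its index */
        pids[started++] = pid;
    }

    int ok = (started == n);
    for (int i = 0; i < started; i++) {
        int status;                 /* waitpid() plays the role of pthread_join() */
        if (waitpid(pids[i], &status, 0) == -1 || !WIFEXITED(status))
            ok = 0;
        else
            sum += WEXITSTATUS(status);
    }
    return ok ? sum : -1;
}
```

An exit status only carries a small integer, so anything bigger has to travel over a pipe or another IPC channel.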

Speaking of which, once the processes are forked they no longer share the same memory space, so passing information between the children and the parent can be a challenge in itself. If you're just printing to stdout you're all set; otherwise you'll have to set up pipes (or another IPC mechanism) so parent and children can communicate.
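A minimal pipe sketch, assuming a hypothetical helper child_to_parent() that is not part of the question's code: the child writes a message into the pipe and the parent reads it back.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that writes msg into a pipe; the parent reads it into buf
 * (always NUL-terminated). Returns the number of bytes read, or -1. */
ssize_t child_to_parent(const char *msg, char *buf, size_t buflen)
{
    int fds[2];                         /* fds[0] = read end, fds[1] = write end */
    if (buflen == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);                  /* child only writes */
        size_t len = strlen(msg);
        ssize_t written = write(fds[1], msg, len);
        _exit(written == (ssize_t)len ? 0 : 1);
    }

    close(fds[1]);                      /* parent only reads; needed to see EOF */
    size_t total = 0;
    ssize_t n;
    while (total < buflen - 1 &&
           (n = read(fds[0], buf + total, buflen - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';

    close(fds[0]);
    waitpid(pid, NULL, 0);              /* reap the child */
    return (ssize_t)total;
}
```

Closing the unused ends matters: if the parent kept fds[1] open, its read() would never see end-of-file.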

Another alternative to using native system threads (or pthreads in particular) is a green-thread library such as libdill. In theory it provides concurrency even on systems that don't support native multi-threading.

Forking vs Threading

The main difference between forking and threading approaches is one of operating system architecture. Back in the days when Unix was designed, forking was an easy, simple system that answered the mainframe and server type requirements best, as such it was popularized on the Unix systems. When Microsoft re-architected the NT kernel from scratch, it focused more on the threading model. As such there is today still a notable difference with Unix systems being efficient with forking, and Windows more efficient with threads. You can most notably see this in Apache which uses the prefork strategy on Unix, and thread pooling on Windows.

Specifically to your questions:

When should you prefer fork() over threading and vice versa?

On a Unix system where you're doing a far more complex task than just instantiating a worker, or you want the implicit security sandboxing of separate processes.

If I want to call an external application as a child, then should I use fork() or threads to do it?

If the child will do an identical task to the parent, with identical code, use fork. For smaller subtasks use threads. For separate external processes use neither, just call them with the proper API calls.

While doing a Google search I found people saying it is a bad thing to call fork() inside a thread. Why do people want to call fork() inside a thread when they do similar things?

It is less about cost than about state: fork() duplicates the address space but only the calling thread, so a lock held by any other thread at that moment stays locked in the child with nobody left to release it.

Is it True that fork() cannot take advantage of multiprocessor system because parent and child process don't run simultaneously?

This is false. fork() creates a new process, which then takes advantage of all the features available to processes in the OS task scheduler, including running on another CPU at the same time as the parent.

Fair comparison of fork() Vs Thread

mumble ... I do not like your solution for many reasons:

You are not taking into account the execution time of the child processes/threads.
You should compare CPU usage, not the bare elapsed time. This way your statistics will not depend on, e.g., disk access congestion.
Let your child process do something. Remember that "modern" fork uses copy-on-write to avoid allocating memory for the child process until needed. Exiting immediately is too easy: that way you avoid nearly all the disadvantages of fork.
CPU time is not the only cost you have to account for. Memory consumption and slow IPC are both disadvantages of the fork solution.

You could use getrusage() instead of clock() to measure real resource usage.
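As a rough sketch of the getrusage() suggestion: RUSAGE_CHILDREN reports the CPU time of children that have already been waited for, which is exactly what clock() in the parent misses. The helper names children_cpu_seconds and run_busy_child are invented for this example.

```c
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* User + system CPU seconds used by all children this process has already
 * waited for, or -1.0 on error. */
double children_cpu_seconds(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_CHILDREN, &ru) == -1)
        return -1.0;
    return tv_seconds(ru.ru_utime) + tv_seconds(ru.ru_stime);
}

/* Forks a child that burns some CPU, then waits for it.
 * Returns 0 on success, -1 on error. */
int run_busy_child(unsigned long iterations)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        volatile unsigned long x = 0;   /* volatile keeps the loop from being optimized away */
        for (unsigned long i = 0; i < iterations; i++)
            x += i;
        _exit(0);
    }
    return waitpid(pid, NULL, 0) == pid ? 0 : -1;
}
```

Sampling children_cpu_seconds() before and after run_busy_child() gives the CPU cost of the child. The numbers only cover children that have been reaped, so the waitpid() is not optional here.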

P.S. I do not think you can really measure the process/thread overhead writing a simple test program. There are too many factors and, usually, the choice between threads and processes is driven by other reasons than mere cpu-usage.

Why does multi-threaded code (using pthreads) seem slower than multi-process code (using fork)?

TL;DR: you are measuring time in the wrong way. Use clock_gettime(CLOCK_MONOTONIC, ...) instead of clock().

You are measuring time using clock(), which as stated on the manual page:

[...] returns an approximation of processor time used by the program. [...]
The value returned is the CPU time used so far as a clock_t

The system clock used by clock() measures CPU time, which is the time spent by the calling process while using the CPU. The CPU time used by a process is the sum of the CPU time used by all of its threads, but not its children, since those are different processes. See also: What specifically are wall-clock-time, user-cpu-time, and system-cpu-time in UNIX?

Therefore, the following happens in your 3 scenarios:

No parallelism, sequential code. The CPU time spent running the process is pretty much all there is to measure, and will be very similar to the actual wall-clock time spent. Note that the CPU time of a single-threaded program is always less than or equal to its wall-clock time.
Multiple child processes. Since you are creating child processes to do the actual work on behalf of the main (parent) process, the parent will use almost zero CPU time: the only thing that it has to do is a few syscalls to create the children and then a few syscalls to wait for them to exit. Most of its time is spent sleeping waiting for the children, not running on the CPU. The child processes are the ones that run on the CPU, but you are not measuring their time at all. Therefore you end up with a very short time (1ms). You are basically not measuring anything at all here.
Multiple threads. Since you are creating N threads to do the work and calling clock() in the main thread, the value you get is the CPU time of the whole process, which amounts to the sum of the CPU times of all its threads. It should come as no surprise that if you are doing the exact same calculation, the average CPU time spent by each thread is T/NTHREADS, and summing them up will give you T/NTHREADS * NTHREADS = T. Indeed you are using roughly the same CPU time as the first scenario, only with a little bit of overhead for creating and managing the threads.

All of this can be solved in two ways:

Carefully account for CPU time in the correct way in each thread/process and then proceed to sum or average the values as needed.
Simply measure wall-clock time (i.e. real human time) instead of CPU time using clock_gettime with one of CLOCK_REALTIME, CLOCK_MONOTONIC or CLOCK_MONOTONIC_RAW. Refer to the manual page for more info.
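Here is a sketch of the second option next to the original clock() approach, so the difference shows up in one run. The helper names (wall_now, cpu_now, time_busy_child) are assumptions for the example; the child spins for a given number of seconds while the parent measures both clocks around fork() and waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Wall-clock seconds from a monotonic clock. */
static double wall_now(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU seconds used so far by the calling process only (not its children). */
static double cpu_now(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Forks a child that keeps the CPU busy for busy_seconds. Stores how much
 * wall-clock and CPU time the parent measured. Returns 0, or -1 on error. */
int time_busy_child(double busy_seconds, double *wall, double *cpu)
{
    double wall_start = wall_now();
    double cpu_start = cpu_now();

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        double start = wall_now();
        while (wall_now() - start < busy_seconds)
            ;                           /* burn CPU in the child */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) == -1)
        return -1;

    *wall = wall_now() - wall_start;    /* includes the child's run time */
    *cpu = cpu_now() - cpu_start;       /* parent only: close to zero */
    return 0;
}
```

With a 0.2 second child, the wall-clock figure comes out at 0.2 seconds or more while the clock() figure stays near zero, which is the "very short time" from the second scenario.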

What happens when a thread forks?

The new process will be the child of the main thread that created the thread. I think.

fork creates a new process. The parent of a process is another process, not a thread. So the parent of the new process is the old process.

Note that the child process will only have one thread because fork only duplicates the (stack for the) thread that calls fork. (This is not entirely true: the entire memory is duplicated, but the child process will only have one active thread.)
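One way to observe this, sketched with made-up names (ticks, ticker, ticker_runs_in_child): a background thread keeps incrementing a counter, and after fork() the child checks whether its copy of the counter still moves. It should not, because the ticker thread does not exist in the child. Depending on the toolchain you may need -pthread.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_ulong ticks;    /* incremented by the background thread */
static atomic_int stop;       /* tells the background thread to finish */

static void *ticker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop))
        atomic_fetch_add(&ticks, 1);
    return NULL;
}

/* Returns 1 if the ticker thread kept running in the forked child,
 * 0 if it did not, -1 on error. */
int ticker_runs_in_child(void)
{
    pthread_t tid;

    atomic_store(&stop, 0);
    if (pthread_create(&tid, NULL, ticker, NULL) != 0)
        return -1;
    while (atomic_load(&ticks) == 0)
        ;                               /* wait until the ticker is running */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only this thread was replicated. Sample its copy of the
         * counter twice; sleep() is async-signal-safe. */
        unsigned long before = atomic_load(&ticks);
        sleep(1);
        unsigned long after = atomic_load(&ticks);
        _exit(after != before);
    }

    atomic_store(&stop, 1);             /* parent: shut the ticker down */
    pthread_join(tid, NULL);
    if (pid == -1)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```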

If its parent finishes first, the new process will be attached to init process.

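If the parent finishes first, the child gets init (or, on Linux, a designated subreaper process) as its new parent. A SIGHUP is sent only in particular cases, for example when the exiting parent is the session leader of a controlling terminal, or when the child's process group becomes orphaned while it contains stopped processes. See also the man pages for nohup and signal(7) for a bit more information on SIGHUP.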

And its parent is main thread, not the thread that created it.

The parent of a process is a process, not a specific thread, so it is not meaningful to say that the main or child thread is the parent. The entire process is the parent.

One final note: Mixing threads and fork must be done with care. Some of the pitfalls are discussed here.

Are threads copied when calling fork?

No.

Threads are not copied on fork(). The POSIX specification says (emphasis mine):

fork - create a new process

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.

To circumvent this problem, there exists a pthread_atfork() function to help.
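A sketch of the usual pthread_atfork() pattern, assuming a hypothetical library lock called cache_lock: the prepare handler takes the lock before the fork, and both the parent and the child handler release it afterwards, so the child never inherits it in a locked state. All function names here are invented for the example.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock protecting some library state. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t handlers_once = PTHREAD_ONCE_INIT;

static void before_fork(void)       { pthread_mutex_lock(&cache_lock); }
static void after_fork_parent(void) { pthread_mutex_unlock(&cache_lock); }
static void after_fork_child(void)  { pthread_mutex_unlock(&cache_lock); }

static void register_handlers(void)
{
    pthread_atfork(before_fork, after_fork_parent, after_fork_child);
}

/* Registers the handlers exactly once; registering them twice would make
 * the prepare handler lock the mutex twice and deadlock. Returns 0 on success. */
int install_fork_handlers(void)
{
    return pthread_once(&handlers_once, register_handlers);
}

/* Forks and checks whether the child can take cache_lock.
 * Returns 1 if it can, 0 if the lock was stuck, -1 on error. */
int child_can_lock(void)
{
    pid_t pid = fork();                 /* handlers run around this call */
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(pthread_mutex_trylock(&cache_lock) == 0 ? 0 : 1);

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

This only covers locks you know about. Locks inside other libraries are out of reach, which is why exec'ing soon after fork() remains the safer route in multi-threaded programs.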

Create a new thin process, fork or threads?

fork: I get durability over time but copying the main server means to
copy all its data structures, fd etc. that would weigh down the new
process unnecessarily.

Not as much as you may think. Linux fork() has long been implemented via copy-on-write pages. The child process will have the same address space as the parent, but it will not have its own copies of any pages that neither process modifies. Moreover, the cost of copying modified pages is amortized over time. The initial fork() is pretty cheap.

thread: light and fast but not durable and above all very unstable (if
a thread for some reason generates an error could block everything).

Given that analysis, threads are not a real option after all. Durability and stability are functional requirements. Minimum weight and to some extent even speed are efficiency issues. The former category trumps the latter pretty much every time.

The ideal thing would be a magic system call, that executes an ex novo
process that has a function as entry point but I think there is
nothing like that.

Since you're targeting Linux, have you considered clone()? It does exactly what you describe, though I'm doubtful that what you said fully captures the semantics you imagine for such a feature.
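For reference, a minimal Linux-only sketch of clone() starting a child at a function. With just SIGCHLD in the flags the child still gets a copy-on-write copy of the address space, much like fork(), so this is an illustration of the entry-point style and not a "thin" process. The names child_entry, clone_call and CHILD_STACK_SIZE are invented, and the 64 KiB stack is an arbitrary choice.

```c
#define _GNU_SOURCE               /* required for clone() in <sched.h> */
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Entry point of the child; its return value becomes the exit status. */
static int child_entry(void *arg)
{
    return *(int *)arg * 2;
}

/* Runs child_entry(input) in a new process created with clone().
 * Returns the child's exit status, or -1 on error. */
int clone_call(int input)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downwards on common architectures, so clone() is
     * given the top of the buffer. SIGCHLD makes the child waitable
     * with waitpid(), like a forked child. */
    pid_t pid = clone(child_entry, stack + CHILD_STACK_SIZE, SIGCHLD, &input);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t reaped = waitpid(pid, &status, 0);
    free(stack);
    if (reaped == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

What the child shares with the parent is decided by the CLONE_* flags; adding flags such as CLONE_VM moves it toward thread-like behavior and brings the stability concerns back.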

Alternatively, have you considered fork + exec? That would probably require some refactoring, but by performing an exec the child would shed the context shared with its parent as much as is possible, right after the (cheap) initial fork.
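The fork + exec route could be sketched like this; run_program is a made-up helper name. After a successful execvp() the child runs a fresh program image, so the copied data structures of the server are gone.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program named by argv[0] (looked up in PATH) in a child process
 * and waits for it. Returns its exit status, 127 if exec failed, or -1 on
 * other errors. */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);          /* replaces the child's image */
        _exit(127);                     /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Open file descriptors survive the exec unless they are marked close-on-exec, so that is one piece of parent context you have to shed explicitly.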

Fork() in Threads

Forking in a thread duplicates only the calling thread. There's no "duplicate all the functions" at run time; the entire address space of the process (including thread constructs such as mutexes and condition variables) is copied into a new process that contains just the calling thread. It's generally complex to use fork in a thread, and you can easily run into problems managing the state of pthread resources (mutexes, condition variables, etc.).
