Forking vs Threading
The main difference between the forking and threading approaches is one of operating system architecture. When Unix was designed, forking was a simple mechanism that fit mainframe and server requirements well, so it became the standard approach on Unix systems. When Microsoft architected the NT kernel from scratch, it focused more on the threading model. As a result there is still a notable difference today: Unix systems are efficient at forking, and Windows is more efficient with threads. You can see this most clearly in Apache, which uses the prefork strategy on Unix and thread pooling on Windows.
Specifically to your questions:
When should you prefer fork() over threading and vice versa?
On a Unix system where you're doing a far more complex task than just instantiating a worker, or you want the implicit security sandboxing of separate processes.
If I want to call an external application as a child, then should I use fork() or threads to do it?
If the child will do an identical task to the parent, with identical code, use fork. For smaller subtasks, use threads. For separate external processes use neither; just call them with the proper API calls.
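As one possible reading of "the proper API calls" on a POSIX system, here is a minimal sketch that starts an external program with posix_spawnp() and waits for it. The helper name run_program is hypothetical and only used for illustration; error handling is reduced to returning -1.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Runs an external program and returns its exit status, or -1 on failure.
   argv[0] is looked up on PATH; argv must end with a NULL entry.
   (run_program is a hypothetical helper name.) */
int run_program(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp() returns 0 on success and an error number otherwise. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    /* Wait for the child so it does not linger as a zombie. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    /* Report the exit code only if the program exited normally. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass something like a NULL-terminated array {"ls", "-l", NULL}. On Windows the equivalent call would be a different API (for example CreateProcess), which this sketch does not cover.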
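While doing a Google search I found people saying it is a bad thing to call fork() inside a thread. Why do people want to call fork() inside a thread when they do similar things?
I'm not entirely sure, but I think it's computationally rather expensive to duplicate a large process. Note also that fork() copies only the calling thread into the child, so a lock held by another thread at the moment of the fork stays locked in the child, which is a common source of deadlocks.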
Is it true that fork() cannot take advantage of a multiprocessor system because the parent and child processes don't run simultaneously?
This is false. fork() creates a new process, which then takes advantage of all features available to processes in the OS task scheduler, including running on another processor at the same time as the parent.
What is the difference between fork and thread?
A fork gives you a brand new process, which is a copy of the current process, with the same code segments. As the memory image changes (typically this is due to different behavior of the two processes) you get a separation of the memory images (Copy On Write), however the executable code remains the same. Tasks do not share memory unless they use some Inter Process Communication (IPC) primitive.
One process can have multiple threads, each executing in parallel within the same context of the process. Memory and other resources are shared among threads, therefore shared data must be accessed through some primitive and synchronization objects (like mutexes, condition variables and semaphores) that allow you to avoid data corruption.
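To make the need for synchronization concrete, here is a small sketch using POSIX threads, assuming a pthreads-capable system. Several threads increment one shared counter, and a mutex keeps the increments from interleaving. The names count_with_threads, NUM_THREADS and INCREMENTS are made up for this example.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS  100000

/* Shared by all threads of the process. */
static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        /* Without the lock, concurrent increments could be lost. */
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter.
   Returns -1 if a thread could not be created. */
long count_with_threads(void)
{
    pthread_t threads[NUM_THREADS];
    int started = 0;

    counter = 0;
    while (started < NUM_THREADS &&
           pthread_create(&threads[started], NULL, worker, NULL) == 0)
        started++;

    /* Join whatever was started, even on partial failure. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == NUM_THREADS ? counter : -1;
}
```

With the mutex in place the result is always NUM_THREADS * INCREMENTS. Removing the lock and unlock calls turns the increment into a data race, and the total typically comes out lower.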
What is the difference between threads and forked processes in Unix?
fork() copies the current process. Without any special preparation, almost no data is exchanged between child and parent. The new process starts out identical to the old one, but as soon as either process writes to a variable, a copy of the written region is created and the writing process gets its own physical memory location for this data. This means that setting a variable in the child will not be visible to the parent and vice versa.
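A short sketch of this isolation, assuming a POSIX system: the child changes a global variable and reports what it saw through its exit status, while the parent's copy stays untouched. The function name fork_isolation and the values 1 and 42 are arbitrary choices for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int value = 1;

/* Forks, lets the child overwrite `value`, and returns the parent's view
   of `value` afterwards (or -1 on error). The value the child saw is
   stored in *child_saw. */
int fork_isolation(int *child_saw)
{
    int status;
    pid_t pid;

    value = 1;
    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        value = 42;     /* changes only the child's private copy */
        _exit(value);   /* report it back through the exit status */
    }

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;

    *child_saw = WEXITSTATUS(status);
    return value;       /* the parent still sees its own copy */
}
```

The exit status only carries a small integer, which is why real data exchange needs one of the IPC mechanisms.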
You can use shared memory, pipes, files, sockets, signals, and probably other IPC methods to communicate between child and parent. For your special case you can use the wait() or waitpid() function to wait until your child exits. But I assume you want to know how to exchange data.
Shared memory
You can use the mmap() call to reserve memory that is shared between parent and child.
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
You can pass MAP_SHARED | MAP_ANONYMOUS as the flags argument to create a memory region that is shared. There you can place the shared variable and both processes can access it. Here is an example.
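#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

// Creates a region of shared memory that stores a single bool.
// Returns NULL if the mapping fails.
static bool *reserveSharedMemory(void)
{
    void *data = mmap(NULL, sizeof(bool), PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (data == MAP_FAILED)
    {
        // do some error handling here
        return NULL;
    }
    bool *p = data;
    *p = false;  // initial value, set before any fork()
    return p;
}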
Sockets
Sockets allow you to send and receive data with another endpoint. With socketpair() you can create two connected socket file descriptors and communicate by writing to one of them and reading from the other, or vice versa. This way communication with the child process becomes almost the same as communicating over a network socket.
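A minimal sketch of the socketpair() approach, assuming a POSIX system with AF_UNIX sockets: the child writes a short message to its end and the parent reads it from the other. The function name receive_from_child and the message text are invented for the example.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that sends a NUL-terminated greeting, and reads it into
   buf in the parent. Returns the number of bytes received, or -1. */
ssize_t receive_from_child(char *buf, size_t size)
{
    int sv[2];
    size_t total = 0;
    ssize_t n = 0;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        const char *msg = "hello from child";
        size_t len = strlen(msg) + 1;          /* include the NUL */
        close(sv[0]);                          /* child uses sv[1] only */
        n = write(sv[1], msg, len);
        close(sv[1]);
        _exit(n == (ssize_t)len ? 0 : 1);
    }

    close(sv[1]);                              /* parent uses sv[0] only */
    /* Read until the child closes its end or the buffer is full. */
    while (total < size &&
           (n = read(sv[0], buf + total, size - total)) > 0)
        total += (size_t)n;
    close(sv[0]);
    waitpid(pid, NULL, 0);

    return n < 0 ? -1 : (ssize_t)total;
}
```

Because both descriptors are sockets, either side may also write back on its own end, which a plain pipe would not allow.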
What is the difference between forking and threading in a background process?
Threading means you run the code in another thread in the same process whereas forking means you fork a separate process.
Threading in general means that you'll use less memory since you won't have a separate application instance (this advantage is lessened if you have a copy-on-write friendly Ruby such as REE). Communication between threads is also a little easier.
Depending on your Ruby interpreter, Ruby may not use extra cores efficiently (JRuby is good at this, MRI much worse), so spawning a bunch of extra threads will impact the performance of your web app and won't make full use of your resources: MRI only runs one thread of Ruby code at a time.
Forking creates separate Ruby instances, so you'll make better use of multiple cores. You're also less likely to adversely affect your main application. You need to be a tiny bit careful when forking, as you share open file descriptors when you fork, so you usually want to reopen database connections, memcache connections, etc.
With MRI I'd use forking; with JRuby there's more of a case to be made for threading.
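The descriptor sharing mentioned above is an operating system behavior rather than something specific to Ruby, so it can be sketched in C. In this example the child moves the file offset of an inherited descriptor and the parent observes the change, which is the same effect that makes shared database or memcache connections unsafe. The function name and the temporary file path are hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's file offset after the child moved it,
   or -1 on error. */
off_t offset_after_child_seek(void)
{
    char path[] = "/tmp/fork-fd-XXXXXX";   /* hypothetical temp file name */
    off_t offset = -1;
    pid_t pid;
    int fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);   /* the file stays usable through fd until it is closed */

    pid = fork();
    if (pid == 0) {
        /* The child's fd refers to the same open file description,
           so this seek also moves the parent's position. */
        _exit(lseek(fd, 100, SEEK_SET) == 100 ? 0 : 1);
    }

    if (pid != -1 && waitpid(pid, NULL, 0) != -1)
        offset = lseek(fd, 0, SEEK_CUR);

    close(fd);
    return offset;
}
```

The parent never seeks itself, yet it ends up at offset 100. Reopening the resource in the child gives each process its own independent state.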
Threading vs Forking (with explanation of what I want to do)
Without more detail on your problem, there's not much help that can be given. You want to parallelize a process. Threads and forks in Perl have advantages and disadvantages.
One of the key things that makes Perl threads different from other threads is that data is not shared by default. This makes threads much easier and safer to work with: you don't have to worry about the thread safety of libraries or most of your code, just the threaded bit. However, it can be a performance drag and memory hungry, as Perl must put a copy of the interpreter and all loaded modules into each thread.
When it comes to forking I will only be talking about Unix. Perl emulates fork on Windows using threads; it works, but it can be slow and buggy.
Forking Advantages
- Very fast to create a fork
- Very robust
Forking Disadvantages
- Communicating between the processes can be slow and awkward
Thread Advantages
- Thread coordination and data interchange is fairly easy
- Threads are fairly easy to use
Thread Disadvantages
- Each thread takes a lot of memory
- Threads can be slow to start
- Threads can be buggy (better the more recent your perl)
- Database connections are not shared across threads
That last one is a bit of a doozy if the documentation is up to date. If you're going to be doing a lot of SQL, don't use threads.
In general, to get good performance out of Perl threads it's best to start a pool of threads and reuse them. Forks can more easily be created, used and discarded.
Really what it comes down to is what fits your way of thinking and your particular problem.
For either case, you're likely going to want something to manage your pool of workers. For forking you're going to want to use Parallel::ForkManager or Child. Child is particularly nice as it has built in inter-process communication.
For threads you're going to want to use threads::shared, Thread::Queue and read perlthrtut.
When reading articles about Perl threads, keep in mind they were a bit crap when they were introduced in 5.8.0 in 2002, and only serviceable by 5.10.1. After that they've firmed up considerably. Information and opinions about their efficiency and robustness tend to fall rapidly out of date.
What happens when a thread forks?
The new process will be the child of the main thread that created the thread. I think.
fork creates a new process. The parent of a process is another process, not a thread. So the parent of the new process is the old process.
Note that the child process will only have one thread because fork only duplicates the (stack for the) thread that calls fork. (This is not entirely true: the entire memory is duplicated, but the child process will only have one active thread.)
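The single-thread behavior can be observed with a small experiment, sketched here under the assumption of a POSIX system with pthreads and C11 atomics. A background thread keeps incrementing a counter; after fork() the child checks whether its copy of the counter still advances. All names are made up for the example.

```c
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_long ticks;
static atomic_bool stop;

/* Background thread: bumps the counter about once per millisecond. */
static void *ticker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop)) {
        atomic_fetch_add(&ticks, 1);
        poll(NULL, 0, 1);   /* sleep for roughly 1 ms */
    }
    return NULL;
}

/* Returns 1 if the counter stopped advancing in the child (no ticker
   thread there), 0 if it kept advancing, and -1 on error. */
int ticker_stops_in_child(void)
{
    pthread_t thread;
    int status = 0;
    int result = -1;
    pid_t pid;

    atomic_store(&stop, false);
    atomic_store(&ticks, 0);
    if (pthread_create(&thread, NULL, ticker, NULL) != 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        /* Only the thread that called fork() exists in the child. */
        long before = atomic_load(&ticks);
        poll(NULL, 0, 50);
        _exit(atomic_load(&ticks) == before ? 0 : 1);
    }

    if (pid != -1 && waitpid(pid, &status, 0) != -1 && WIFEXITED(status))
        result = WEXITSTATUS(status) == 0 ? 1 : 0;

    /* Shut the ticker down in the parent, where it is still running. */
    atomic_store(&stop, true);
    pthread_join(thread, NULL);
    return result;
}
```

The child restricts itself to simple calls (poll and _exit) because, after forking a multithreaded process, library state such as locks may have been copied in the middle of an operation by a thread that no longer exists.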
If its parent finishes first, the new process will be attached to init process.
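If the parent finishes first, a SIGHUP signal may be sent to the child. Whether this happens depends on the session and terminal setup, for example when the exiting parent is a session leader with a controlling terminal. If the child does not exit as a result of the SIGHUP, or never receives one, it will get init as its new parent. See also the man pages for nohup and signal(7) for a bit more information on SIGHUP.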
And its parent is main thread, not the thread that created it.
The parent of a process is a process, not a specific thread, so it is not meaningful to say that the main or child thread is the parent. The entire process is the parent.
One final note: mixing threads and fork must be done with care, as the combination has a number of pitfalls.