Performance - Multithreaded or Multiprocess Applications

performance - multithreaded or multiprocess applications

Processes and threads on linux are very similar to each other - the main difference is that the whole virtual memory is shared as well as certain things like signal handling differ.

This makes for cheaper context switches between threads (no need for costly MMU reloads etc.) but doesn't necessarily cause much difference in speed (especially outside of thread creation).

For designing a highly network intensive application, basically the only solution is to use an evented architecture (otherwise you'll bog down the system with huge amount of processes/threads and spend more time on their management than actually running work code), where you react to I/O on sockets and based on which sockets exhibit activity do apropriate operations.

A famous writeup about the problems faced in such situations is "The C10k problem", available from http://www.kegel.com/c10k.html - it describes different I/O approaches, so despite being a bit dated, it's a very good introduction.

Be careful before jumping deeply into reactor-like designs, though - they can get unwieldy and complex, so see if you can't use library/language that provides a nicer abstraction over it (Erlang is my personal favourite in this, languages with coroutines like Go can be useful too).

Performance difference for multi-thread and multi-process

It depends on how much the various threads or processes (I'll be using the collective term "tasks" for both of them) need to communicate, especially by sharing memory: that's easy, cheap and fast for threads, but not at all for processes, so, if a lot of it is going on, I bet processes' performance is not going to beat threads'.

Also, processes (esp. on Windows) are "heavier" to get started, so if a lot of "task starts" occur, again threads can easily beat processes in terms of performance.

Next, you can have CPUs with "hyperthreading", which can run (at least) two threads on a core very rapidly -- but, not processes (since the "hyperthreaded" threads cannot be using distinct address spaces) -- yet another case in which threads can win performance-wise.

If none of these considerations apply, then the race should be no better than a tie, anyway.

Multiprocessing vs Threading Python

The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.

Spawning processes is a bit slower than spawning threads.

Performance - Multithreaded or Multiprocess Applications