Concurrency of POSIX Threads in a Multiprocessor Machine


Since you marked your question with the "Linux" tag, I'm going to answer it according to the standard pthreads implementation under Linux. If you are talking about "green" threads, which are scheduled at the VM/language level instead of by the OS, then your answers are mostly correct. But my comments below are about Linux pthreads.

1) Posix threads are user level threads and kernel is not aware of it.

No, this is certainly not correct. The Linux kernel and the pthreads libraries work together to administer the threads. The kernel does the context switching, scheduling, memory management, cache management, etc. There is other administration done at the user level, of course, but without the kernel, much of the power of pthreads would be lost.

2) Kernel scheduler will treat Process( with all its threads) as one entity for scheduling. It is the thread library that in turn chooses which thread to run. It can slice the cpu time given by the kernel among the run-able threads.

No, the kernel treats each process thread as a separate entity. It has its own rules about time slicing that take processes (and process priorities) into consideration, but each sub-process thread is a schedulable entity in its own right.

3) User threads can run on different cpu cores. ie Let threads T1 & T2 be created by a Process(T), then T1 can run in Cpu1 and T2 can run in Cpu2 BUT they cant run concurrently.

No. Concurrent execution is expected for multi-threaded programs. That's why synchronization and mutexes are so important, and why programmers put up with the complexity of multi-threaded programming.
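As a quick demonstration, here is a minimal sketch (the spin loop and iteration count are purely illustrative) of two CPU-bound pthreads. Run it on a multi-core Linux box and time or top will show it using well over 100% CPU, because the threads really do execute concurrently:

#include <pthread.h>
#include <stdio.h>

/* CPU-bound worker: spins so the scheduler has something to run in parallel */
static void *spin(void *arg)
{
    volatile unsigned long counter = 0;
    for (unsigned long i = 0; i < 1000000000UL; i++)
        counter++;
    printf("thread %ld done\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    /* Both threads are runnable at once; on a multi-core machine the
       kernel is free to schedule them on different CPUs simultaneously. */
    pthread_create(&t1, NULL, spin, (void *)1L);
    pthread_create(&t2, NULL, spin, (void *)2L);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Compile with gcc -pthread; while it runs, ps -eLf or top -H will show both threads burning CPU at the same time.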


One way to prove this to you is to look at the output of ps with the -L option, which shows the associated threads. ps usually collapses a multi-threaded process into one line, but with -L you can see that the kernel has a separate virtual process-id for each thread:

ps -ef | grep 20587
foo 20587 1 1 Apr09 ? 00:16:39 java -server -Xmx1536m ...

versus

ps -eLf | grep 20587
foo 20587 1 20587 0 641 Apr09 ? 00:00:00 java -server -Xmx1536m ...
foo 20587 1 20588 0 641 Apr09 ? 00:00:30 java -server -Xmx1536m ...
foo 20587 1 20589 0 641 Apr09 ? 00:00:03 java -server -Xmx1536m ...
...

I'm not sure if Linux threads still do this, but historically pthreads used the clone(2) system call to create another thread copy of itself:

Unlike fork(2), these calls allow the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers.

This is different from fork(2), which is used when another full process is created.
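For illustration only, here is a stripped-down sketch of that style of thread creation using clone(2) directly. The flag set and fixed stack size are simplifications; a real pthreads implementation such as NPTL passes a larger set of flags (including CLONE_THREAD) and manages stacks far more carefully:

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int child_fn(void *arg)
{
    /* Runs in the same address space as the parent because of CLONE_VM */
    *(int *)arg = 42;
    return 0;
}

int main(void)
{
    int shared = 0;
    char *stack = malloc(64 * 1024);
    if (stack == NULL)
        return 1;

    /* The stack grows down on most architectures, so pass the top of the
       block. CLONE_VM shares memory, CLONE_FILES shares the fd table. */
    pid_t pid = clone(child_fn, stack + 64 * 1024,
                      CLONE_VM | CLONE_FILES | SIGCHLD, &shared);
    if (pid == -1)
        return 1;

    waitpid(pid, NULL, 0);
    printf("child wrote %d into parent's memory\n", shared);
    free(stack);
    return 0;
}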

Selecting number of threads in a multiprocess multiprocessor environment

'1 to 1.5 times the number of cores' - this appears to be OS/language dependent. On Windows/C++, for example, with large numbers of CPU-intensive tasks, the optimum seems to be much more than twice the number of cores, with a very small performance spread. In such environments, it seems you may as well just allocate 64 threads to a pool and not bother with the number of cores.

'query/query-ack and response/response-ack model, time must not be wasted in I/O waiting states' - this is unavoidable with such protocols given the high latency of most networks. The delay is enforced by the 'ping-pong' protocol, so there will inevitably be an I/O wait. Async I/O just moves this wait into the kernel - it's still there!

'large requirement for dynamic memory, it's better to go with a greater number of processes than threads' - not really. A 'large requirement for dynamic memory' usually means that large data buffers are going to be moved about. Large buffers can only be moved around efficiently by reference. This is very easy and quick between threads because of the shared memory space. With processes, you are stuck with awkward and slow inter-process communication.

'Determining number of threads to have in our application' - well, this is difficult on several fronts. Given an unknown architecture, design, language, and OS, the only advice I have is to make everything as flexible and configurable as you reasonably can. If you have a thread pool, make its size a run-time parameter you can tweak. If you have an object pool, try to design it so that you can change its depth. Have some default values that work on your test boxes, and then, at installation or while running, you can make any specific changes and tweaks for a particular system.

The other thing with flexible/configurable designs is that you can, at test time, tweak away and fix many of the incorrect decisions, assumptions, and guesstimates made by architects, designers, developers and, most of all, customers.
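As a minimal sketch of that kind of configurability (the APP_POOL_SIZE variable name and the default of 8 are just assumptions for the example), here is a pthread pool whose size is decided at run time rather than compile time:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_POOL_SIZE 8  /* default that works on the test boxes */

static void *worker(void *arg)
{
    printf("worker %ld started\n", (long)arg);
    /* ... pull tasks from a work queue here ... */
    return NULL;
}

int main(void)
{
    /* Hypothetical knob: let installers override the pool size
       without recompiling, e.g. APP_POOL_SIZE=32 ./a.out */
    const char *env = getenv("APP_POOL_SIZE");
    int pool_size = env ? atoi(env) : DEFAULT_POOL_SIZE;
    if (pool_size < 1)
        pool_size = DEFAULT_POOL_SIZE;

    pthread_t *pool = malloc(sizeof(pthread_t) * pool_size);
    if (pool == NULL)
        return 1;

    for (long i = 0; i < pool_size; i++)
        pthread_create(&pool[i], NULL, worker, (void *)i);
    for (int i = 0; i < pool_size; i++)
        pthread_join(pool[i], NULL);

    free(pool);
    return 0;
}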

Need clarification in Parallel Processing

First, the JVM has a number of background threads that will use multiple CPUs and cores even if the user code never forks another thread. The garbage collector, for example, will run concurrently on another CPU if possible, regardless of the user code.

If your user code never forks another thread, the JVM will never run your code concurrently on multiple CPUs. If you do write your program with multiple threads, there is no guarantee that it will run on multiple CPUs, but it is certainly more likely. It depends a lot on what else is running on the OS and how often your threads block. If your threads are consuming a lot of CPU cycles and run for any length of time on a modern OS, then yes, your program will use both CPUs.

You can verify this on a Linux OS (and other Unixen) by watching to see if your process consumes more than 100% of CPU at any one time. You can also use ps options to show the underlying threads and their CPU usage. See my answer here: Concurrency of posix threads in multiprocessor machine

Running two threads at the same time

Multi-threading and parallel processing are two completely different topics, each worthy of its own conversation, but for the sake of introduction...

Threading:

When you launch an executable, it runs in a thread within a process. When you launch another thread, call it thread 2, you now have two separately running execution chains (threads) within the same process. On a single-core microprocessor (uP), it is possible to run multiple threads, but not in parallel. Although conceptually the threads are often said to run at the same time, they are actually running consecutively in time slices allocated and controlled by the operating system, interleaved with each other. So, the execution steps of thread 1 do not actually happen at the same time as the execution steps of thread 2. This behavior extends to as many threads as you create: packets of execution chains all working within the same process and sharing time slices doled out by the operating system.

So, in your system call example, it really depends on what the system call is as to whether or not it will finish before the execution steps of the other thread proceed. Several factors play into what will happen: Is it a blocking call? Does one thread have higher priority than the other? What is the duration of the time slices?
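As a small illustration (the sleep durations and messages are made up), a blocking system call in one thread does not stop the other thread from being scheduled:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *blocker(void *arg)
{
    (void)arg;
    printf("thread 1: entering a blocking system call (sleep)\n");
    sleep(2);                       /* blocks only this thread */
    printf("thread 1: woke up\n");
    return NULL;
}

static void *runner(void *arg)
{
    (void)arg;
    for (int i = 0; i < 5; i++) {
        printf("thread 2: still running (%d)\n", i);
        usleep(200000);             /* 200 ms, so the output interleaves */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, blocker, NULL);
    pthread_create(&t2, NULL, runner, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Thread 2 keeps printing while thread 1 sits in its blocking call; only thread 1's execution chain is held up.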

Links relevant to threading in C:

SO Example

POSIX

ANSI C

Parallel Processing:

When multi-threaded program execution occurs on a multi-core system (multiple uPs, or one or more multi-core uPs), threads can run concurrently, or in parallel, as different threads may be split off to separate cores to share the workload. This is one example of parallel processing.

Again, conceptually, parallel processing and threading are thought to be similar in that they allow things to be done simultaneously. But that is concept only; they are really very different, in both target application and technique. Where threading is useful as a way to identify and split out an entire task within a process (e.g., a TCP/IP server may launch a worker thread when a new connection is requested, then maintain that connection as long as it remains open), parallel processing is typically used to send smaller components of the same task (e.g., a complex set of computations that can be performed independently in separate locations) off to separate resources (cores, or uPs) to be completed simultaneously. This is where multiple-core processors really make a difference. But parallel processing also takes advantage of multiple systems, popular in areas such as genetics and MMORPG gaming.
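To make the distinction concrete, here is a minimal OpenMP sketch (the array size and contents are arbitrary) that splits a single computation across all available cores:

#include <omp.h>
#include <stdio.h>

#define N 10000000

int main(void)
{
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = i * 0.5;

    /* The loop iterations are divided among the cores; the reduction
       clause safely combines each thread's partial sum at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f (computed by up to %d threads)\n",
           sum, omp_get_max_threads());
    return 0;
}

Compile with gcc -fopenmp; the loop simply runs serially on a compiler that ignores the pragma.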

Links relevant to parallel processing in C:

OpenMP

More OpenMP (examples)

Gribble Labs - Introduction to OpenMP

CUDA Toolkit from NVIDIA

Additional reading on the general topic of threading and architecture:

This summary of threading and architecture barely scratches the surface. There are many parts to the topic; books addressing them would fill a small library, and there are thousands of links. Not surprisingly, within the broader topic some results seem counterintuitive. For example, it is not a given that simply having more cores will result in faster multi-threaded programs.
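Amdahl's law gives one classic reason for this: if a fraction p of a program can be parallelized, the best possible speedup on n cores is

speedup = 1 / ((1 - p) + p / n)

So if half the program is inherently serial (p = 0.5), even an unlimited number of cores can never deliver more than a 2x speedup, because the serial half still runs at single-core speed.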


