How Is Pthread Implemented in Linux Kernel 3.2

How is pthread implemented in Linux kernel 3.2?

On Linux, pthread_create uses the clone syscall with a special flag, CLONE_THREAD.

See the documentation of the clone syscall:

CLONE_THREAD (since Linux 2.4.0-test8)
If CLONE_THREAD is set, the child is placed in the same thread group as the calling process. To make the remainder of the discussion of CLONE_THREAD more readable, the term "thread" is used to refer to the processes within a thread group.
Thread groups were a feature added in Linux 2.4 to support the POSIX threads notion of a set of threads that share a single PID. Internally, this shared PID is the so-called thread group identifier (TGID) for the thread group. Since Linux 2.4, calls to getpid(2) return the TGID of the caller.

And in fact, Linux did change its thread implementation, because POSIX.1 requires that all threads in a process share the same process ID.

   In the obsolete LinuxThreads implementation, each of the threads in a process
   has a different process ID.  This is in violation of the POSIX threads
   specification, and is the source of many other nonconformances to the
   standard; see pthreads(7).

Is Pthread library actually a user thread solution?

Q: I understand Pthread is a thread library meeting POSIX standard

A: Yes. Actually, "Pthreads" stands for "Posix threads":
http://en.wikipedia.org/wiki/Pthreads

Q: It is available in Unix-like OS.

A: Actually, it's available for many different OSs ... including Windows, MacOS ... and, of course, Linux, BSD and Solaris.

Q: About thread, I read that there are three different models

A: Now you're getting fuzzy. "Threads" is a very generic term. There are many, many different models. And many, many different ways you can characterize and/or implement "threads". Including stuff like the Java threading model, or the Ada threading model.

Q: When I call pthread_create() to create a thread, did I create a
user level thread?

A: Yes: Just about everything you do in user space is "protected" in your own, private "user space".

Q: User level thread: the kernel does not know it.

A: No. The kernel knows everything :)

Q: Kernel level thread: kernel directly supports multiple threads of
control in a process.

A: Yes, there is such a thing as "kernel threads".

And, as it happens, Linux makes EXTENSIVE use of kernel threads. For example, every single process in a Linux system is a "kernel thread" in the sense that the kernel schedules it as a task. And every user-created pthread is ALSO implemented as a new kernel-scheduled task. As are "worker threads" (which run entirely inside the kernel and do not belong to any user-level process).

But this is an advanced topic you do NOT need to understand in order to effectively use pthreads. Here's a great book that discusses this - and many other topics - in detail:

[Linux Kernel Development, Robert Love][1]

Remember: "Pthreads" is an interface. How it's implemented depends on the platform. Linux uses kernel threads; Windows uses Win32 threads, etc.

===========================================================================
ADDENDUM:

Since people still seem to be hitting this old thread, I thought it would be useful to reference this post:

https://stackoverflow.com/a/11255174/421195
Linux typically uses two implementations of pthreads: LinuxThreads and Native POSIX Thread Library (NPTL), although the former is largely obsolete. Kernel from 2.6 provides NPTL, which provides much closer conformance to SUSv3, and perform better especially when there are many threads.
You can query the specific implementation of pthreads under shell using command:
getconf GNU_LIBPTHREAD_VERSION
You can also get a more detailed implementation difference in The Linux Programming Interface.

"Pthreads" is a library, based on the Posix standard. How a pthreads library is implemented will differ from platform to platform and library to library.
[1]: http://www.amazon.com/Linux-Kernel-Development-Robert-Love/dp/0672329468

How are pthreads on Linux seen by the scheduler

For modern Linux (the NPTL pthread implementation), the scheduler schedules threads; each thread is considered a "light-weight process". pthread_create is implemented in terms of the clone system call.

pthread vs. kthread in Linux kernel v2.6+

In contrast, a kthread does not have its own address space. Is that correct?

Yes

A thread created by pthread_create() shares the address space of the process that created it.

kernel: how to find all threads from a process's task_struct

pthreads: pthread_create() is used in user space, where multiple threads within your application share the same process address space. To use this functionality you need to link your program with the pthread library. pthreads provides multi-threading at the application level, in user space. Internally this translates into a clone() syscall, which maps a new struct task_struct to every application thread.

kthreads: Some examples of kernel threads are those that flush disk caches, service softirqs, flush dirty buffers, etc. These threads run only in kernel space and have no access to user-space virtual memory; they use only kernel addresses above PAGE_OFFSET, so the current->mm field in the task descriptor is always NULL. Internally, the kernel_thread() API translates into do_fork() within the kernel. Kernel threads are created asynchronously, either when the init process comes up or when a kernel module is loaded (e.g. a file system).

Do Linux kernel processes multithread?

There is no concept of a process in the kernel, so your question doesn't really make sense. The Linux kernel can and does create threads that run completely in kernel context, but all of these threads run in the same address space. There's no grouping of similar threads by PID, although related threads usually have related names.

If multiple kernel threads are working on the same task or otherwise sharing data, then they need to coordinate access to that data via locking or other concurrent algorithms. Of course the pthreads API isn't available in the kernel, but one can use kernel mutexes, wait queues, etc to get the same capabilities as pthread mutexes, condition variables, etc.

Calling these contexts of execution "kernel threads" is a reasonably good name, since they are closely analogous to multiple threads in a userspace process. They all share the (kernel's) address space, but have their own execution context (stack, program counter, etc) and are each scheduled independently and run in parallel. On the other hand, the kernel is what actually implements all the nice POSIX API abstractions (with help from the C library in userspace), so internal to that implementation we don't have the full abstraction.

Kernel_thread() and thread_create(), which function actually creates a new thread?

kernel_thread creates kernel threads. Internally, kernel_thread invokes do_fork with clone-style flags rather than going through the clone system call.

In Linux, threads are created with clone and processes are created with fork.

The fork, clone and vfork calls in turn invoke do_fork with different values for the clone_flags argument.

Understanding Pthreads

I get the same results as the book on a Linux system whose libc is libuClibc-0.9.30.1.so (1).

root@OpenWrt:~# ./test
main thread pid is 1151
child thread pid is 1153

And I tried running this program on a Linux system that uses the libc6 from Ubuntu (2):

$ ./test
main thread pid is 2609
child thread pid is 2609

libc (1) uses the LinuxThreads implementation of pthreads.

And libc (2) uses the NPTL ("Native POSIX Thread Library") implementation of pthreads.

According to the LinuxThreads FAQ (in the answer to J.3):

each thread is really a distinct process with a distinct PID, and
signals sent to the PID of a thread can only be handled by that thread

So in the old libc, which uses the LinuxThreads implementation, each thread has its own distinct PID.

In the new libc version, which uses the NPTL implementation, all threads have the same PID as the main process.

NPTL was developed by a Red Hat team. According to the Red Hat NPTL document, one of the problems solved by the NPTL implementation is:

(Chapter: Problems with the Existing Implementation, page5)

Each thread having a different process ID causes compatibility
problems with other POSIX thread implementations. This is in part a
moot point since signals can't be used very well but is still
noticeable

And that explains your issue.

You are using the new libc version, which contains the NPTL ("Native POSIX Thread Library") implementation of pthreads.

And the book uses an old version of libc that contains the LinuxThreads implementation of pthreads.

Kernel Level and User Level Threads

Here is a description of the NPTL library most commonly used today:

NPTL is a so-called 1×1 threads library, in that threads created by
the user (via the pthread_create() library function) are in 1-1
correspondence with schedulable entities in the kernel (tasks, in the
Linux case). This is the simplest possible threading implementation.

If they are schedulable entities by the kernel, then they can be scheduled individually on any processor, and your statement is not true.

Difference between pthread and fork on gnu/Linux

In C there are some differences however:

fork()

Purpose is to create a new process, which becomes the child process of the caller
Both processes will execute the next instruction following the fork() system call
Two identical copies of the process's address space, code, and stack are created, one for the parent and one for the child.

Think of fork as if your program were a person: forking produces a clone of your program (process), which runs the code it copied.

pthread_create()

Purpose is to create a new thread in the program, which runs within the same process as the caller
Threads within the same process can communicate using shared memory. (Be careful!)
The second thread will share data, open files, signal handlers and signal dispositions, current working directory, and user and group IDs. The new thread will get its own stack, thread ID, and registers, though.

Continuing the analogy; your program (process) grows a second arm when it creates a new thread, connected to the same brain.
