Does Linux Schedule a Process or a Thread

How Linux handles threads and process scheduling

The Linux kernel scheduler is actually scheduling tasks, and these are either threads or (single-threaded) processes.

^{So a task (a task_struct inside the kernel), in the context of the scheduler, is the thing being scheduled, and can be some kernel thread like kworker or kswapd, some user thread of a multi-threaded process (like firefox), or the single-thread of a single-threaded process (like bash), identified with that single-threaded process.}

A process is a non-empty finite set (sometimes a singleton) of threads sharing the same virtual address space (and other things like file descriptors, working directory, etc etc...). See also credentials(7), capabilities(7) etc....

Threads on Linux are kernel threads (in the sense of being managed by the kernel, which also creates its own threads), created by the Linux specific clone syscall (which can also be used to create processes on Linux). The pthread_create function is probably built (on Linux) above clone inside NPTL and Gnu Libc (which integrated NPTL on Linux) and musl-libc.

Does linux schedule a process or a thread?

The Linux scheduler (on recent Linux kernels, e.g. 3.0 at least) is scheduling schedulable tasks or simply tasks.

A task may be :

a single-threaded process (e.g. created by fork without any thread library)
any thread inside a multi-threaded process (including its main thread), in particular Posix threads (pthreads)
kernel tasks, which are started internally in the kernel and stay in kernel land (e.g. kworker, nfsiod, kjournald , kauditd, kswapd etc etc...)

In other words, threads inside multi-threaded processes are scheduled like non-threaded -i.e. single threaded- processes.

The low-level clone(2) syscall creates user-land schedulable tasks (and can be used both for creating fork-ed process or for implementation of thread libraries, like pthread). Unless you are a low-level thread library implementor, you don't want to use clone directly.

AFAIK, for multi-threaded processes, the kernel is (almost) not scheduling the process, but each individual thread inside (including the main thread).

^{Actually, there is some notion of thread groups and affinity in the scheduling, but I don't know them well}

These days, processors have generally more than one core, and each core is running a task (at some given instant) so you do have several tasks running in parallel.

CPU quantum times are given to tasks, not to processes

What is the difference in scheduling threads?

Right, there's no difference. The kernel just schedules tasks; each user task refers to a page table (whether that's shared with any other task or not).

Each logical CPU core has its own page-table pointer (e.g. x86 CR3).

And yes, cache coherency is maintained by hardware. The Linux kernel's hand-rolled atomics (using volatile, and inline asm for RMWs and barriers) depend on that.

Linux - threads and process scheduling priorities

Linux no longer schedules processes at all.

Within the kernel, threads are scheduled. The concept of a process is now an artificial construct seen mostly by things outside the kernel. Obviously, the kernel has to know how threads are tied together, but not for scheduling purposes.

Basically, the kernel maintains a whole lot of threads and each thread has a thread group leader, which is what's seen on the outside as the process. A thread has a thread ID and a thread group ID - it's a lot like the relationship between a PID and a PPID (process ID and parent process ID).

When you create a regular thread, the kernel gives it a brand new thread ID but its thread group ID is set identical to the group ID of the thread that created it. That way, it looks like a thread within a process to the outside world.

When you fork, the kernel gives it a brand new thread ID and sets its thread group ID to the same value as its thread ID. That way, it looks like a process to the outside world.

Most non-kernel utilities that report on processes are really just reporting on threads where the thread ID is the same as the thread group ID.

There are subtleties with other methods which are probably too complicated to go into here. What I've written above is (hopefully) a good medium level treatise.

Now, for your specific question, it would be neither case since P1 only has one thread (there is no P1T2).

Withing the kernel, the threads are P1T1, P2T1 and P2T2 and, assuming they have the same scheduling properties and behave the same ^(a), that's how they'll be scheduled.

Linux process and threads scheduling

You can actually test this with a simple program, but from various man pages:

sched_setaffinity:

A child created via fork(2) inherits its parent's CPU affinity mask.
The affinity mask is preserved across an execve(2).

pthread_create:

The new thread inherits copies of the calling thread's capability sets
(see
capabilities(7)) and CPU affinity mask (see sched_setaffinity(2)).

sched_setscheduler:

Child processes inherit the scheduling policy and parameters across a
fork(2).
The scheduling policy and parameters are preserved across execve(2).

Whether the cpu scheduling is based on processes or threads in linux?

Partially based on quantum which is an amount of basic unit of time the thread will execute for. Also I believe there is a priority level so multiple threads are competing for time on the cpu. They wait in line with other threads of same priority level and then run till they are out of quantum. Then they are sent to the back. It's not exact answer but a high level summary.

Also I'm more familar with windows but I think it is same in principles. The process is not an executable code but a unit of storage. So it would be by thread. Linux I read has a more complicated scheduling algorithm than windows (more overhead possibly as a trade off), but it is completely possible I speculate that threads of same process compete for cpu time. The difference is there is no necessary context switch cause the thread sharing process use same address space.

This would explain the diminished returns when using more threads than the physical number of cores(threads on intel). The threads of a process have a small chance of ever running at the same time. Instead they compete. So if you have 4000 threads it means the time any single one of them is running is reduced by 1/4000. However if you were to use the 4000 threads to operate on a single synchronous problem, using a shared storage to load current state you could then get performance gain by having a larger percentage of cpu time as the probability of any of the 4000 threads running is higher.

How does linux process scheduling policy relate to thread scheduling policy?

Linux does not support process scheduling at all. Scheduling is entirely on a thread basis. The sched_* functions incorrectly modify the thread scheduling parameters of the target thread id instead of the scheduling parameters of a process. See:

http://sourceware.org/bugzilla/show_bug.cgi?id=14829 and http://sourceware.org/bugzilla/show_bug.cgi?id=15088

Can kernel schedule user level threads of same process on different cores?

The linux kernel scheduler uses the "task" as its primary schedulable entity. This corresponds to a user-space thread. For a traditional simple Unix-style program, there is only a single thread in the process and so the distinction can be ignored. Other programs of course may have multiple threads. But in all cases, the kernel only schedules tasks (i.e. threads).

Your terminology above therefore doesn't really match the situation. The kernel doesn't really care whether the different threads it schedules are part of the same process or different processes: each thread can be scheduled independently. You can have multiple threads from the same process running on different processors/cores at the same time.

Yes, there are separate run queues for each core.

The paper you reference is, I think, slightly misleading in its phrasing. In particular, saying that the "set of runnable threads is partitioned into..." doesn't give quite the right meaning; that makes it sound like the threads are divided into multiple groups that are then assigned to different cores and can only be executed there. It would be more accurate to say that there is a separate run queue for each core containing a set of threads waiting to execute, and in common use, the scheduler doesn't need to reference the queues for other cores.

But in fact, threads can migrate from one core to another. For example, if there is a thread waiting to run on core A (hence in core A's run queue), but core A is already busy running some other thread, and there is another core that is not busy, the waiting thread may be migrated to that other core and executed there. (This is an oversimplification of course as there are other factors that go into deciding whether/when to migrate a thread.)