What Does It Mean to Say "Linux Kernel Is Preemptive"

What does it mean to say the Linux kernel is preemptive?

Imagine the simple view of preemptive multi-tasking. We have two user tasks, both of which are running all the time without using any I/O or performing kernel calls. Those two tasks don't have to do anything special to be able to run on a multi-tasking operating system. The kernel, typically based on a timer interrupt, simply decides that it's time for one task to pause to let another one run. The task in question is completely unaware that anything happened.

However, most tasks make occasional requests of the kernel via syscalls. When this happens, the same user context exists, but the CPU is running kernel code on behalf of that task.

Older Linux kernels would never allow preemption of a task while it was busy running kernel code. (Note that I/O operations always voluntarily re-schedule. I'm talking about a case where the kernel code has some CPU-intensive operation like sorting a list.)

If the system allows a task to be preempted while it is running kernel code, then we have what is called a "preemptive kernel." Such a system is far less susceptible to the unpredictable delays that can be encountered during syscalls, so it may be better suited for embedded or real-time workloads.

For example, if on a particular CPU there are two runnable tasks, one of which makes a syscall that takes 5 ms to complete while the other is an MP3 player application that needs to feed the audio pipe every 2 ms, you might hear stuttering audio on a non-preemptive kernel.

The argument against preemption is that all kernel code that might run in task context must be able to survive preemption: there is a lot of poor device driver code, for example, that might be better off if it is always able to complete an operation before allowing another task to run on that processor. (With multi-processor systems the rule rather than the exception these days, all kernel code must be re-entrant anyway, so that argument isn't as relevant today.) Additionally, if the same goal could be met by fixing the syscalls with bad latency, perhaps preemption is unnecessary.

A compromise is CONFIG_PREEMPT_VOLUNTARY, which allows a task-switch at certain points inside the kernel, but not everywhere. If there are only a small number of places where kernel code might get bogged down, this is a cheap way of reducing latency while keeping the complexity manageable.

What is preemption / What is a preemptible kernel? What is it good for?

Preemptive multitasking - Running several processes/threads on a single processor, creating the illusion that they run concurrently when actually each is allocated small multiplexed time slices to run in. A process is "preempted" when it is scheduled out of execution and waits for the next time slice to run in.

A preemptive kernel is one that can be interrupted in the middle of executing code - for instance while servicing a system call - to do other things and run other threads, possibly threads that are not in the kernel.

The main advantage of a preemptive kernel is that syscalls do not block the entire system. If a syscall takes a long time to finish, the kernel can still do other things in the meantime.
The main disadvantage is that this introduces more complexity into the kernel code: it has to handle more edge cases and perform finer-grained locking, or use lock-free structures and algorithms.

Is the Linux kernel preemptive or not?

Yes, the kernel is preemptive.

It has been preemptive by default since the 2.6 branch. Of course, its preemption has not always been perfect, as the techniques for balancing preemption with process responsiveness depend heavily on the kernel load profile (which isn't the same for everyone).

What is the difference between nonpreemptive and preemptive kernels when switching to user mode?

In a non-preemptive kernel, schedule() is called when returning to userspace, wherever a system call blocks, and in the idle task.

In a preemptive kernel, schedule() is also called when returning from any interrupt, and in a few other places as well, e.g. on the mutex_unlock() slow path, under certain conditions while receiving network packets, and so on.

As an example, imagine a process A that issues a syscall, which is interrupted by a device-generated interrupt, which is in turn interrupted by a timer interrupt:

 process A userspace → (syscall) → process A kernelspace → (device IRQ) → device ISR → (timer IRQ) → timer ISR

When the timer ISR ends, it returns to another ISR, that then returns to kernelspace, which then returns to userspace. A preemptive kernel checks if it needs to reschedule processes at every return. A non-preemptive kernel only does that check when returning to userspace.

Preemptive & Nonpreemptive Kernel VS Preemptive & Nonpreemptive Scheduling

The problem you face is that these terms have no standard meaning. I suspect that your book is using them from the point of view of some specific operating system (which one? I do not know). If you have searched the internet, you have certainly found conflicting explanations.

For example, Preemptive scheduling can mean:

  1. Scheduling that will interrupt a running process that does not yield the CPU.
  2. Scheduling that will interrupt a running process before its quantum has expired.
  3. Your book apparently has yet another definition. I cannot tell the meaning from the excerpt. It is entirely possible that the book is just confusing on this point (as it apparently is on so many others). One problem is that process states are system-dependent, so defining the term using process states is quite confusing.

This part of its definition makes sense:

Under nonpreemptive scheduling, once the CPU has been allocated to a process, the process keeps the CPU until it releases the CPU either by terminating or by switching to the waiting state.

The preemptive part of the definition makes no sense.

In the case of the term preemptive kernel, that usage is fairly standard, and the description of it you give is the normal one. That said, the book's statement should be a bit more refined, because every process switch actually takes place in kernel mode. Normally, one would say something along the lines of "In a non-preemptive kernel, a process cannot be switched out while it is executing kernel code entered through an exception."

A preemptive kernel is essential for real-time processing.

So you ask:

  • This to me seems to be the exact same description of the nonpreemptive kernel.

You have four theoretical combinations:

  1. Preemptive Scheduling, Preemptive Kernel

The operating system can forcibly switch processes at nearly any time.


  2. Non-Preemptive Scheduling, Preemptive Kernel

This combination does not exist.


  3. Non-Preemptive Scheduling, Non-Preemptive Kernel

The process has to explicitly yield to allow the operating system to switch to another process.


  4. Preemptive Scheduling, Non-Preemptive Kernel

The operating system can forcibly switch processes except when the process is executing in kernel mode to process an exception (there may be circumstances where the process cannot be switched while handling an interrupt as well).

Linux kernel: why preemption is disabled when use per-CPU variable?

Why do we disable preemption?

To avoid having the thread preempted and rescheduled on a different processor core in the middle of accessing the variable.

Isn't preemption something that can't happen when you are in the kernel?

This was true when there was still a big kernel lock. Having one global lock means that if you block in-kernel, no other thread may enter the kernel. Now, with the more fine-grained locking, sleeping in the kernel is possible. Linux can be configured at build-time for other preemption models, e.g. CONFIG_PREEMPT.

While your normal desktop kernel is probably configured with CONFIG_PREEMPT_VOLUNTARY, some distributions also ship CONFIG_PREEMPT as a separate low-latency kernel package, e.g. for audio use. For real-time use cases, the PREEMPT_RT patchset even makes most spinlocks preemptible (hence the name).

Linux Preemptive kernel implications?

There are a lot more opportunities for race conditions, as you had mentioned, so yes, you have to be very diligent with locks. You also have to be careful about timing, such as when you enable/disable interrupts or other hardware resources, etc. You don't always have to use locks in these situations, but you may have to reorder your code. Finally, it also affects scheduling, allowing high-priority tasks to be much more responsive, which in turn may have a negative effect on lower-priority tasks.

Why is kernel preemption safe only when preempt_count == 0?

While this is an old question, the accepted answer isn't correct.

First of all, the original question's title asked:

Why is kernel preemption safe only when preempt_count > 0?

That is backwards: kernel preemption is disabled when preempt_count > 0, and enabled when preempt_count == 0.

Furthermore, the claim:

If another task is scheduled and tries to grab the lock, it will block (or spin until its time slice ends),

is not always true.

Say you acquire a spin lock while preemption is still enabled. A process switch happens, and in the context of the new process some softirq runs. Preemption is disabled while running softirqs. If one of those softirqs attempts to acquire your lock, it will never stop spinning, because preemption is disabled. Thus you have a deadlock.

You have no control over whether the process that preempts yours will run softirqs or not. The preempt_count field through which you disable softirqs is per-process. Softirqs have to run with preemption disabled to preserve the per-CPU serialization of softirqs.


