What Is Preemption/What Is a Preemtible Kernel? What Is It Good For

What is preemption / What is a preemtible kernel? What is it good for?

Preemptive multitasking - Running several processes/threads on a single processor, creating the illusion that they run concurrently when actually each is allocated small multiplexed time slices to run in. A process is "preempted" when it is scheduled out of execution and waits for the next time slice to run in.

A preemptive kernel is one that can be interrupted in the middle of executing code - for instance in response for a system call - to do other things and run other threads, possibly those that are not in the kernel.

The main advantage of a preemptive kernel is that sys-calls do not block the entire system. If a sys-call takes a long time to finish then it doesn't mean the kernel can't do anything else in this time.
The main disadvantage is that this introduces more complexity to the kernel code, having to handle more end-cases, perform more fine grained locking or use lock-less structures and algorithms.

What does it mean to say linux kernel is preemptive ?

Imagine the simple view of preemptive multi-tasking. We have two user tasks, both of which are running all the time without using any I/O or performing kernel calls. Those two tasks don't have to do anything special to be able to run on a multi-tasking operating system. The kernel, typically based on a timer interrupt, simply decides that it's time for one task to pause to let another one run. The task in question is completely unaware that anything happened.

However, most tasks make occasional requests of the kernel via syscalls. When this happens, the same user context exists, but the CPU is running kernel code on behalf of that task.

Older Linux kernels would never allow preemption of a task while it was busy running kernel code. (Note that I/O operations always voluntarily re-schedule. I'm talking about a case where the kernel code has some CPU-intensive operation like sorting a list.)

If the system allows that task to be preempted while it is running kernel code, then we have what is called a "preemptive kernel." Such a system is immune to unpredictable delays that can be encountered during syscalls, so it might be better suited for embedded or real-time tasks.

For example, if on a particular CPU there are two tasks available, and one takes a syscall that takes 5ms to complete, and the other is an MP3 player application that needs to feed the audio pipe every 2ms, you might hear stuttering audio.

The argument against preemption is that all kernel code that might be called in task context must be able to survive preemption-- there's a lot of poor device driver code, for example, that might be better off if it's always able to complete an operation before allowing some other task to run on that processor. (With multi-processor systems the rule rather than the exception these days, all kernel code must be re-entrant, so that argument isn't as relevant today.) Additionally, if the same goal could be met by improving the syscalls with bad latency, perhaps preemption is unnecessary.

A compromise is CONFIG_PREEMPT_VOLUNTARY, which allows a task-switch at certain points inside the kernel, but not everywhere. If there are only a small number of places where kernel code might get bogged down, this is a cheap way of reducing latency while keeping the complexity manageable.

Linux kernel: why preemption is disabled when use per-CPU variable?

Why do we disable preemption?

To avoid having the thread preempted and rescheduled on a different processor core.

Isnt preemption something that cant happen when you are in the kernel?

This was true when there was still a big kernel lock. Having one global lock means that if you block in-kernel, no other thread may enter the kernel. Now, with the more fine-grained locking, sleeping in the kernel is possible. Linux can be configured at build-time for other preemption models, e.g. CONFIG_PREEMPT.

While your normal desktop kernel is probably configured with CONFIG_PREEMPT_VOLUNTARY, some distributions also ship CONFIG_PREEMPT as a separate low-latency kernel package, e.g. for audio use. For real-time use cases, The preempt_rt patchset even makes most spinlocks preemptible (hence the name).

What is voluntary preemption?

It depends a little bit on the OS.

In some RTOS (real time operatiing system), voluntary preemption means that the running process declares points where it can be preempted (where otherwise it would run until completion). Another way to think of this variant is that of a yield in a coroutine. This is in contrast to most desktop OS where the kernel determines preemption. Keep in mind that some RTOS do not have the concept of a "user mode".

In Linux (at least), "voluntary preemption" is a bit of a misnomer:

Traditionally (no forced preemption), when a user process was making a system call (in kernel mode), it would block until completion. Only user mode code could be preempted.

The preemptive kernel is such that kernel code itself can be preempted. That sounds redundant but it's worth noting that we mean the kernel is preemptible, not that "the kernel supports preemption". Forced/involuntary preemption means that even while servicing a system call, an interrupt for a high priority user process can "force" the kernel to context switch so that it will now run (technically it's not really a context switch, but it has the same effect). This decreases the latency of a user process "seeing" a change in hardware state.

Voluntary preemption is where the kernel periodically checks to see if it should reschedule processes "while doing kernel things". That is, instead of only scheduling/rescheduling user processes at preemption points, it does it periodically while handling things such as I/O. So, where normally a high priority user process might still have to wait for a low priority process to finish its slice, the high prioirty process might now get run "early" as the kernel is checking more frequently if it's to run. This decreases the latency of a user process moving from a suspended state to a running state (at the expense of overall system throughput).

difference between Preemption and context switch

As Mat wrote, this is probably unreasonably scoped. However, I'll try to devote as much attention to the sum of the questions as I would to one reasonably scoped question, in the hopes that this will help you to begin researching.

1 What is the difference between "preemption" and "context switch" ?

Preemption is the act of interrupting a process without its involvement. In this context, that probably means a timer interrupt will fire. The word comes from a legal concept of preemption: the act or right of claiming or purchasing before or in preference to others. For your purposes, that means that when the timer interrupt fires, that the interrupt service routine (ISR) has preference over the code which was previously running. This doesn't necessarily need to involve a kernel at all; you can have code running in any ISR which will run preemptively.

A context switch is what happens when the OS code (running preemptively) alters the state of the processor (the registers, mode, and stack) between one process or thread's context and another. The state of the processor may be at a certain line of code in a one thread. It will have temporary data in registers, a stack pointer at a certain region of memory, and other state information. A preemptive OS can store this state (either to static memory or onto the processes' stack) and load the state of a previous process. This is known as a context switch.

2 What are the key differences between a preemptive and nonpreemptive kernel ? What all work is required from a programmer to make the kernel preemptive ?

In a preemptive kernel, an interrupt can fire in between any two assembly instructions (known as 'sequence points'). In a non-preemptive kernel, the running process must call a yield() function to allow the other threads to run. Preemptive kernels are more complex, but provide a better illusion of concurrency. Non-premptive kernels can be done very simply with setjmp.h, but each thread must regularly call yield() or the other threads will not run.

When a function like yield() is called, the state of the processor is stored automatically. When you want to make your OS preemptive, you must store this information manually.

3 How to create and work with user mode ?

ARM docs say that in user mode, any instruction switching to a privileged mode will be treated as undefined instruction.

Correct. However, they also say that any interrupt will run in privileged mode automatically. On an ARM system, you can use the svc instruction to generate a software interrupt. The SVC code (part of your OS) will then be able to run in privileged mode.

4 If so, the only way for a userspace program to use kernel code is syscalls ?

Correct. At least, this is the only safe or correct way.

5 How does a kernel respond or interact with userspace then ?

On ARM, the SVC instruction can get an 8-bit value. This can be used to generate 256 syscalls such as yield, enable interrupts, disable interrupts, or whatever you need. You may also choose to create a shared memory or message passing interaction mechanism if you need that.

6 Does that mean the only kernel thread after booting (in a simple system) would be the idle thread ?

That depends entirely on how you design your system. It's probably simpler if you choose to start your kernel only after all your threads have been created - that way you don't need to worry about dynamically allocating threads. Or, you can start with the idle thread and add other threads later (through a remote shell? I think you'd want at least one user thread running consistently...)

7 If the Page where kernel code and data resides is unmapped when switching to a user process, then on a syscall or interrupt, how does the kernel code execute without being mapped in the virtual address space ?

Just as kernel mode code runs in privileged mode even if the code was previously executing in user mode, so will kernel mode code run from the main stack pointer (MSP) even if the process code was using a different address space.

8 Does a 'preemptible kernel' only mean that the kernel was designed in a way that it would be safe to have context switch during execution of kernel code ? or does it require more work to be done if any ?

I think that means that the kernel can preempt the user code, not that the kernel itself can be preempted. It would be difficult and unusual for anything to interrupt the kernel. That would require more work, and I'm struggling to see why you'd want it.

Preemptible kernel can be preemptible during disable interrupts?

Whether a kernel is preemptible is a general property of the code base. A preemptible kernel doesn't stop being preemptible just because interrupts were disabled to protect a critical region.

Obviously, it's not preemptible while that interrupt-disabled critical region is executing.

Non-preemptible kernels take interrupts (i.e. have them enabled most of the time) while executing kernel code; they just do not allow interrupt-driven switching to a different task while kernel code is executing.

Why disabling interrupts disables kernel preemption and how spin lock disables preemption

I am not a scheduler guru, but I would like to explain how I see it.
Here are several things.

preempt_disable() doesn't disable IRQ. It just increases a thread_info->preempt_count variable.
Disabling interrupts also disables preemption because scheduler isn't working after that - but only on a single-CPU machine. On the SMP it isn't enough because when you close the interrupts on one CPU the other / others still does / do something asynchronously.
The Big Lock (means - closing all interrupts on all CPUs) is slowing the system down dramatically - so it is why it not anymore in use. This is also the reason why preempt_disable() doesn't close the IRQ.

You can see what is preempt_disable(). Try this:
1. Get a spinlock.
2. Call schedule()

In the dmesg you will see something like "BUG: scheduling while atomic". This happens when scheduler detects that your process in atomic (not preemptive) context but it schedules itself.

Good luck.

Kernel mode preemption

The Linux kernel protects its data structures the same way as anything that runs in a multithreaded environment.

It will likely use some sort of lock to protect data structures that must be accessed atomically. Usually, these include spinlocks, mutexes and semaphores.

There are also functions that disable preemption but this isn't normally used explicitly since locking code will take care of this implicitly.

Pre-emption can occur if the code exceeds the time slice intended for it, then how do we ensure code length/execution time in the spinlock?

The confusion is because you looking everything from "process context" only and totally forget Intr context, premption

http://www.makelinux.net/ldd3/chp-5-sect

What Is Preemption/What Is a Preemtible Kernel? What Is It Good For