Why Disabling Interrupts Disables Kernel Preemption and How Spin Lock Disables Preemption

I am not a scheduler guru, but I would like to explain how I see it.
Here are several things.

preempt_disable() doesn't disable IRQs. It just increments the thread_info->preempt_count variable.
Disabling interrupts also disables preemption, because the scheduler is no longer invoked on that CPU, but this holds only on a single-CPU machine. On SMP it isn't enough: when you disable interrupts on one CPU, the other CPUs keep running asynchronously.
The Big Lock (meaning: disabling interrupts on all CPUs) slows the system down dramatically, which is why it is no longer in use. This is also the reason why preempt_disable() doesn't disable IRQs.
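To make the distinction concrete, here is a minimal userspace sketch of the bookkeeping, not kernel code. All the model_* names and the irqs_enabled flag are made up for illustration; the real kernel keeps this state per task and per CPU. It only shows that disabling preemption bumps a counter and leaves interrupts alone, while either mechanism on its own is enough to make the current task non-preemptible.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; the names below are hypothetical stand-ins. */
static int  preempt_count;        /* nesting depth of preempt_disable() calls */
static bool irqs_enabled = true;  /* stands in for the CPU's interrupt flag */

static void model_preempt_disable(void) { preempt_count++; }

static void model_preempt_enable(void)
{
    assert(preempt_count > 0);    /* an unbalanced enable is a bug */
    preempt_count--;
}

static void model_local_irq_disable(void) { irqs_enabled = false; }
static void model_local_irq_enable(void)  { irqs_enabled = true; }

/* The current task can be preempted only if both conditions hold. */
static bool model_preemptible(void)
{
    return preempt_count == 0 && irqs_enabled;
}
```

Note that the counter nests: two calls to model_preempt_disable() need two calls to model_preempt_enable() before the task becomes preemptible again.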

You can see what preempt_disable() does. Try this:
1. Take a spinlock.
2. Call schedule().

In dmesg you will see something like "BUG: scheduling while atomic". This happens when the scheduler detects that your process is in atomic (non-preemptible) context but is trying to schedule anyway.
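The experiment can be mimicked in userspace to see the logic of the check. This is only a sketch under the assumption that the check boils down to "is the preempt counter non-zero when schedule() is called"; the model_* functions and the atomic_bugs counter are hypothetical, and the message is abbreviated compared to what the kernel prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int preempt_count;   /* model of the per-task preempt counter */
static int atomic_bugs;     /* how many times the check has fired */

/* In this model, taking a spinlock only disables preemption. */
static void model_spin_lock(void) { preempt_count++; }

static void model_spin_unlock(void)
{
    assert(preempt_count > 0);
    preempt_count--;
}

/* Returns true if the call was legal, false if it was made in atomic context. */
static bool model_schedule(void)
{
    if (preempt_count != 0) {
        fprintf(stderr, "BUG: scheduling while atomic (preempt_count=%d)\n",
                preempt_count);
        atomic_bugs++;
        return false;
    }
    return true;
}

/* The experiment from the text: take a lock, then call schedule(). */
static int demo_schedule_under_lock(void)
{
    int before = atomic_bugs;

    model_spin_lock();
    model_schedule();       /* complains: we hold the lock */
    model_spin_unlock();
    model_schedule();       /* fine: the lock has been released */

    return atomic_bugs - before;   /* number of complaints in this run */
}
```

In a real module the two steps would be a spin_lock() on some lock followed by schedule(); do that only on a test machine.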

Good luck.

Why does Linux disable kernel preemption after kernel code takes a spinlock?

The answer to your first question is the reasoning behind your second.

Spinlocks acquired by the kernel may be implemented by turning off preemption, because this ensures that the kernel will complete its critical section without another process interfering. The entire point is that another process will not be able to run until the kernel releases the lock.

There is no reason that it has to be implemented this way; it is just a simple way to implement it, and it prevents any process from spinning on the lock that the kernel holds. But this trick only works for the case in which the kernel has acquired the lock: user processes cannot turn off preemption, and if the kernel is spinning (i.e. it tries to acquire a spinlock but another process already holds it), it had better leave preemption on! Otherwise the system will hang, since the kernel is waiting for a lock that will never be released because the process holding it cannot run to release it.

The kernel acquiring a spinlock is a special case. If a user level program acquires a spinlock, preemption will not be disabled.

Linux kernel: why is preemption disabled when using a per-CPU variable?

Why do we disable preemption?

To avoid having the thread preempted and rescheduled on a different processor core.
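A small userspace simulation may help to see the hazard. It is a sketch under assumptions: the counter array, current_cpu and try_migrate() are invented for illustration, and migration is triggered by hand instead of by a scheduler. In the kernel, the protected pattern corresponds to bracketing the access with get_cpu() / put_cpu() (or get_cpu_var() / put_cpu_var()).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

static long counter[NR_CPUS];   /* model of a per-CPU variable */
static int  current_cpu;        /* CPU the simulated task runs on */
static int  preempt_count;      /* non-zero: the task may not be moved */

/* Hypothetical scheduler hook: moves the task only if preemption is allowed. */
static void try_migrate(int new_cpu)
{
    if (preempt_count == 0)
        current_cpu = new_cpu;
}

/* Returns true if the slot we updated still belongs to the CPU we run on. */
static bool increment_unprotected(void)
{
    int cpu = current_cpu;              /* pick "our" slot */
    try_migrate((cpu + 1) % NR_CPUS);   /* preempted and moved in between */
    counter[cpu]++;                     /* now touching another CPU's slot */
    return cpu == current_cpu;
}

static bool increment_protected(void)
{
    preempt_count++;                    /* what get_cpu() does first */
    int cpu = current_cpu;
    try_migrate((cpu + 1) % NR_CPUS);   /* has no effect: preemption is off */
    counter[cpu]++;
    assert(preempt_count > 0);
    preempt_count--;                    /* what put_cpu() does */
    return cpu == current_cpu;
}
```

In the unprotected variant the task ends up on CPU 1 while modifying CPU 0's slot, which on a real machine could race with whatever is now running on CPU 0.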

Isn't preemption something that can't happen when you are in the kernel?

This was true when there was still a big kernel lock. Having one global lock means that if you block in-kernel, no other thread may enter the kernel. Now, with the more fine-grained locking, sleeping in the kernel is possible. Linux can be configured at build-time for other preemption models, e.g. CONFIG_PREEMPT.

While your normal desktop kernel is probably configured with CONFIG_PREEMPT_VOLUNTARY, some distributions also ship CONFIG_PREEMPT as a separate low-latency kernel package, e.g. for audio use. For real-time use cases, the preempt_rt patchset even makes most spinlocks preemptible (hence the name).

Kernel spin-lock enables preemption before releasing lock

You're looking at the uni-processor defines. As the comment in spinlock_api_up.h says (http://lxr.free-electrons.com/source/include/linux/spinlock_api_up.h#L21):

/*
 * In the UP-nondebug case there's no real locking going on, so the
 * only thing we have to do is to keep the preempt counts and irq
 * flags straight, to suppress compiler warnings of unused lock
 * variables, and to add the proper checker annotations:
 */

The ___LOCK and ___UNLOCK macros are there for annotation purposes, and unless __CHECKER__ is defined (it is defined by sparse), they end up being compiled out.

In other words, preempt_enable() and preempt_disable() are the ones doing the locking in a single processor case.
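Roughly, the shape of those uni-processor macros can be sketched like this. This is a simplified userspace paraphrase, not a copy of the header: the UP_* and model_* names are made up, the annotation hooks are reduced to no-ops (as they would be without sparse), and the preempt counter is a plain global.

```c
#include <assert.h>

static int preempt_count;   /* model of the preempt counter */

#define model_preempt_disable() do { preempt_count++; } while (0)
#define model_preempt_enable() \
    do { assert(preempt_count > 0); preempt_count--; } while (0)

/* Checker annotations: no-ops when no static checker is running. */
#define model_acquire(lock) ((void)0)
#define model_release(lock) ((void)0)

/* Annotation-only part: also "uses" the lock to silence unused warnings. */
#define UP_ANNOTATE_LOCK(lock) \
    do { model_acquire(lock); (void)(lock); } while (0)
#define UP_ANNOTATE_UNLOCK(lock) \
    do { model_release(lock); (void)(lock); } while (0)

/* The actual "locking" on UP: just the preempt count. */
#define UP_LOCK(lock) \
    do { model_preempt_disable(); UP_ANNOTATE_LOCK(lock); } while (0)
#define UP_UNLOCK(lock) \
    do { model_preempt_enable(); UP_ANNOTATE_UNLOCK(lock); } while (0)

/* On UP without debugging the lock carries no state; a dummy member
 * keeps this valid standard C. */
typedef struct { char dummy; } model_spinlock_t;

static void up_spin_lock(model_spinlock_t *lock)   { UP_LOCK(lock); }
static void up_spin_unlock(model_spinlock_t *lock) { UP_UNLOCK(lock); }
```

The lock object itself is never read or written; the only observable effect of lock and unlock is the change in preempt_count.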

Why do kprobes disable preemption and when is it safe to reenable it?

At least on x86, the implementation of Kprobes relies on the fact that preemption is disabled while the Kprobe handlers run.

When you place an ordinary (not Ftrace-based) Kprobe on an instruction, the first byte of that instruction is overwritten with 0xcc (int3, "software breakpoint"). If the kernel tries to execute that instruction, a trap occurs and kprobe_int3_handler() is called (see the implementation of do_int3()).

To call your Kprobe handlers, kprobe_int3_handler() finds which Kprobe was hit, saves it in the per-CPU variable current_kprobe and calls your pre-handler. After that, it prepares everything to single-step over the original instruction. After the single-stepping, your post-handler is called and then some cleanup is performed. current_kprobe and some other per-CPU data are used to do all this. Preemption is only enabled after that.

Now, imagine the pre-handler has enabled preemption, was preempted right away and resumed on a different CPU. If the implementation of Kprobes tried to access current_kprobe or other per-cpu data, the kernel would likely crash (NULL pointer deref if there were no current_kprobe on that CPU at the moment) or worse.

Or, the preempted handler could resume on the same CPU but another Kprobe could hit there while it was sleeping - current_kprobe, etc. would be overwritten and disaster would be very likely.

Re-enabling preemption in Kprobe handlers could result in difficult-to-debug kernel crashes and other problems.
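Both failure modes can be illustrated with a toy userspace model. This is a sketch only: the per-CPU current_kprobe is represented as a plain array indexed by a hand-set current_cpu, and struct model_kprobe, model_probe_hit() and model_kprobe_running() are hypothetical stand-ins for the real machinery.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2

struct model_kprobe { const char *name; };

static struct model_kprobe *current_kprobe[NR_CPUS];  /* per-CPU slot */
static int current_cpu;                               /* CPU we run on */

static struct model_kprobe probe_a = { "a" };
static struct model_kprobe probe_b = { "b" };

/* Breakpoint hit: remember the active probe on this CPU. */
static void model_probe_hit(struct model_kprobe *p)
{
    assert(p != NULL);
    current_kprobe[current_cpu] = p;
}

/* What later stages look at to find "their" probe. */
static struct model_kprobe *model_kprobe_running(void)
{
    return current_kprobe[current_cpu];
}

static void model_reset(void)
{
    for (int i = 0; i < NR_CPUS; i++)
        current_kprobe[i] = NULL;
    current_cpu = 0;
}

/* Handler enabled preemption and resumed on another CPU. */
static struct model_kprobe *scenario_migrated(void)
{
    model_reset();
    model_probe_hit(&probe_a);      /* hit on CPU 0 */
    current_cpu = 1;                /* preempted, resumed on CPU 1 */
    return model_kprobe_running();  /* nothing recorded on CPU 1 */
}

/* Handler slept; a second probe was hit on the same CPU meanwhile. */
static struct model_kprobe *scenario_overwritten(void)
{
    model_reset();
    model_probe_hit(&probe_a);      /* first task, CPU 0 */
    model_probe_hit(&probe_b);      /* another task, same CPU */
    return model_kprobe_running();  /* first task now sees the wrong probe */
}
```

In the first scenario the lookup yields NULL, in the second it yields the other task's probe; neither is what the original handler expects to find.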

So, in short, this is because Kprobes are designed this way, at least on x86. I cannot say much about their implementation on other architectures.

Depending on what you are trying to accomplish, other kernel facilities might be helpful.

For instance, if you only need to run your code at the start of some functions, take a look at Ftrace. Your code would then run in the same conditions as the functions you hook it into.

All that being said, one of my projects actually needed Kprobes whose handlers ran under the same conditions w.r.t. preemption as the probed instructions. You can find the implementation here. However, it had to jump through hoops to achieve that without breaking anything. It has been working OK so far, but it is more complex than I would like and has portability issues too.

Can a preemptible kernel be preempted while interrupts are disabled?

Whether a kernel is preemptible is a general property of the code base. A preemptible kernel doesn't stop being preemptible just because interrupts were disabled to protect a critical region.

Obviously, it's not preemptible while that interrupt-disabled critical region is executing.

Non-preemptible kernels take interrupts (i.e. have them enabled most of the time) while executing kernel code; they just do not allow interrupt-driven switching to a different task while kernel code is executing.
