Kernel Preemption While Holding Spinlock

Why does Linux disable kernel preemption after kernel code acquires a spinlock?

The answer to your first question is the reasoning behind your second.

Spinlocks acquired by the kernel may be implemented by turning off preemption, because this ensures that the kernel will complete its critical section without another process interfering. The entire point is that another process will not be able to run until the kernel releases the lock.

There is no reason it has to be implemented this way; it is just a simple implementation that also prevents any process from spinning on a lock the kernel holds. But this trick only works when it is the kernel that has acquired the lock: user processes cannot turn off preemption, and if the kernel is the one spinning (i.e. it tries to acquire a spinlock that another process already holds) it had better leave preemption on. Otherwise the system will hang, since the kernel would be waiting on a lock that can never be released, because the process holding it would never be scheduled to release it.

The kernel acquiring a spinlock is a special case. If a user-level program acquires a spinlock, preemption is not disabled.
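To make this concrete, here is a minimal sketch of the usual kernel pattern (my_lock and my_shared_counter are invented names for illustration): between spin_lock() and spin_unlock(), preemption is off on the local CPU, so the critical section runs to completion without another task being scheduled there.

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);
static int my_shared_counter;

void my_increment(void)
{
        spin_lock(&my_lock);    /* implies preempt_disable() on this CPU */
        my_shared_counter++;    /* safe from local preemption and other CPUs */
        spin_unlock(&my_lock);  /* implies preempt_enable() */
}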

Kernel spin-lock enables preemption before releasing lock

You're looking at the uni-processor defines. As the comment in spinlock_api_up.h says (http://lxr.free-electrons.com/source/include/linux/spinlock_api_up.h#L21):

/*
 * In the UP-nondebug case there's no real locking going on, so the
 * only thing we have to do is to keep the preempt counts and irq
 * flags straight, to suppress compiler warnings of unused lock
 * variables, and to add the proper checker annotations:
 */

The ___LOCK and ___UNLOCK macros are there for annotation purposes, and unless __CHECKER__ is defined (it is defined by sparse), they end up being compiled out.

In other words, preempt_enable() and preempt_disable() are what actually do the locking in the single-processor case.
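For reference, the relevant macros in that header look roughly like this (abridged from memory; see the link above for the authoritative definitions). __acquire() and __release() are no-ops unless __CHECKER__ is defined:

#define ___LOCK(lock) \
  do { __acquire(lock); (void)(lock); } while (0)

#define __LOCK(lock) \
  do { preempt_disable(); ___LOCK(lock); } while (0)

#define ___UNLOCK(lock) \
  do { __release(lock); (void)(lock); } while (0)

#define __UNLOCK(lock) \
  do { preempt_enable(); ___UNLOCK(lock); } while (0)

So on UP, "taking the lock" reduces to disabling preemption, and "releasing" it reduces to re-enabling preemption.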

Kernel preemption while holding spinlock

Kernel preemption doesn't guarantee that you don't have a deadlock.

A thread may still hold a lock without ever releasing it, and that will still deadlock any other thread that wants to acquire the same lock. Only the thread holding the lock can decide to release it; the kernel itself cannot force a thread to release a lock.

The kernel can simply schedule other threads to run, but if some other thread depends on the first thread finishing, that thread will also get blocked.

For example:

Thread A is waiting on a lock for some shared resource that thread B has acquired.

Thread A gets preempted and thread B gets scheduled.

Thread B is waiting on a lock for some shared resource thread A is holding.

Deadlock. Neither thread A nor thread B can make progress.

To break the deadlock, something would have to force thread A or thread B to release its lock. Kernel preemption can't do that.
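For illustration, here is a minimal user-space version of that scenario using POSIX threads (all names are invented for the example; compile with cc abba.c -lpthread). The sleep() calls just force the fatal interleaving reliably:

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *thread_a(void *arg)
{
        pthread_mutex_lock(&lock_a);    /* A holds lock_a */
        sleep(1);                       /* let B take lock_b */
        pthread_mutex_lock(&lock_b);    /* blocks forever: B holds lock_b */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
}

static void *thread_b(void *arg)
{
        pthread_mutex_lock(&lock_b);    /* B holds lock_b */
        sleep(1);                       /* let A take lock_a */
        pthread_mutex_lock(&lock_a);    /* blocks forever: A holds lock_a */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
}

int main(void)
{
        pthread_t a, b;
        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);          /* never returns: ABBA deadlock */
        pthread_join(b, NULL);
        return 0;
}

Both threads end up waiting on each other; no amount of preemption or rescheduling by the kernel will make either one release its lock.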

Linux Kernel Preemption during spin_lock and mutex_lock

Current implementations of spin locks use two entirely separate mechanisms to ensure mutual exclusion, one for dealing with inter-processor exclusion and one for dealing with the local processor threads and interrupt handlers.

  • There is the spin_lock itself, which exists only to provide mutual exclusion between two or more processor cores. Any processor hitting a locked spin lock is basically stuck until another processor releases it. Spin locks serve no purpose on single-processor systems (other than to increase the chance of total deadlock), so they are usually compiled out on uniprocessor kernels.

  • To provide local-processor mutual exclusion, spin_lock() calls preempt_disable() (on pre-emptive scheduling systems) to prevent any other thread from running whilst the lock is held; similarly, spin_lock_irqsave() also does the equivalent of local_irq_save() to disable interrupts, preventing anything else at all from running on the local processor. A sketch of how the two mechanisms combine follows below.
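To make the two mechanisms concrete, here is a minimal user-space sketch in C11 (my_spinlock_t, my_preempt_count and the stubbed preempt functions are invented stand-ins, not the kernel's implementation; in the real kernel, preempt_disable()/preempt_enable() adjust the per-CPU preempt_count):

#include <stdatomic.h>

/* Stand-ins for the kernel's preemption bookkeeping. */
static _Thread_local int my_preempt_count;
static void preempt_disable(void) { my_preempt_count++; }
static void preempt_enable(void)  { my_preempt_count--; }

typedef struct { atomic_flag locked; } my_spinlock_t;
static my_spinlock_t my_lock = { ATOMIC_FLAG_INIT };    /* how to initialise */

static void my_spin_lock(my_spinlock_t *l)
{
        preempt_disable();      /* mechanism 1: local-processor exclusion */
        while (atomic_flag_test_and_set_explicit(&l->locked,
                                                 memory_order_acquire))
                ;               /* mechanism 2: spin against other CPUs */
}

static void my_spin_unlock(my_spinlock_t *l)
{
        atomic_flag_clear_explicit(&l->locked, memory_order_release);
        preempt_enable();       /* may allow a pending reschedule */
}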

As should be obvious from the above, using spin locks can gum up the whole machine, so spin locks should only be held for very short periods, and you should never do anything that might cause a reschedule whilst holding one.

The case with mutex_lock is totally different: only threads attempting to take the lock are affected, and if a thread hits a locked mutex it will simply be rescheduled. For this reason, mutex_lock cannot be used in interrupt (or other atomic) contexts.
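For contrast, here is a minimal kernel-style sketch of mutex usage (my_mutex and my_update are invented names); because mutex_lock() may sleep, this is only legal in process context:

#include <linux/mutex.h>

static DEFINE_MUTEX(my_mutex);

void my_update(void)
{
        mutex_lock(&my_mutex);          /* may block and reschedule */
        /* ... touch shared data; sleeping here is allowed ... */
        mutex_unlock(&my_mutex);
}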

Why kernel preemption is safe only when preempt_count == 0?

While this is an old question, the accepted answer isn't correct.

First of all, the title originally asked:

Why kernel preemption is safe only when preempt_count > 0?

This isn't correct; it's the opposite. Kernel preemption is disabled when preempt_count > 0, and enabled (safe) when preempt_count == 0.
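The kernel's own test is roughly this (paraphrased from include/linux/preempt.h; preempt_count() and irqs_disabled() are real kernel helpers):

/* Preemption is allowed only when nothing (spinlocks, softirqs,
 * hardirqs) is accounted for in preempt_count and IRQs are enabled. */
#define preemptible()  (preempt_count() == 0 && !irqs_disabled())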

Furthermore, the claim:

If another task is scheduled and tries to grab the lock, it will block (or spin until its time slice ends),

is not always true.

Say you acquire a spin lock without preemption being disabled. A process switch happens, and in the context of the new process some softirq runs. Preemption is disabled while running softirqs, so if one of those softirqs attempts to acquire your lock it will never stop spinning: the softirq cannot be preempted, and you can never run again to release the lock. Thus you have a deadlock.

You have no control over whether the process that preempts yours will run softirqs or not. The preempt_count field through which softirqs are disabled is per-process, and softirqs have to run with preemption disabled to preserve the per-CPU serialization of softirq execution.
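This is why locks that are shared with softirq context are taken with spin_lock_bh() in process context, which disables softirq processing on the local CPU as well. A sketch (my_lock and my_process_context_path are invented names):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);

void my_process_context_path(void)
{
        spin_lock_bh(&my_lock);         /* disables local softirqs + preemption */
        /* ... critical section shared with softirq code ... */
        spin_unlock_bh(&my_lock);
}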


