Spinlock Versus Semaphore


Spinlocks and semaphores differ mainly in four ways:

1. What they are
A spinlock is one possible implementation of a lock, namely one that is implemented by busy waiting ("spinning"). A semaphore is a generalization of a lock (or, the other way around, a lock is a special case of a semaphore). Usually, but not necessarily, spinlocks are only valid within one process whereas semaphores can be used to synchronize between different processes, too.

A lock works for mutual exclusion, that is one thread at a time can acquire the lock and proceed with a "critical section" of code. Usually, this means code that modifies some data shared by several threads.

A semaphore has a counter and allows itself to be acquired by one or several threads, depending on what value you post to it and (in some implementations) on what its maximum allowable value is.

In that sense, a lock can be considered a special case of a semaphore with a maximum value of 1.
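As a minimal sketch of that relationship (using the C++20 <semaphore> header, which is not mentioned in the answer above but maps directly onto these definitions), a counting semaphore admits several threads at once, while a binary semaphore, with a maximum count of 1, behaves like a lock:

#include <semaphore>

std::counting_semaphore<4> slots{4};   // up to 4 threads may hold this at once
std::binary_semaphore     gate{1};     // maximum count 1: effectively a lock

void worker()
{
    gate.acquire();      // at most one thread is past this point at a time
    // ... critical section ...
    gate.release();
}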

2. What they do
As stated above, a spinlock is a lock, and therefore a mutual-exclusion (strictly one holder at a time) mechanism. It works by repeatedly querying and/or modifying a memory location, usually in an atomic manner. This means that acquiring a spinlock is a "busy" operation that possibly burns CPU cycles for a long time (maybe forever!) while effectively achieving "nothing".

The main incentive for such an approach is the fact that a context switch has an overhead equivalent to spinning a few hundred (or maybe a few thousand) times, so if a lock can be acquired by burning a few cycles spinning, this may well be more efficient overall. Also, for realtime applications it may not be acceptable to block and wait for the scheduler to come back to them at some far-away time in the future.

A semaphore, by contrast, either does not spin at all, or only spins for a very short time (as an optimization to avoid the syscall overhead). If a semaphore cannot be acquired, it blocks, giving up CPU time to a different thread that is ready to run. This may of course mean that a few milliseconds pass before your thread is scheduled again, but if this is no problem (usually it isn't) then it can be a very efficient, CPU-conservative approach.

3. How they behave in presence of congestion
It is a common misconception that spinlocks or lock-free algorithms are "generally faster", or that they are only useful for "very short tasks" (ideally, no synchronization object should be held for longer than absolutely necessary, ever).

The one important difference is how the different approaches behave in presence of congestion.

A well-designed system normally has low or no congestion (this means not all threads try to acquire the lock at the exact same time). For example, one would normally not write code that acquires a lock, then loads half a megabyte of zip-compressed data from the network, decodes and parses the data, and finally modifies a shared reference (append data to a container, etc.) before releasing the lock. Instead, one would acquire the lock only for the purpose of accessing the shared resource.
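As a hedged illustration of that design rule (the types and the decode_and_parse helper are invented for the example), the expensive work happens before the lock is taken and only the cheap shared-container update happens inside it:

#include <mutex>
#include <string>
#include <vector>

std::mutex g_lock;
std::vector<std::string> g_shared;

// Hypothetical heavy work: decoding/parsing happens with no lock held.
std::string decode_and_parse(const std::string& raw)
{
    return raw + " (parsed)";
}

void handle_request(const std::string& raw)
{
    std::string record = decode_and_parse(raw);     // expensive, outside the lock
    std::lock_guard<std::mutex> guard(g_lock);
    g_shared.push_back(std::move(record));          // cheap, inside the lock
}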

Since this means that there is considerably more work outside the critical section than inside it, naturally the likelihood for a thread being inside the critical section is relatively low, and thus few threads are contending for the lock at the same time. Of course every now and then two threads will try to acquire the lock at the same time (if this couldn't happen you wouldn't need a lock!), but this is rather the exception than the rule in a "healthy" system.

In such a case, a spinlock greatly outperforms a semaphore because if there is no lock congestion, the overhead of acquiring the spinlock is a mere dozen cycles as compared to hundreds/thousands of cycles for a context switch or 10-20 million cycles for losing the remainder of a time slice.

On the other hand, given high congestion, or if the lock is being held for lengthy periods (sometimes you just can't help it!), a spinlock will burn insane amounts of CPU cycles for achieving nothing.

A semaphore (or mutex) is a much better choice in this case, as it allows a different thread to run useful tasks during that time. Or, if no other thread has something useful to do, it allows the operating system to throttle down the CPU and reduce heat / conserve energy.

Also, on a single-core system, a spinlock will be quite inefficient in the presence of lock congestion, as a spinning thread will waste its entire time slice waiting for a state change that cannot possibly happen (not until the releasing thread is scheduled, which isn't happening while the waiting thread is running!). Therefore, given any amount of contention, acquiring the lock takes around 1.5 time slices in the best case (assuming the releasing thread is the next one scheduled), which is not very good behaviour.

4. How they're implemented
A semaphore will nowadays typically wrap sys_futex under Linux (optionally with a spinlock that exits after a few attempts).
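As a rough, hedged sketch of what such a futex-backed binary semaphore can look like (Linux-specific, heavily simplified, and not the actual glibc implementation; the FutexSem name and futex_call helper are invented here):

#include <atomic>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static long futex_call(void* addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op, val, nullptr, nullptr, 0);
}

struct FutexSem {
    std::atomic<uint32_t> value{1};             // 1 = available, 0 = taken

    void acquire()
    {
        uint32_t expected = 1;
        while (!value.compare_exchange_strong(expected, 0)) {
            futex_call(&value, FUTEX_WAIT, 0);  // sleep in the kernel while value is 0
            expected = 1;
        }
    }

    void release()
    {
        value.store(1);
        futex_call(&value, FUTEX_WAKE, 1);      // wake one sleeping waiter, if any
    }
};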

A spinlock is typically implemented using atomic operations, and without using anything provided by the operating system. In the past, this meant using either compiler intrinsics or non-portable assembler instructions. Meanwhile both C++11 and C11 have atomic operations as part of the language, so apart from the general difficulty of writing provably correct lock-free code, it is now possible to implement lock-free code in an entirely portable and (almost) painless way.
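For example, a minimal test-and-set spinlock on top of C++11 atomics might look like this (a sketch, not a production-quality lock; real implementations add pause instructions, back-off, and so on):

#include <atomic>

class Spinlock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock()
    {
        // Keep trying to set the flag; succeed only when the previous value was clear.
        while (flag.test_and_set(std::memory_order_acquire)) {
            // busy-wait ("spin")
        }
    }

    void unlock()
    {
        flag.clear(std::memory_order_release);
    }
};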

Difference between Mutex, Semaphore & Spin Locks

First, remember the goal of these 'synchronizing objects':

These objects were designed to provide efficient and coherent use of 'shared data' by more than one thread within one process, or by threads from different processes.

These objects can be 'acquired' or 'released'.

That is it!!! End of story!!!

Now, if it helps, let me add my grain of sand:

1) Critical Section= User object used to allow the execution of just one active thread, out of many, within one process. The other, non-selected threads (on acquiring this object) are put to sleep.

[No interprocess capability, very primitive object].

2) Mutex Semaphore (aka Mutex)= Kernel object used to allow the execution of just one active thread, out of many, within one process or among different processes. The other, non-selected threads (on acquiring this object) are put to sleep. This object supports thread ownership, thread termination notification, recursion (multiple 'acquire' calls from the same thread) and 'priority inversion avoidance'.

[Interprocess capability, very safe to use, a kind of 'high level' synchronization object].

3) Counting Semaphore (aka Semaphore)= Kernel object used to allow the execution of a group of active threads, out of many, within one process or among different processes. The other, non-selected threads (on acquiring this object) are put to sleep. (See the Win32 sketch after this list.)

[Interprocess capability, however not very safe to use because it lacks the following 'mutex' attributes: thread termination notification, recursion?, 'priority inversion avoidance'?, etc.].

4) And now, talking about 'spinlocks', first some definitions:

Critical Region= A region of memory shared by 2 or more processes.

Lock= A variable whose value allows or denies the entrance to a 'critical region'. (It could be implemented as a simple 'boolean flag').

Busy waiting= Continuously testing a variable until some value appears.

Finally:

Spin-lock (aka Spinlock)= A lock which uses busy waiting. (The lock is acquired with xchg or similar atomic operations.)

[No thread sleeping, mostly used at kernel level only. Inefficient for user-level code].
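A hedged Win32 sketch of the kernel objects described in #2 and #3 above (the API calls are real, but the sequence here is purely illustrative):

#include <windows.h>

int main()
{
    // #2: mutex - only one thread at a time; interprocess-capable if given a name
    HANDLE mtx = CreateMutexW(nullptr, FALSE, nullptr);
    WaitForSingleObject(mtx, INFINITE);     // acquire; non-selected threads sleep here
    // ... critical section ...
    ReleaseMutex(mtx);

    // #3: counting semaphore - up to 3 threads may hold it at the same time
    HANDLE sem = CreateSemaphoreW(nullptr, 3, 3, nullptr);
    WaitForSingleObject(sem, INFINITE);     // take one slot
    // ... work ...
    ReleaseSemaphore(sem, 1, nullptr);

    CloseHandle(sem);
    CloseHandle(mtx);
    return 0;
}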

As a last comment, I am not sure but I can bet you some big bucks that the above first 3 synchronizing objects (#1, #2 and #3) make use of this simple beast (#4) as part of their implementation.

Have a good day!

References:

-Real-Time Concepts for Embedded Systems by Qing Li with Caroline Yao (CMP Books).

-Modern Operating Systems (3rd) by Andrew Tanenbaum (Pearson Education International).

-Programming Applications for Microsoft Windows (4th) by Jeffrey Richter (Microsoft Programming Series).

Linux semaphores: spinlock or signals?

Use the power of Open Source - just look at the source code.

The kernel-space semaphore is defined as

struct semaphore {
        raw_spinlock_t          lock;
        unsigned int            count;
        struct list_head        wait_list;
};

lock is used to protect count and wait_list.

All tasks waiting on a semaphore reside in wait_list. When the semaphore is upped, one task is woken up.

User-space semaphores should rely on the semaphore-related system calls that the kernel provides. The kernel's definition of a user-space semaphore is:

/* One semaphore structure for each semaphore in the system. */
struct sem {
        int                semval;        /* current value */
        int                sempid;        /* pid of last operation */
        spinlock_t         lock;          /* spinlock for fine-grained semtimedop */
        struct list_head   sem_pending;   /* pending single-sop operations */
};

The kernel's definition of the user-space semaphore is similar to the kernel-space one: sem_pending is a list of waiting processes plus some additional info.

I should highlight again that neither the kernel-space semaphore nor the user-space one uses a spinlock to wait on the lock. The spinlock is included in both structures only to protect the structure members from concurrent access. After the structure is modified, the spinlock is released and the task rests in the list until woken.

Furthermore, spinlocks are unsuitable for waiting on an event from another thread. Before acquiring a spinlock, the kernel disables preemption, so in that case, on a uniprocessor machine, the spinlock would never be released.

I should also note that user-space semaphores, while serving on behalf of user space, execute in kernel space.
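For completeness, here is a hedged sketch of the user-space side that drives this kernel code, using the System V semaphore calls (semget/semctl/semop); error handling is omitted and the sequence is illustrative only:

#include <sys/ipc.h>
#include <sys/sem.h>

union semun { int val; struct semid_ds* buf; unsigned short* array; };

int main()
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);  // a set with one semaphore

    union semun arg;
    arg.val = 1;
    semctl(id, 0, SETVAL, arg);                          // initial value 1

    struct sembuf down = {0, -1, 0};   // P(): decrement; sleeps in the kernel if 0
    struct sembuf up   = {0, +1, 0};   // V(): increment; wakes a sleeping task

    semop(id, &down, 1);
    /* ... critical section ... */
    semop(id, &up, 1);

    semctl(id, 0, IPC_RMID);                             // remove the semaphore set
    return 0;
}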

P.S. Source code for the kernel-space semaphore resides in include/linux/semaphore.h and kernel/semaphore.c, for user-space one in ipc/sem.c

Terminology question: mutex lock, spin lock, sleepable lock

The type of mutex lock we have been describing is also called a spinlock because the process “spins” while waiting for the lock to become available.

Maybe you're misreading the book (that is, "The type of mutex lock we have been describing" might not refer to the exact passage you think it does), or the book is outdated. Modern terminology is quite clear in what a mutex is, but spinlocks get a bit muddy.

  • A mutex is a concurrency primitive that allows one agent at a time access to its resource, while the others have to wait until the exclusive access is released. How they wait is not specified and is irrelevant: their process might go to sleep, get written to disk, spin in a loop, or, if you are using cooperative concurrency (also known as "asynchronous programming"), pass control to the event loop as the 'waiting operation'.

  • A spinlock does not have a clear definition. It might be used to refer to:

    1. A synonym for mutex (this is in my opinion wrong, but it happens).
    2. A specific mutex implementation that always waits in a busy loop.
    3. Any sort of busy-waiting loop waiting for a resource. A semaphore, for example, might also get implemented using a 'spinlock'.

    I would consider any use of the word to refer to a (part of a) specific implementation of a concurrency primitive that waits in a busy loop to be correct, if a more general term is not appropriate. That is, use mutex (or whatever primitive you desire) unless you specifically want to talk about a busy-waiting concurrency primitive.

When should one use a spinlock instead of mutex?

The Theory

In theory, when a thread tries to lock a mutex and it does not succeed, because the mutex is already locked, it will go to sleep, immediately allowing another thread to run. It will continue to sleep until being woken up, which will be the case once the mutex is being unlocked by whatever thread was holding the lock before. When a thread tries to lock a spinlock and it does not succeed, it will continuously re-try locking it, until it finally succeeds; thus it will not allow another thread to take its place (however, the operating system will forcefully switch to another thread, once the CPU runtime quantum of the current thread has been exceeded, of course).

The Problem

The problem with mutexes is that putting threads to sleep and waking them up again are both rather expensive operations: they need quite a lot of CPU instructions and thus also take some time. If the mutex was only locked for a very short amount of time, the time spent putting a thread to sleep and waking it up again might far exceed the time the thread actually slept, and it might even exceed the time the thread would have wasted by constantly polling a spinlock. On the other hand, polling a spinlock constantly wastes CPU time, and if the lock is held for a longer amount of time, this wastes a lot more CPU time and it would have been much better if the thread had been sleeping instead.

The Solution

Using spinlocks on a single-core/single-CPU system usually makes no sense, since as long as the spinlock polling is blocking the only available CPU core, no other thread can run, and since no other thread can run, the lock won't be unlocked either. In other words, a spinlock only wastes CPU time on those systems for no real benefit. If the thread was put to sleep instead, another thread could have run at once, possibly unlocking the lock and then allowing the first thread to continue processing once it woke up again.

On multi-core/multi-CPU systems, with plenty of locks that are held for a very short amount of time only, the time wasted on constantly putting threads to sleep and waking them up again might decrease runtime performance noticeably. When using spinlocks instead, threads get the chance to take advantage of their full runtime quantum (always blocking only for a very short time period, but then immediately continuing their work), leading to much higher processing throughput.

The Practice

Since very often programmers cannot know in advance if mutexes or spinlocks will be better (e.g. because the number of CPU cores of the target architecture is unknown), nor can operating systems know if a certain piece of code has been optimized for single-core or multi-core environments, most systems don't strictly distinguish between mutexes and spinlocks. In fact, most modern operating systems have hybrid mutexes and hybrid spinlocks. What does that actually mean?

A hybrid mutex behaves like a spinlock at first on a multi-core system. If a thread cannot lock the mutex, it won't be put to sleep immediately; since the mutex might get unlocked pretty soon, it will first behave exactly like a spinlock. Only if the lock has still not been obtained after a certain amount of time (or number of retries, or any other measuring factor) is the thread really put to sleep. If the same code runs on a system with only a single core, the mutex will not spin, though, since, as noted above, that would not be beneficial.
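A hedged sketch of that idea (not how any particular OS implements it; the HybridLock name and the spin count are arbitrary): spin on try_lock for a bounded number of attempts, then fall back to a blocking acquire.

#include <mutex>

class HybridLock {
    std::mutex m;
public:
    void lock()
    {
        // Optimistic phase: retry briefly in case the lock is released very soon.
        for (int i = 0; i < 1000; ++i) {
            if (m.try_lock())
                return;
        }
        m.lock();    // Pessimistic phase: block (sleep) until the lock is available.
    }

    void unlock() { m.unlock(); }
};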

A hybrid spinlock behaves like a normal spinlock at first, but to avoid wasting too much CPU time, it may have a back-off strategy. It will usually not put the thread to sleep (since you don't want that to happen when using a spinlock), but it may decide to stop the thread (either immediately or after a certain amount of time; this is called "yielding") and allow another thread to run, thus increasing the chance that the spinlock gets unlocked (you still pay the cost of a thread switch, but not the cost of putting a thread to sleep and waking it up again).
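Again as a hedged sketch (the YieldingSpinlock name and the spin threshold are arbitrary), such a back-off might look like this:

#include <atomic>
#include <thread>

class YieldingSpinlock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock()
    {
        int spins = 0;
        while (flag.test_and_set(std::memory_order_acquire)) {
            if (++spins > 100) {
                std::this_thread::yield();   // give up the CPU without sleeping
                spins = 0;
            }
        }
    }

    void unlock() { flag.clear(std::memory_order_release); }
};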

Summary

If in doubt, use mutexes; they are usually the better choice, and most modern systems will allow them to spin for a very short amount of time if that seems beneficial. Using spinlocks can sometimes improve performance, but only under certain conditions, and the fact that you are in doubt rather tells me that you are not currently working on a project where a spinlock might be beneficial. You might consider using your own "lock object" that can use either a spinlock or a mutex internally (e.g. this behavior could be configurable when creating such an object), initially use mutexes everywhere, and if you think that using a spinlock somewhere might really help, give it a try and compare the results (e.g. using a profiler). But be sure to test both cases, a single-core and a multi-core system, before you jump to conclusions (and possibly different operating systems, if your code will be cross-platform).

Update: A Warning for iOS

Actually not iOS specific, but iOS is the platform where most developers may face this problem: if your system has a thread scheduler that does not guarantee that every thread, no matter how low its priority may be, will eventually get a chance to run, then spinlocks can lead to permanent deadlocks. The iOS scheduler distinguishes different classes of threads, and threads in a lower class will only run if no thread in a higher class wants to run as well. There is no back-off strategy for this, so if you permanently have high-class threads available, low-class threads will never get any CPU time and thus never any chance to perform any work.

The problem appears as follows: your code obtains a spinlock in a low-priority-class thread and, while it is in the middle of that lock, its time quantum is exceeded and the thread stops running. The only way this spinlock can be released again is for that low-priority thread to get CPU time again, but this is not guaranteed to happen. You may have a couple of high-priority-class threads that constantly want to run, and the task scheduler will always prioritize those. One of them may run across the spinlock and try to obtain it, which isn't possible of course, so the system makes it yield. The problem is: a thread that yielded is immediately available for running again! Since it has a higher priority than the thread holding the lock, the thread holding the lock has no chance to get CPU runtime; either some other thread will get runtime, or the thread that just yielded.

Why does this problem not occur with mutexes? When the high-priority thread cannot obtain the mutex, it won't yield; it may spin a bit but will eventually be sent to sleep. A sleeping thread is not available for running until it is woken up by an event, e.g. the mutex it has been waiting for being unlocked. Apple is aware of this problem and has deprecated OSSpinLock as a result. The new lock is called os_unfair_lock. This lock avoids the situation mentioned above, as it is aware of the different thread priority classes. If you are sure that using spinlocks is a good idea in your iOS project, use that one. Stay away from OSSpinLock! And under no circumstances implement your own spinlocks on iOS! If in doubt, use a mutex. macOS is not affected by this issue, as it has a different thread scheduler that won't allow any thread (even low-priority threads) to "run dry" on CPU time; still, the same situation can arise there and will then lead to very poor performance, thus OSSpinLock is deprecated on macOS as well.
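A hedged sketch of the replacement API on Apple platforms (from <os/lock.h>; the critical_work function is invented here):

#include <os/lock.h>

os_unfair_lock g_lock = OS_UNFAIR_LOCK_INIT;

void critical_work()
{
    os_unfair_lock_lock(&g_lock);    // waits in the kernel instead of busy-spinning
    // ... critical section ...
    os_unfair_lock_unlock(&g_lock);
}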

c++: spin lock or mutex comparison (simple calculations)

some notes:

  1. The time shown in the time utility output is the CPU time your threads are using, not wall-clock time. A spinlock uses CPU even while waiting, whereas with kernel mutexes the CPU executes other threads in other processes while yours waits, not billing your process for that CPU time except for the part used to actually do the scheduling (which is what you see in the sys row in the mutex case).

  2. For the same reason, the total wall-clock time from the start of the process to the end may well be shorter in the spinlock case; however, your CPU usage will be higher, which is what you observed.

  3. Threading may be a good choice if you have a low collision chance, that is, the time a thread spends on synchronized work is small compared to the work it can do independently. If all your work is protected by a mutex, using threads adds nothing but overhead - you should probably serialize it.

Spinlocks are good if you have both a low collision chance and a low waiting time in case of collision. In your case you have 8 threads colliding on the same resource, each requiring the resource again as soon as it has been released. This means that on average there will be 1 thread working and 7 spinlocking, putting the total CPU time used at 8 times what a single thread needs (assuming an 8-core machine with no other load on it). In the mutex case, threads are paused and woken up exactly when the resource becomes available, so there is no overhead for waiting; however, locking a mutex requires some work in the kernel to keep track of which threads are waiting for it, and even though that overhead is not huge, you are doing 160 million mutex operations, which adds up to the sys time billed to your process.


