Understanding std::atomic::compare_exchange_weak() in C++11


I'm trying to answer this myself after going through various online resources (e.g., this one and this one), the C++11 standard, and the answers given here.

The related questions have been merged (e.g., "why !expected?" is merged with "why put compare_exchange_weak() in a loop?") and answered accordingly.


Why does compare_exchange_weak() have to be in a loop in nearly all uses?

Typical Pattern A

You need to achieve an atomic update based on the current value of the atomic variable. A failure indicates that the variable was not updated with our desired value, and we want to retry. Note that we don't really care whether the failure is due to a concurrent write or a spurious failure. But we do care that it is we who made this change.

expected = current.load();
do {
    desired = function(expected);
} while (!current.compare_exchange_weak(expected, desired));

A real-world example is several threads adding an element to a singly linked list concurrently. Each thread first loads the head pointer, allocates a new node, and points this new node at the head. Finally, it tries to swap the head with the new node.
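The concurrent push just described can be sketched as follows; the names (Node, head, push) are illustrative, not from the original example:

```cpp
#include <atomic>

struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> head{nullptr};

void push(int v) {
    // Point the new node at the current head.
    Node* n = new Node{v, head.load()};
    // On failure (concurrent push or spurious failure), the CAS writes the
    // current head into n->next, so we can simply retry.
    while (!head.compare_exchange_weak(n->next, n)) {
        // n->next now holds the fresh head; loop and try again.
    }
}
```

Note how the by-reference first parameter does the bookkeeping for us: every failed CAS refreshes n->next automatically.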

Another example is implementing a mutex using std::atomic<bool>. At most one thread can be in the critical section at a time, namely the one that first sets current to true and exits the loop.

Typical Pattern B

This is actually the pattern mentioned in Anthony's book. In contrast to pattern A, you want the atomic variable to be updated once, but you don't care who does it. As long as it isn't updated, you try again. This is typically used with boolean variables. E.g., you need to implement a trigger for a state machine to move on; which thread pulls the trigger is irrelevant.

expected = false;
// !expected: if expected is set to true by another thread, it's done!
// Otherwise, it fails spuriously and we should try again.
while (!current.compare_exchange_weak(expected, true) && !expected);

Note that we generally cannot use this pattern to implement a mutex. Otherwise, multiple threads may be inside the critical section at the same time.
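To see why, consider a deliberately broken "mutex" built from Pattern B (illustrative names, do not use):

```cpp
#include <atomic>

std::atomic<bool> flag{false};

void broken_lock() {
    bool expected = false;
    // This loop exits either when *we* set the flag, or when expected
    // became true, i.e. when some OTHER thread (or an earlier call) had
    // already set it. So a second caller "acquires" the lock too.
    while (!flag.compare_exchange_weak(expected, true) && !expected);
}
```

Calling broken_lock() twice shows the problem: both calls return, i.e. both callers would proceed into the critical section.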

That said, it should be rare to use compare_exchange_weak() outside a loop. By contrast, there are cases where the strong version is used. E.g.,

bool criticalSection_tryEnter(lock)
{
    bool flag = false;
    return lock.compare_exchange_strong(flag, true);
}

compare_exchange_weak is not appropriate here, because if it returns false due to a spurious failure, it is likely that no one occupies the critical section yet.

Starving Thread?

One point worth mentioning: what happens if spurious failures keep happening, thus starving the thread? Theoretically this could happen on platforms where compare_exchange_XXX() is implemented as a sequence of instructions (e.g., LL/SC). Frequent access to the same cache line between the LL and the SC will produce continuous spurious failures. A more realistic example is pathological scheduling in which all concurrent threads are interleaved in the following way.

Time
| thread 1 (LL)
| thread 2 (LL)
| thread 1 (compare, SC), fails spuriously due to thread 2's LL
| thread 1 (LL)
| thread 2 (compare, SC), fails spuriously due to thread 1's LL
| thread 2 (LL)
v ..

Can it happen?

Fortunately, it won't go on forever, thanks to what C++11 requires:

Implementations should ensure that weak compare-and-exchange
operations do not consistently return false unless either the atomic
object has value different from expected or there are concurrent
modifications to the atomic object.

Why do we bother to use compare_exchange_weak() and write the loop ourselves, when we could just use compare_exchange_strong()?

It depends.

Case 1: When both need to be used inside a loop. C++11 says:

When a compare-and-exchange is in a loop, the weak version will yield
better performance on some platforms.

On x86 (at least currently; maybe it will resort to a scheme similar to LL/SC one day for performance when more cores are introduced), the weak and strong versions are essentially the same, because they both boil down to the single instruction cmpxchg. On some other platforms where compare_exchange_XXX() isn't implemented atomically (here meaning no single hardware primitive exists), the weak version inside the loop may win the battle, because the strong one has to handle the spurious failures and retry accordingly.

But,

rarely, we may prefer compare_exchange_strong() over compare_exchange_weak() even in a loop. E.g., when there is a lot of work to do between the load of the atomic variable and the exchange of the computed new value (see function() above). If the atomic variable itself doesn't change frequently, we don't need to repeat the costly calculation for every spurious failure. Instead, we may hope that compare_exchange_strong() "absorbs" such failures, so that we only repeat the calculation when it fails due to a real value change.
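A sketch of this "absorb spurious failures" point, with a counter standing in for the costly computation (all names here are illustrative):

```cpp
#include <atomic>

std::atomic<int> current{0};
int calls = 0;  // counts how often the costly computation runs

int expensive_function(int x) {
    ++calls;            // stands in for a long computation
    return x * 2 + 1;
}

void update() {
    int expected = current.load();
    int desired;
    do {
        desired = expensive_function(expected);
    } while (!current.compare_exchange_strong(expected, desired));
    // With compare_exchange_weak here, a spurious failure would force an
    // extra expensive_function() call even though nobody changed current.
}
```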

Case 2: When only compare_exchange_weak() needs to be put inside a loop. C++11 also says:

When a weak compare-and-exchange would require a loop and a strong one
would not, the strong one is preferable.

This is typically the case when you loop only to eliminate spurious failures from the weak version: you retry until the exchange either succeeds or fails because of a concurrent write.

expected = false;
// !expected: if it fails spuriously, we should try again.
while (!current.compare_exchange_weak(expected, true) && !expected);

At best, this reinvents the wheel and performs the same as compare_exchange_strong(). At worst, this approach fails to take full advantage of machines that provide non-spurious compare-and-exchange in hardware.

Last, if you loop for other reasons (e.g., see "Typical Pattern A" above), then there is a good chance that compare_exchange_strong() would also have to be put in a loop, which brings us back to the previous case.

C++ Atomic compare_exchange_weak

It is called an exchange for a reason.

Calling example.compare_exchange_weak(expected, value); roughly corresponds to the following piece of code done atomically:

if (expected == example) {
    example = value;
    return true;
} else {
    expected = example;
    return false;
}

In your first call, the else branch is taken, meaning expected is set to 100, since 100 is the present value in example, which is not altered.

The next call compares example to expected and they are obviously equal, so the if branch is taken. Thus example==100 after the second call.

See the documentation for more detailed information.

This behavior is useful because you can always save the expected value somewhere safe and use a "burner" variable; this way you also get the value currently stored in example. Another way of looking at it: after the call, expected always contains the value stored in example before the call was made.
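The two calls described above can be demonstrated concretely. The strong version is used here so the outcome is deterministic (weak may fail spuriously even when the values match):

```cpp
#include <atomic>

std::atomic<int> example{100};

void demo(bool& ok1, bool& ok2, int& seen) {
    int expected = 50;
    ok1 = example.compare_exchange_strong(expected, 200); // false: 50 != 100
    seen = expected;                                      // 100: what was in example
    ok2 = example.compare_exchange_strong(expected, 200); // true: example becomes 200
}
```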

Understanding compare_exchange_weak

Basically you can do whatever you like inside the loop. compare_exchange (both weak and strong) succeeds if and only if the value of the variable to be updated equals the first provided argument. In your example, list_head.compare_exchange_weak(oldHead, newNode) can only succeed if list_head == oldHead. So if inside the loop you have a sleep or some long calculation, there is a good chance that some other thread is faster and updates list_head, so your compare_exchange operation is likely to fail. It is important to note that the first parameter is passed by reference: if the compare fails, that parameter is updated with the new value.

Storing oldHead in newNode->next is perfectly safe, because if the subsequent compare_exchange fails, oldHead is updated, the new value is again stored in newNode->next, and we try again (and again, and again, ...) until the compare_exchange eventually succeeds. When it does, it is guaranteed that the value stored in newNode->next matches the one we just replaced in list_head: voilà, we have successfully inserted the node into the list.

// what follows is equivalent to:
//     newNode->next = list_head;
//     list_head = newNode;
// but in a thread-safe way:
while (!list_head.compare_exchange_weak(oldHead, newNode)) {
    // if we are here, the compare_exchange_weak has failed, but
    // oldHead has been updated with the current value from list_head
    newNode->next = oldHead;
}
// if we are here, the compare_exchange_weak was successful, i.e., the value
// in list_head (oldHead) did not change between our assignment to newNode->next
// and the subsequent (successful) compare_exchange.

The difference between the weak and the strong version is that compare_exchange_weak may fail spuriously; that is, it may fail even in cases where the comparison yields true. The reason a weak version exists is that on architectures that implement compare-exchange using LL/SC, compare_exchange_weak is cheaper than the strong version. In general, if you have a retry loop like in your example, you should prefer compare_exchange_weak, because if you really get a (rare) spurious failure, you simply perform another retry.

Don't really get the logic of std::atomic::compare_exchange_weak and compare_exchange_strong

I will give an example where I used this, since it is a very simple one.

I had an atomic that describes the available size of something. There was a danger of integer overflow, so I had to check first before subtracting a value.

Not exact copy paste of my production code:

class LockFreeConcurrentSizeLimit {
public:
    explicit LockFreeConcurrentSizeLimit(size_t available) : mSize{available}
    {}

    bool tryAcquire(size_t s) {
        size_t lastSize = mSize;
        size_t newSize;
        do {
            if (lastSize >= s) {
                newSize = lastSize - s;
            } else {
                return false;
            }
        } while (!mSize.compare_exchange_weak(lastSize, newSize));
        return true;
    }

    void release(size_t s) {
        mSize += s; // here I do not have to worry about integer overflow
    }
private:
    std::atomic<size_t> mSize;
};

Now try to imagine doing that without a compare-exchange and not having a race condition.

There is a chance that the condition is met, but by the time the subtraction is done on the atomic, some other thread has already subtracted a value, so when I do the actual subtraction it is possible to overflow the integer. This can't be done without a compare-exchange.

Now, the difference between compare_exchange_strong and compare_exchange_weak is hard to explain. Even Herb Sutter, in some CppCon talk, gave up explaining it and provided a simple rule: "if you need a loop, use compare_exchange_weak; otherwise use compare_exchange_strong".

c++11 std::atomic compare_exchange_weak and the stack container

Apart from my comment: if compare_exchange_weak succeeded, it managed to update head from the value old_head to the value old_head->next, and therefore old_head is no longer part of the list and can be correctly returned.

It's only if it returns false that it modifies the value of old_head.

EDIT: Showing issues with the above code.

Firstly, if two threads get into pop and both read head (via head.load()), they get the same value of old_head.

Thread one gets swapped out (say).

Thread two carries on running, pops the head, and returns to the caller, and the caller then deletes the node.

Thread one then gets resumed and tries to read old_head->next just to call compare_exchange_weak.

HOWEVER, old_head points to memory which has been deleted: undefined behavior. You're lucky if you segfault.

Secondly, this has the classic ABA problem. I won't bother to explain it, as it's well documented and understood; search for it.

C++11: does atomic::compare_exchange_weak support none-primitive types?

Does std::atomic::compare_exchange_weak work with complex structures?

Yes, but there are conditions; most importantly, the type must be trivially copyable (the comparison is done on the object representation, as if by memcmp).

If the Intel CPU's hardware CMPXCHG only works on a 64-bit operand, do structures larger than 8 bytes work with CMPXCHG?

No, it doesn't work that way. If you create crazily big structures like the one you have there, your code will not be lock-free: the implementation will fall back to a lock to ensure thread-safety, which is why you should never do what you're doing up there with big data structures. You'll slow down your program by hundreds of times, if not more. Consider atomically exchanging pointers instead.

Is it still an atomic operation?

No, it uses locks. You can test this with std::atomic::is_lock_free()

How to fix my program?

There you go:

#include <atomic>
#include <iostream>

using namespace std;

struct Big {
    int i;
    int j;
    int k[100];
};

int main() {
    Big value{}, expect{};  // value-initialize so the bytes being compared are well-defined
    atomic<Big> ab{Big{}};
    cout << boolalpha << ab.is_lock_free() << '\n';  // false: a lock is used for a type this large
    ab.compare_exchange_weak(expect, value);
    return 0;
}

std::atomic | compare_exchange_weak vs. compare_exchange_strong

It has to do with the shared memory consistency model that the hardware implements. For those hardware architectures that implement some kind of relaxed consistency model (e.g. release semantics), the strong operations you refer to above can have a high overhead, and thus experts can use the weaker forms to implement algorithms that perform well also on those relaxed consistency architectures.

For more info, see e.g.

http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-7.pdf

Chapter 12 and Appendix C in http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

c++11: how to produce spurious failures upon compare_exchange_weak?

You'll need a non-x86 CPU, typically one that uses load-linked / store-conditional like ARM or AArch64 (before ARMv8.1). x86 lock cmpxchg implements CAS_strong, and there's nothing you can do to weaken it and make it possibly fail when the value in memory does match.

CAS_weak can compile to a single LL/SC attempt; CAS_strong has to compile to an LL/SC retry loop to only fail if the compare was false, not merely from competition for the cache line from other threads. (As on Godbolt, for ARMv8 ldxr/stxr vs. ARMv8.1 cas)

Have one thread use CAS_weak to do 100 million increments of a std::atomic variable. (https://preshing.com/20150402/you-can-do-any-kind-of-atomic-read-modify-write-operation/ explains how to use a CAS retry loop to implement any arbitrary atomic RMW operation on one variable, but since this is the only thread writing, CAS_strong will always succeed and we don't need a retry loop. My Godbolt example shows doing a single increment attempt using CAS_weak vs. CAS_strong.)

To cause spurious failures for CAS_weak but not CAS_strong, have another thread reading the same variable, or writing something else in the same cache line.

Or maybe doing something like var.fetch_or(0, std::memory_order_relaxed) to truly get ownership of the cache line, but in a way that wouldn't affect the value. That will definitely cause spurious LL/SC failures.

When you're done, 100M CAS_strong attempts will have incremented the variable from 0 to 100000000. But 100M CAS_weak attempts will have incremented to some lower value, with the difference being the number of failures.
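One increment attempt from the experiment above can be sketched like this (illustrative names). On x86 this compiles to lock cmpxchg and can only fail on a real value change; on LL/SC machines it may also fail spuriously:

```cpp
#include <atomic>

std::atomic<int> var{0};

bool increment_once_weak() {
    int expected = var.load(std::memory_order_relaxed);
    // A single CAS_weak attempt: on failure, spurious or not, the caller
    // decides whether to retry.
    return var.compare_exchange_weak(expected, expected + 1,
                                     std::memory_order_relaxed);
}
```

Counting the failures over 100M attempts, with another thread hammering the same cache line, measures the spurious-failure rate.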


