Why We Need Thread.MemoryBarrier()

Why do we need Thread.MemoryBarrier()?

You are going to have a very hard time reproducing this bug. In fact, I would go as far as saying you will never be able to reproduce it using the .NET Framework. The reason is that Microsoft's implementation uses a strong memory model for writes: writes are treated as if they were volatile, and a volatile write has lock-release semantics, which means that all prior writes must be committed before the current write.

However, the ECMA specification has a weaker memory model. So it is theoretically possible that Mono or even a future version of the .NET Framework might start exhibiting the buggy behavior.

So what I am saying is that it is very unlikely that removing barriers #1 and #2 will have any impact on the behavior of the program. That, of course, is not a guarantee, but an observation based on the current implementation of the CLR only.
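For context, here is the four-barrier example those numbers refer to, reconstructed from the book under discussion (it matches the well-known example in Albahari's Threading in C#; the barrier numbers below are the ones used throughout this page):

using System;
using System.Threading;

class Foo
{
    int _answer;
    bool _complete;

    void A()
    {
        _answer = 123;
        Thread.MemoryBarrier();    // Barrier 1
        _complete = true;
        Thread.MemoryBarrier();    // Barrier 2
    }

    void B()
    {
        Thread.MemoryBarrier();    // Barrier 3
        if (_complete)
        {
            Thread.MemoryBarrier();    // Barrier 4
            Console.WriteLine(_answer);
        }
    }
}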

Removing barriers #3 and #4 will definitely have an impact. This is actually pretty easy to reproduce. Well, not this example per se, but the following code is one of the more well-known demonstrations. It has to be compiled as a Release build and run outside of the debugger. The bug is that the program never ends. You can fix the bug by placing a call to Thread.MemoryBarrier inside the while loop or by marking stop as volatile.

using System;
using System.Threading;

class Program
{
    static bool stop = false;

    public static void Main(string[] args)
    {
        var t = new Thread(() =>
        {
            Console.WriteLine("thread begin");
            bool toggle = false;
            while (!stop)
            {
                toggle = !toggle;
            }
            Console.WriteLine("thread end");
        });
        t.Start();
        Thread.Sleep(1000);
        stop = true;
        Console.WriteLine("stop = true");
        Console.WriteLine("waiting...");
        t.Join();
    }
}

The reason some threading bugs are hard to reproduce is that the same tactics you use to simulate thread interleaving can actually fix the bug. Thread.Sleep is the most notable example because it generates memory barriers. You can verify that by placing a call inside the while loop and observing that the bug goes away.
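For example, this variant of the loop from the program above terminates as expected, because the barrier forces a fresh read of stop on every iteration:

while (!stop)
{
    Thread.MemoryBarrier(); // prevents the read of stop from being cached across iterations
    toggle = !toggle;
}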

You can see my answer here for another analysis of the example from the book you cited.

Why do I need a memory barrier?

Barrier #2 guarantees that the write to _complete gets committed immediately. Otherwise it could remain in a queued state, meaning that the read of _complete in B would not see the change caused by A even though B effectively used a volatile read.

Of course, this example does not quite do justice to the problem because A does nothing more after writing to _complete, which means that the write will be committed immediately anyway since the thread terminates shortly afterwards.

The answer to your question of whether the if could still evaluate to false is yes for exactly the reasons you stated. But, notice what the author says regarding this point.

Barriers 1 and 4 prevent this example from writing "0". Barriers 2 and 3 provide a freshness guarantee: they ensure that if B ran after A, reading _complete would evaluate to true.

The emphasis on "if B ran after A" is mine. It certainly could be the case that the two threads interleave. But the author ignored that scenario, presumably to keep his point about how Thread.MemoryBarrier works simple.

By the way, I had a hard time contriving an example on my machine where barriers #1 and #2 would have altered the behavior of the program, because the memory model regarding writes was strong in my environment. Perhaps if I had a multiprocessor machine, was using Mono, or had some other different setup, I could have demonstrated it. It was, of course, easy to demonstrate that removing barriers #3 and #4 had an impact.

Need clarification about Thread.MemoryBarrier()

A memory barrier enforces an ordering constraint on reads and writes to memory: memory access operations before the barrier happen-before the memory accesses after the barrier.

  1. Barriers 1 and 4 have complementary roles: barrier 1 ensures that the write to _answer happens-before the write to _complete, while barrier 4 ensures that the read from _complete happens-before the read from _answer. Imagine barrier 4 isn't there, but barrier 1 is. While it is guaranteed that 123 is written to _answer before true is written to _complete, some other thread running B() may still have its read operations reordered, and hence may read _answer before it reads _complete. Similarly, if barrier 1 is removed but barrier 4 is kept: while the read from _complete in B() will always happen-before the read from _answer, _complete could still be written before _answer by some other thread running A(). (See the sketch after this list for a modern acquire/release rendering of these guarantees.)

  2. Barriers 2 and 3 provide a freshness guarantee: if barrier 3 is executed after barrier 2, then the state visible to the thread running A() at the point when it executes barrier 2 becomes visible to the thread running B() at the point when it executes barrier 3. In the absence of either of these two barriers, a B() executing after A() completed might not see the changes made by A(). In particular, barrier 2 prevents the value written to _complete from being cached by the processor running A() and forces the processor to write it out to main memory. Similarly, barrier 3 prevents the processor running B() from relying on a cached value of _complete, forcing a read from main memory. Note, however, that a stale cache isn't the only mechanism that can defeat the freshness guarantee in the absence of barriers 2 and 3; reordering of operations on the memory bus is another example.

  3. A memory barrier only ensures that the effects of memory access operations are ordered across the barrier. Other instructions (e.g. incrementing a value in a register) may still be reordered.
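As a side note (not part of the original answer): in modern C#, the ordering guarantees described in point 1 are usually expressed with the System.Threading.Volatile class rather than full fences. A minimal sketch:

using System;
using System.Threading;

class Foo
{
    int _answer;
    bool _complete;

    public void A()
    {
        _answer = 123;
        // Release semantics: the write to _answer cannot be reordered
        // to occur after this write (barrier 1's ordering role).
        Volatile.Write(ref _complete, true);
    }

    public void B()
    {
        // Acquire semantics: the read of _answer cannot be reordered
        // to occur before this read (barrier 4's ordering role).
        if (Volatile.Read(ref _complete))
            Console.WriteLine(_answer);
    }
}

Note that acquire/release covers the ordering roles of barriers 1 and 4; the freshness guarantee of barriers 2 and 3 is stronger than what acquire/release formally promises, although the two behave the same on mainstream hardware.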

When to use 'volatile' or 'Thread.MemoryBarrier()' in threadsafe locking code? (C#)

You use volatile/Thread.MemoryBarrier() when you want to access a variable across threads without locking.

Variables with atomic reads and writes, like an int, are always read and written as a whole. That means you will never get half of the value from before another thread changed it and the other half from after it changed. Because of that, you can safely read and write the value from different threads without synchronising.
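For example (a sketch; the 64-bit caveat follows from the CLI specification and is not in the original answer):

using System.Threading;

class Counters
{
    int _flag;       // 32-bit: reads and writes are atomic (never torn)
    long _counter;   // 64-bit: not guaranteed atomic on a 32-bit CLR

    long ReadCounterSafely()
    {
        // Interlocked.Read performs an untorn 64-bit read on all platforms.
        return Interlocked.Read(ref _counter);
    }
}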

However, the compiler may optimize away some reads and writes, which you prevent with the volatile keyword. If you, for example, have a loop like this:

sum = 0;
foreach (int value in list) {
    sum += value;
}

The compiler may actually do the calculation in a processor register and only write the value to the sum variable after the loop. If you make the sum variable volatile, the compiler will generate code that reads and writes the variable for every change, so that its value is up to date throughout the loop.
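One caveat worth adding (not in the original answer): volatile can only be applied to fields in C#, not to locals, so sum has to be a field for this fix to compile. A minimal sketch, with illustrative names:

using System.Collections.Generic;

class Accumulator
{
    // volatile is only valid on fields, so the sum must live here.
    volatile int _sum;

    public void AddAll(IEnumerable<int> list)
    {
        _sum = 0;
        foreach (int value in list)
        {
            _sum += value; // note: volatile does NOT make this read-modify-write atomic
        }
    }
}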

Do memory barriers guarantee a fresh read in C#?

It is not guaranteed that you will see both threads write 1. MemoryBarrier only guarantees the order of read/write operations, based on this rule:

The processor executing the current thread cannot reorder instructions in such a way that memory accesses prior to the call to MemoryBarrier execute after memory accesses that follow the call to MemoryBarrier.

So this basically means that thread A won't use a value of the variable b that was read before the call to the barrier. But the value of b can still be cached after that point, if your code is something like this:

void A() // runs in thread A
{
    a = 1;
    Thread.MemoryBarrier();
    // b may be cached (read into a register) here
    // ... some work here ...
    // b is changed by the other thread
    // the stale value of b is printed
    Console.WriteLine(b);
}

Race-condition bugs in parallel code are very hard to reproduce, so I can't provide code that will reliably trigger the scenario above, but I suggest you use the volatile keyword for the variables shared between threads, as it works exactly as you want, giving you a fresh read of the variable:

class Example
{
    volatile int a = 0;
    volatile int b = 0;

    void A() // runs in thread A
    {
        a = 1;
        Thread.MemoryBarrier();
        Console.WriteLine(b);
    }

    void B() // runs in thread B
    {
        b = 1;
        Thread.MemoryBarrier();
        Console.WriteLine(a);
    }
}

Is this a correct use of Thread.MemoryBarrier()?

Is this a correct use of Thread.MemoryBarrier()?

No. Suppose one thread sets the flag before the loop even begins to execute. The loop could still execute once, using a cached value of the flag. Is that correct? It certainly seems incorrect to me. I would expect that if I set the flag before the first execution of the loop, the loop executes zero times, not once.
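Concretely, the pattern under discussion looks roughly like this (a reconstruction; shouldRun is the name from the question, and DoWork is a hypothetical placeholder):

static bool shouldRun = true;

static void Worker()
{
    while (shouldRun)           // the first read may use a stale, cached value...
    {
        Thread.MemoryBarrier(); // ...because the barrier only executes after it
        DoWork();               // hypothetical placeholder for the loop's real work
    }
}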

As far as I understand Thread.MemoryBarrier(), having this call inside the while loop will prevent my work thread from getting a cached version of the shouldRun, and effectively preventing an infinite loop from happening. Is my understanding about Thread.MemoryBarrier correct?

The memory barrier will ensure that the processor does not do any reorderings of reads and writes such that a memory access that is logically before the barrier is actually observed to be after it, and vice versa.

If you are hell-bent on doing low-lock code, I would be inclined to make the field volatile rather than introducing an explicit memory barrier. "volatile" is a feature of the C# language. A dangerous and poorly understood feature, but a feature of the language. It clearly communicates to the reader of the code that the field in question is going to be used without locks on multiple threads.
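A sketch of the volatile alternative (same illustrative names as above):

static volatile bool shouldRun = true;

static void Worker()
{
    while (shouldRun)   // every iteration performs a volatile read; it is never cached
    {
        DoWork();       // hypothetical placeholder
    }
}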

is this a reasonable way to ensure that my loop will stop once shouldRun is set to false by any thread?

Some people would consider it reasonable. I would not do this in my own code without a very, very good reason.

Typically low-lock techniques are justified by performance considerations. There are two such considerations:

First, a contended lock is potentially extremely slow; it blocks as long as there is code executing in the lock. If you have a performance problem because there is too much contention then I would first try to solve the problem by eliminating the contention. Only if I could not eliminate the contention would I go to a low-lock technique.

Second, it might be that an uncontended lock is too slow. If the "work" you are doing in the loop takes, say, less than 200 nanoseconds, then the time required to check the uncontended lock -- about 20 ns -- is a significant fraction of the time spent doing work. In that case I would suggest that you do more work per loop. Is it really necessary that the loop stops within 200 ns of the control flag being set?

Only in the most extreme of performance scenarios would I imagine that the cost of checking an uncontended lock is a significant fraction of the time spent in the program.

And also, of course, if you are inducing a memory barrier every 200 ns or so, you are also possibly wrecking performance in other ways. The processor wants to make those moving-memory-accesses-around-in-time optimizations for you; if you are forcing it to constantly abandon those optimizations, you're missing out on a potential win.

Explanation of Thread.MemoryBarrier() Bug with OoOP

It doesn't fix any issues. It's a fake fix, and rather dangerous in production code, because it may or may not work.

The core problem is in this line

static bool stop = false;

The variable that stops the while loop is not volatile, which means it may or may not be re-read from memory on every access. It can be cached, so that only the last value read is presented to the program (which may not be the actual current value).
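The direct fix implied here is to mark the field volatile:

static volatile bool stop = false; // reads can no longer be cached or hoisted out of the loop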

This code

// Thread.MemoryBarrier() or Console.WriteLine() fixes issue

May or may not fix the issue on different platforms. The memory barrier or console write just happens to force the application to read fresh values on a particular system; it may not be the same elsewhere.


Additionally, volatile and Thread.MemoryBarrier() provide only weak guarantees: they do not give 100% assurance that a value read will always be the latest on all systems and CPUs.

Eric Lippert says:

The true semantics of volatile reads and writes are considerably more complex than I've outlined here; in fact they do not actually guarantee that every processor stops what it is doing and updates caches to/from main memory. Rather, they provide weaker guarantees about how memory accesses before and after reads and writes may be observed to be ordered with respect to each other. Certain operations such as creating a new thread, entering a lock, or using one of the Interlocked family of methods introduce stronger guarantees about observation of ordering. If you want more details, read sections 3.10 and 10.5.3 of the C# 4.0 specification.

Not understanding the purpose of memory barriers in C#

Currently, I understand the problem without memory barriers is that there's a possibility that B will run before A and B will print nothing because _complete could be evaluated as false.

No, the problem is compiler, JIT, or CPU instruction reordering.
It can be the case that one of them reorders the

_answer = 123;
_complete = true;

statements as an optimization, since from the point of view of a single-threaded application their order does not matter.

Now suppose they are reordered as

_complete = true;
_answer = 123;

now:

  • Thread 1 sets _complete = true
  • Thread 2 reads _complete
    • evaluates the if condition
    • reads _answer (which is still 0)
    • Console.WriteLine(_answer) -> prints 0
  • Thread 1 sets _answer = 123

The code's logic is broken.
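A memory barrier between the two writes (barrier 1 in the original example) rules this reordering out:

_answer = 123;
Thread.MemoryBarrier(); // neither compiler, JIT, nor CPU may move the
                        // write to _complete above this fence
_complete = true;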

VB.NET: Do I need to call Thread.MemoryBarrier() before each read if I always complete my writes with Thread.MemoryBarrier()?

You can't remove the barrier on the read side, which is easy to show by example. Let's use this reader:

while (!IsDisposed); // reads _isDisposed

The value of _isDisposed can clearly be cached in a register here, so that new writes never become visible. This loop could be infinite (for example; other effects are possible too, such as long delays).

More formally, the reads of _isDisposed can all move "upwards" in time so that they appear to run before the store happens. Volatile stores effect a release fence, meaning that nothing preceding them can move past them to a later point in time. Things after them can still move to earlier points in time, though.

Use the Volatile class. Or, use a struct written in C# as a wrapper around the field:

struct VolatileInt32Box { public volatile int Value; }
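For example, the reader loop rewritten with the Volatile class (a sketch in C#; _isDisposed is the field from the question):

while (!Volatile.Read(ref _isDisposed))
{
    // Acquire semantics: _isDisposed is re-read on every iteration
    // and cannot be hoisted into a register.
}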

Thread.MemoryBarrier and lock difference for a simple property

is there any difference regarding thread-safeness?

Both ensure that appropriate barriers are set up around the read and write.

result?

In both cases two threads can race to write a value. However, reads and writes cannot move forwards or backwards in time past either the lock or the full fences.
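The two variants of such a property presumably look something like this (a reconstruction, since the asker's code isn't shown):

class Sample
{
    private int _value;
    private readonly object _gate = new object();

    // Lock version: the Monitor enter/exit pair supplies the fences.
    public int ValueWithLock
    {
        get { lock (_gate) { return _value; } }
        set { lock (_gate) { _value = value; } }
    }

    // Full-fence version: explicit barriers around the read and the write.
    public int ValueWithFences
    {
        get
        {
            Thread.MemoryBarrier();
            int v = _value;
            Thread.MemoryBarrier();
            return v;
        }
        set
        {
            Thread.MemoryBarrier();
            _value = value;
            Thread.MemoryBarrier();
        }
    }
}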

performance?

You've written the code both ways. Now run it. If you want to know which is faster, run it and find out! If you have two horses and you want to know which is faster, race them. Don't ask strangers on the Internet which horse they think is faster.

That said, a better technique is to set a performance goal, write the code to be clearly correct, and then test to see whether you met your goal. If you did, don't waste your valuable time trying to further optimize code that is already fast enough; spend it optimizing something else that isn't fast enough.

A question you didn't ask:

What would you do?

I'd not write a multithreaded program; that's what I'd do. I'd use processes as my unit of concurrency if I had to.

If I had to write a multithreaded program then I would use the highest-level tool available. I'd use the Task Parallel Library, I'd use async-await, I'd use Lazy<T> and so on. I'd avoid shared memory; I'd treat threads as lightweight processes that returned a value asynchronously.

If I had to write a shared-memory multithreaded program then I would lock everything, all the time. We routinely write programs these days that fetch a billion bytes of video over a satellite link and send it to a phone. Twenty nanoseconds spent taking a lock isn't going to kill you.

I am not smart enough to try to write low-lock code, so I wouldn't do that at all. If I had to then I would use that low-lock code to build a higher-level abstraction and use that abstraction. Fortunately I don't have to because someone already has built the abstractions I need.


