Illustrating Usage of the Volatile Keyword in C#

I've achieved a working example!

The main idea comes from the Wikipedia article, with some changes for C#. The Wikipedia article demonstrates this with a static C++ field; it looks like C# always compiles accesses to static fields carefully, so I made an example with a non-static one:

If you run this example in Release mode and without a debugger attached (i.e. using Ctrl+F5), the line while (test.foo != 255) will be optimized to while (true) and the program never returns.
But after adding the volatile keyword, you always get 'OK'.

using System;
using System.Threading;

class Test
{
    /*volatile*/ int foo;

    static void Main()
    {
        var test = new Test();

        // A second thread sets foo after a short delay.
        new Thread(delegate() { Thread.Sleep(500); test.foo = 255; }).Start();

        // Without volatile, the Release-mode JIT may hoist test.foo into a
        // register and spin here forever.
        while (test.foo != 255) ;
        Console.WriteLine("OK");
    }
}

Are volatile variables useful? If yes then when?

This question is very confusing. Let me try to break it down.

Are volatile variables useful?

Yes. The C# team would not have added a useless feature.

If yes then when?

Volatile variables are useful in certain highly performance-sensitive multithreaded applications where the application architecture is predicated on sharing memory across threads.

As an editorial aside, I note that it should be rare for normal line-of-business C# programmers to be in any of these situations. First, the performance characteristics we are talking about here are on the order of tens of nanoseconds; most LOB applications have performance requirements measured in seconds or minutes, not in nanoseconds. Second, most LOB C# applications can do their work with only a small number of threads. Third, shared memory is a bad idea and a cause of many bugs; LOB applications which use worker threads should not use threads directly, but rather use the Task Parallel Library to safely instruct worker threads to perform calculations, and then return the results. Consider using the new await keyword in C# 5.0 to facilitate task-based asynchrony, rather than using threads directly.

Any use of volatile in a LOB application is a big red flag and should be heavily reviewed by experts, and ideally eliminated in favour of a higher-level, less dangerous practice.
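
To make that recommendation concrete, here is a minimal sketch of the spin-wait example from the top of this page rewritten with the Task Parallel Library; the result travels back through the task itself, so there is no shared field and no need for volatile (this assumes a compiler recent enough to allow an async Main):

using System;
using System.Threading.Tasks;

class TaskBasedTest
{
    static async Task Main()
    {
        // The worker computes the value and returns it; nothing is shared.
        int foo = await Task.Run(async () =>
        {
            await Task.Delay(500);
            return 255;
        });

        if (foo == 255)
            Console.WriteLine("OK");
    }
}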

lock will prevent instruction reordering.

A lock is described by the C# specification as being a special point in the code such that certain special side effects are guaranteed to be ordered in a particular way with respect to entering and leaving the lock.

volatile, because it will force the CPU to always read the value from memory (so different CPUs/cores won't cache it and won't see old values).

What you are describing are implementation details of how volatile could be implemented; there is no requirement that volatile be implemented by abandoning caches and going back to main memory. The requirements of volatile are spelled out in the specification.

Interlocked operations perform change + assignment in a single atomic (fast) operation.

It is not clear to me why you have parenthesized "fast" after "atomic"; "fast" is not a synonym for "atomic".

How will lock prevent the cache problem?

Again: lock is documented as being a special event in the code; a compiler is required to ensure that other special events have a particular order with respect to the lock. How the compiler chooses to implement those semantics is an implementation detail.

Is there an implicit memory barrier in a critical section?

In practice yes, a lock introduces a full fence.
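
As a sketch of what that full fence buys you, the data-plus-flag publish pattern can be written with a lock instead of volatile; the ordering guarantees at the lock boundaries do the work (the names here are illustrative):

using System.Diagnostics;

class LockPublisher
{
    private readonly object gate = new object();
    private int a;
    private bool ready;   // a plain field: the lock protects it, no volatile needed

    public void Publish()
    {
        lock (gate)       // entering/leaving the lock orders the side effects
        {
            a = 1;
            ready = true;
        }
    }

    public void Read()
    {
        lock (gate)
        {
            if (ready)
                Debug.Assert(a == 1);   // guaranteed to observe the write to a
        }
    }
}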

Volatile variables can't be local

Correct. If you are accessing a local from two threads then the local must be a special local: it could be a closed-over outer variable of a delegate, or in an async block, or in an iterator block. In all cases the local is actually realized as a field. If you want such a thing to be volatile then do not use high-level features like anonymous methods, async blocks or iterator blocks! That is mixing the highest level and the lowest level of C# coding and that is a very strange thing to do. Write your own closure class and make the fields volatile as you see fit.
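
As a sketch of what "the local is actually realized as a field" means, here is the hand-written closure suggested above; the class and member names are invented for the example:

using System;
using System.Threading;

// A hand-rolled closure: roughly what the compiler generates for a captured
// local, except that here we own the field and can mark it volatile.
class SpinClosure
{
    public volatile bool Done;

    public void Wait()
    {
        while (!Done) { }   // the volatile read prevents hoisting into a register
        Console.WriteLine("OK");
    }
}

class Demo
{
    static void Main()
    {
        var closure = new SpinClosure();
        new Thread(() => { Thread.Sleep(500); closure.Done = true; }).Start();
        closure.Wait();
    }
}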

I read something from Eric Lippert about this but I can't find that post now and I don't remember his answer.

Well I don't remember it either, so I typed "Eric Lippert Why can't a local variable be volatile" into a search engine. That took me to this question:

why can't a local variable be volatile in C#?

Perhaps that is what you're thinking of.

This makes me think they're not implemented with Interlocked.CompareExchange() and friends.

C# implements volatile fields as volatile fields. Volatile fields are a fundamental concept in the CLR; how the CLR implements them is an implementation detail of the CLR.

In what way are they different?

I don't understand the question.

What will the volatile modifier do, for example, in this code?

++_volatileField;

It does nothing helpful, so don't do that. Volatility and atomicity are completely different things. Doing a normal increment on a volatile field does not make the increment into an atomic increment.

Moreover, what (besides warnings) will the compiler do here:
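
(The snippet being asked about is evidently an interlocked increment of the volatile field, which is what produces the compiler warning about a reference to a volatile field not being treated as volatile:)

Interlocked.Increment(ref _volatileField);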

The C# compiler really ought to suppress that warning if the method being called introduces a fence, as this one does. I never managed to get that into the compiler. Hopefully the team will someday.

The volatile field will be updated in an atomic manner. A fence will be introduced by the increment, so the fact that the volatile half-fences are skipped is mitigated.

How is it possible for non-volatile fields?

That's an implementation detail of the CLR.

Do they imply barriers too?

Yes, the interlocked operations introduce barriers. Again, this is an implementation detail.

Doesn't this hurt performance a lot (compared to volatile)?

First off, comparing the performance of broken code to working code is a waste of time.

Second, if you do feel like wasting time, you are perfectly capable of measuring the performance of each yourself. Write the code both ways, get out a stopwatch, run it a trillion times each way, and you'll know which is faster.
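
A minimal sketch of such a measurement with System.Diagnostics.Stopwatch (the iteration count is scaled down from "a trillion" so the run finishes quickly; as with any micro-benchmark, run in Release mode without a debugger):

using System;
using System.Diagnostics;

class VolatileTiming
{
    static int plainField;
    static volatile int volatileField;

    static void Main()
    {
        const int iterations = 100000000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            plainField = i;
        Console.WriteLine("plain:    " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            volatileField = i;   // each store is a volatile write
        Console.WriteLine("volatile: " + sw.ElapsedMilliseconds + " ms");
    }
}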

If volatile doesn't imply barriers but others do then why we can't use them as on local variables?

I can't even begin to make sense of this question.

A reproducible example of volatile usage

The exact semantics of volatile are a jitter implementation detail. The compiler emits the OpCodes.Volatile IL prefix wherever you access a variable that's declared volatile. It does some checking to verify that the variable type is legal; you can't declare value types larger than 4 bytes volatile, but that's where the buck stops.

The C# language specification defines the behavior of volatile, quoted here by Eric Lippert. The 'release' and 'acquire' semantics are something that only makes sense on a processor core with a weak memory model. Those kinds of processors have not done well in the market, probably because they are such an enormous pain to program. The odds that your code will ever run on an Itanium are slim to none.

What's especially bad about the C# language specification's definition is that it doesn't mention at all what really happens. Declaring a variable volatile prevents the jitter optimizer from optimizing the code to store the variable in a CPU register. Which is why the code that Marc linked is hanging. This will only happen with the current x86 jitter, another strong hint that volatile is really a jitter implementation detail.

The poor semantics of volatile has a rich history; it comes from the C language, whose code generators have lots of trouble getting it right as well. Here's an interesting report about it (pdf). It dates from 2008, a good 30+ years of opportunity to get it right. Or wrong; this goes belly-up when the code optimizer forgets that a variable is volatile. Unoptimized code never has a problem with it. Notable is that the jitter in the 'open source' version of .NET (SSCLI20) completely ignores the IL instruction. It can also be argued that the current behavior of the x86 jitter is a bug. I think it is; it is not easy to bump it into the failure mode. But nobody can argue that it actually is a bug.

The writing is on the wall: only ever declare a variable volatile if it is stored in a memory-mapped register, the original intention of the keyword. The odds that you'll run into such a usage in the C# language should be vanishingly small; code like that belongs in a device driver. And above all, never assume that it is useful in a multi-threading scenario.

C# compiler optimization and volatile keyword

So, as far as I understand, the following should never exit.

No, it can stop. It just isn't guaranteed to.

It doesn't stop on the machine I'm currently running on, for example - but equally I could try the exact same executable on another machine and it might behave fine. It will depend on the exact memory model semantics used by the CLR it runs on. That will be affected by the underlying architecture and potentially even the exact CPU being used.

It's important to note that it's not the C# compiler which determines what to do with a volatile field - the C# compiler just indicates the volatility in the metadata using System.Runtime.CompilerServices.IsVolatile. Then the JIT can work out what that means in terms of obeying the relevant contracts.
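
You can observe that metadata marker yourself with reflection; a minimal sketch (the holder type here is invented for the example):

using System;
using System.Reflection;
using System.Runtime.CompilerServices;

class VolatileHolder
{
    public volatile int Flag;   // the field signature carries a modreq(IsVolatile)
}

class Program
{
    static void Main()
    {
        FieldInfo field = typeof(VolatileHolder).GetField("Flag");
        Type[] modifiers = field.GetRequiredCustomModifiers();
        // Prints True: the required modifier IsVolatile is present.
        Console.WriteLine(Array.Exists(modifiers, m => m == typeof(IsVolatile)));
    }
}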

Is the 'volatile' keyword still broken in C#?

Volatile in its current implementation is not broken, despite popular blog posts claiming such a thing. It is, however, badly specified, and the idea of using a modifier on a field to specify memory ordering is not that great (compare volatile in Java/C# to C++'s atomic specification, which had enough time to learn from the earlier mistakes). The MSDN article, on the other hand, was clearly written by someone who has no business talking about concurrency and is completely bogus; the only sane option is to ignore it completely.

Volatile guarantees acquire/release semantics when accessing the field and can only be applied to types that allow atomic reads and writes. No more, no less. This is enough to implement many lock-free algorithms efficiently, such as non-blocking hashmaps.

One very simple sample is using a volatile variable to publish data. Thanks to the volatile on x, the assertion in the following snippet cannot fire:

using System.Diagnostics;

class Publisher
{
    private int a;
    private volatile bool x;

    public void Publish()
    {
        a = 1;
        x = true;    // release: the write to a cannot move below this write
    }

    public void Read()
    {
        if (x)       // acquire: the read of a cannot move above this read
        {
            // if we observe x == true, we will always see the preceding write to a
            Debug.Assert(a == 1);
        }
    }
}

Volatile is not easy to use, and in most situations you are much better off going with some higher-level concept, but when performance is important or you're implementing some low-level data structures, volatile can be exceedingly useful.

Volatile Violates its main job?

The MSDN documentation is wrong. That is most certainly not what volatile does. The C# specification tells you exactly what volatile does and getting a "fresh read" or a "committed write" is not one of them. The specification is correct. volatile only guarantees acquire-fences on reads and release-fences on writes. These are defined as below.

  • acquire-fence: A memory barrier in which other reads and writes are not allowed to move before the fence.
  • release-fence: A memory barrier in which other reads and writes are not allowed to move after the fence.

I will try to explain the table using my arrow notation. A ↓ arrow will mark a volatile read and a ↑ arrow will mark a volatile write. No instruction can move through the arrowhead. Think of the arrowhead as pushing everything away.

In the following analysis I will use two variables, x and y. I will also assume that they are marked as volatile.

Case #1

Notice how the placement of the arrow after the read of x prevents the read of y from moving up. Also notice that the volatility of y is irrelevant in this case.

var localx = x;
↓
var localy = y;
↓

Case #2

Notice how the placement of the arrow after the read of x prevents the write to y from moving up. Also notice that the volatility of either x or y, but not both, could have been omitted in this case.

var localx = x;
↓
↑
y = 1;

Case #3

Notice how the placement of the arrow before the write to y prevents the write to x from moving down. Notice that the volatility of x is irrelevant in this case.


↑
x = 1;
↑
y = 2;

Case #4

Notice that there is no barrier between the write to x and the read of y. Because of this, either the write to x can float down or the read of y can float up. Either movement is valid. This is why the instructions in the write-read case can be swapped.


↑
x = 1;
var localy = y;
↓
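
If the write-read swap must be prevented, a full fence between the two is required; a minimal sketch:

x = 1;
Thread.MemoryBarrier();   // full fence: neither the write nor the read may cross it
var localy = y;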

Notable Mentions

It is also important to note that:

  • x86 hardware has volatile semantics on writes.
  • Microsoft's implementation of the CLI (and I suspect Mono's as well) has volatile semantics on writes.
  • The ECMA specification does not have volatile semantics on writes.

When to use volatile to counteract compiler optimizations in C#

What are the rules the compiler follows in order to determine when to implicitly perform a volatile read?

First, it is not just the compiler that moves instructions around. The big 3 actors in play that cause instruction reordering are:

  • Compiler (like C# or VB.NET)
  • Runtime (like the CLR or Mono)
  • Hardware (like x86 or ARM)

The rules at the hardware level are a little more cut and dried in that they are usually documented pretty well. But at the runtime and compiler levels there are memory model specifications that provide constraints on how instructions can get reordered; it is left up to the implementers to decide how aggressively they want to optimize the code and how closely they want to toe the line with respect to the memory model constraints.

For example, the ECMA specification for the CLI provides fairly weak guarantees. But Microsoft decided to tighten those guarantees in the .NET Framework CLR. Other than a few blog posts I have not seen much formal documentation on the rules the CLR adheres to. Mono, of course, might use a different set of rules that may or may not bring it closer to the ECMA specification. And of course, there may be some liberty in changing the rules in future releases as long as the formal ECMA specification is still considered.

With all of that said I have a few observations:

  • Compiling with the Release configuration is more likely to cause instruction reordering.
  • Simpler methods are more likely to have their instructions reordered.
  • Hoisting a read from inside a loop to outside of the loop is a typical type of reordering optimization (a sketch follows this list).
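
As a hedged illustration of that last point, consider what the optimizer is free to do with a loop over a plain (non-volatile) field; the names here are invented for the example:

class Worker
{
    private bool _stop;   // non-volatile: a read of it may be hoisted out of the loop

    public void Spin()
    {
        // What you wrote:
        while (!_stop) { }

        // What the optimizer may effectively execute instead, having hoisted
        // the read into a register:
        //
        //   bool stopRegister = _stop;
        //   while (!stopRegister) { }   // never re-reads the field
    }
}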

And why can I still get the loop to exit with what I consider to be odd measures?

It is because those "odd measures" are doing one of two things:

  • generating an implicit memory barrier
  • circumventing the compiler's or runtime's ability to perform certain optimizations

For example, if the code inside a method gets too complex it may prevent the JIT compiler from performing certain optimizations that reorders instructions. You can think of it as sort of like how complex methods also do not get inlined.

Also, things like Thread.Yield and Thread.Sleep create implicit memory barriers. I have started a list of such mechanisms here. I bet if you put a Console.WriteLine call in your code it would also cause the loop to exit. I have also seen the "non terminating loop" example behave differently in different versions of the .NET Framework. For example, I bet if you ran that code in 1.0 it would terminate.

This is why using Thread.Sleep to simulate thread interleaving could actually mask a memory barrier problem.

Update:

After reading through some of your comments I think you may be confused as to what Thread.MemoryBarrier is actually doing. What it does is create a full-fence barrier. What does that mean exactly? A full-fence barrier is the composition of two half-fences: an acquire-fence and a release-fence. I will define them now.

  • Acquire fence: A memory barrier in which other reads & writes are not allowed to move before the fence.
  • Release fence: A memory barrier in which other reads & writes are not allowed to move after the fence.

So when you see a call to Thread.MemoryBarrier it will prevent all reads & writes from being moved either above or below the barrier. It will also emit whatever CPU specific instructions are required.

If you look at the code for Thread.VolatileRead here is what you will see.

public static int VolatileRead(ref int address)
{
    int num = address;   // the read happens first...
    MemoryBarrier();     // ...then the fence keeps later reads/writes from moving above it
    return num;
}

Now you may be wondering why the MemoryBarrier call is after the actual read. Your intuition may tell you that to get a "fresh" read of address you would need the call to MemoryBarrier to occur before that read. But, alas, your intuition is wrong! The specification says a volatile read should produce an acquire-fence barrier. And per the definition I gave you above, that means the call to MemoryBarrier has to be after the read of address, to prevent other reads and writes from being moved before it. You see, volatile reads are not strictly about getting a "fresh" read. They are about preventing the movement of instructions. This is incredibly confusing; I know.
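
By symmetry, a volatile write needs a release-fence, so the barrier goes before the store. Here is a sketch of a VolatileWrite shaped that way, mirroring the decompiled read above (not claimed to be a verbatim copy of the framework source):

public static void VolatileWrite(ref int address, int value)
{
    MemoryBarrier();     // the fence keeps earlier reads/writes from moving below it...
    address = value;     // ...then the store happens
}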


