Is Volatile Bool for Thread Control Considered Wrong

Is volatile bool for thread control considered wrong?

volatile can be used for such purposes. However, this is a Microsoft extension to standard C++:

Microsoft Specific


Objects declared as volatile are (...)

  • A write to a volatile object (volatile write) has Release semantics; (...)
  • A read of a volatile object (volatile read) has Acquire semantics; (...)

This allows volatile objects to be used for memory locks and releases in multithreaded applications. (emphasis added)

That is, as far as I understand, when you use the Visual C++ compiler, a volatile bool is for most practical purposes an atomic<bool>.

It should be noted that newer Visual Studio versions add a /volatile compiler switch that controls this behavior, so the above only holds when /volatile:ms is in effect (the default when targeting x86/x64, but not ARM).
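To make that concrete, here is a minimal sketch (the names are illustrative) contrasting the MSVC-specific behavior with the portable C++11 equivalent; the volatile version is only correct under /volatile:ms, while the std::atomic version is correct on any conforming compiler:

#include <atomic>

volatile bool ready_ms = false;           // acquire/release semantics only under MSVC /volatile:ms
std::atomic<bool> ready_portable{false};  // acquire/release semantics in standard C++

void publish()
{
    // Both stores publish prior writes to a reader that sees the flag as true:
    ready_ms = true;                                        // release store (MSVC extension)
    ready_portable.store(true, std::memory_order_release);  // release store (portable)
}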

Why is volatile not considered useful in multithreaded C or C++ programming?

The problem with volatile in a multithreaded context is that it doesn't provide all the guarantees we need. It does have a few properties we need, but not all of them, so we can't rely on volatile alone.

However, the primitives we'd have to use for the remaining properties also provide the ones that volatile does, so it is effectively unnecessary.

For thread-safe accesses to shared data, we need a guarantee that:

  • the read/write actually happens (that the compiler won't just store the value in a register instead and defer updating main memory until much later)
  • that no reordering takes place. Assume that we use a volatile variable as a flag to indicate whether or not some data is ready to be read. In our code, we simply set the flag after preparing the data, so all looks fine. But what if the instructions are reordered so the flag is set first?

volatile does guarantee the first point. It also guarantees that no reordering occurs between different volatile reads/writes: all volatile memory accesses will occur in the order in which they're specified. That is all we need for what volatile is intended for, namely manipulating I/O registers or memory-mapped hardware. But it doesn't help us in multithreaded code, where the volatile object is often only used to synchronize access to non-volatile data; those accesses can still be reordered relative to the volatile ones.

The solution to preventing reordering is to use a memory barrier, which indicates both to the compiler and the CPU that no memory access may be reordered across this point. Placing such barriers around our volatile variable access ensures that even non-volatile accesses won't be reordered across the volatile one, allowing us to write thread-safe code.

However, memory barriers also ensure that all pending reads/writes are executed when the barrier is reached, so it effectively gives us everything we need by itself, making volatile unnecessary. We can just remove the volatile qualifier entirely.

Since C++11, atomic variables (std::atomic<T>) give us all of the relevant guarantees.
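As a minimal sketch of the barrier idea in standard C++ (the variable names are illustrative): a release fence before a relaxed store of the flag, paired with an acquire fence after a relaxed load, prevents exactly the problematic reordering described above:

#include <atomic>

int data = 0;                      // ordinary, non-atomic data
std::atomic<bool> ready{false};    // flag guarding the data

void producer()
{
    data = 42;                                            // prepare the data
    std::atomic_thread_fence(std::memory_order_release);  // barrier: the data write cannot move below
    ready.store(true, std::memory_order_relaxed);         // set the flag
}

void consumer()
{
    while (!ready.load(std::memory_order_relaxed)) { }    // wait for the flag
    std::atomic_thread_fence(std::memory_order_acquire);  // barrier: the data read cannot move above
    int x = data;                                         // guaranteed to read 42
    (void)x;
}

In practice, you would more often fold the fences into the atomic operations themselves (memory_order_release on the store, memory_order_acquire on the load), which is what the examples further down do.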

Volatile boolean vs AtomicBoolean

They are just totally different. Consider this example of a volatile integer:

volatile int i = 0;

void incIBy5() {
    i += 5;
}

If two threads call the function concurrently, i might be 5 afterwards, since the compiled code will be somewhat similar to this (except you cannot synchronize on int):

void incIBy5() {
    int temp;
    synchronized(i) { temp = i; }
    synchronized(i) { i = temp + 5; }
}

If a variable is volatile, every atomic access to it is synchronized, but it is not always obvious what actually qualifies as an atomic access. With an Atomic* object, it is guaranteed that every method is "atomic".

Thus, if two threads each call getAndAdd(5) on an AtomicInteger, you can be sure that the result will be 10. In the same way, if two threads both negate a boolean variable concurrently, with an AtomicBoolean you can be sure it has the original value afterwards; with a volatile boolean, you can't.

So whenever you have more than one thread modifying a field, you need to make it atomic or use explicit synchronization.
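For comparison, the C++ analogue of getAndAdd is std::atomic<int>::fetch_add, which makes the whole read-modify-write a single atomic operation; a minimal sketch:

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> i{0};

void incIBy5()
{
    i.fetch_add(5); // one indivisible read-modify-write, so no lost updates
}

int main()
{
    std::thread t1(incIBy5), t2(incIBy5);
    t1.join();
    t2.join();
    std::cout << i << '\n'; // always prints 10, never 5
}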

The purpose of volatile is a different one. Consider this example:

volatile boolean stop = false;

void loop() {
    while (!stop) { ... }
}

void stop() { stop = true; }

If you have a thread running loop() and another thread calling stop(), you might run into an infinite loop if you omit volatile, since the first thread might cache the value of stop. Here, volatile guarantees visibility: it forbids caching the field, so the looping thread re-reads stop on every iteration and eventually sees the update.

Is 'volatile' needed in this multi-threaded C++ code?

You should not depend on volatile to guarantee thread safety: even though the compiler guarantees that the variable is always read from memory (and not from a register cache), in multi-processor environments a memory barrier is also required.

Instead, use a proper lock around the shared memory. Locks like a critical section are often extremely lightweight, and in the uncontended case will probably be implemented entirely in user mode. They also contain the necessary memory barriers.
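As a sketch of what such a lock looks like in portable C++ (the names here are illustrative), a std::mutex both serializes the accesses and provides the required barriers:

#include <mutex>

std::mutex m;           // protects shared_value
int shared_value = 0;

void writer()
{
    std::lock_guard<std::mutex> lock(m);  // acquiring the lock implies the needed barriers
    shared_value = 42;
}

int reader()
{
    std::lock_guard<std::mutex> lock(m);
    return shared_value;                  // sees the most recent completed write
}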

volatile should only be used for memory-mapped I/O, where successive reads may return different values. The same goes for memory-mapped writes.

Is a volatile boolean switch written to by only one thread thread-safe?

However, this thread is the only thread that ever writes to the variable... Is the variable being volatile enough to make sure that all threads read the right value from it?

If there is only one writer, then there is no race condition, and you do not need to synchronize on the variable; volatile is enough to make the writer's updates visible to the readers.

You might consider using an AtomicBoolean, but it does not provide a toggle() method, so if you had multiple writers toggling the value, you would have to do something like the following:

private final AtomicBoolean isEvenTick = new AtomicBoolean();
...
boolean currentValue;
do {
    currentValue = isEvenTick.get();
} while (!isEvenTick.compareAndSet(currentValue, !currentValue));
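The same limitation exists in C++: std::atomic<bool> has no toggle member either, and the equivalent retry loop uses compare_exchange_weak (a sketch, with an illustrative name):

#include <atomic>

std::atomic<bool> isEvenTick{false};

void toggle()
{
    bool current = isEvenTick.load();
    // On failure, compare_exchange_weak reloads the observed value into
    // `current`, so each retry recomputes the toggle from fresh data.
    while (!isEvenTick.compare_exchange_weak(current, !current)) { }
}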

What is the volatile keyword useful for?

volatile has semantics for memory visibility. Basically, the value of a volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it. Without volatile, readers could see some non-updated value.

To answer your question: Yes, I use a volatile variable to control whether some code continues a loop. The loop tests the volatile value and continues if it is true. The condition can be set to false by calling a "stop" method. Once the stop method completes, the loop tests the value, sees false, and terminates.

The book "Java Concurrency in Practice," which I highly recommend, gives a good explanation of volatile. This book is written by the same person who wrote the IBM article that is referenced in the question (in fact, he cites his book at the bottom of that article). My use of volatile is what his article calls the "pattern 1 status flag."

If you want to learn more about how volatile works under the hood, read up on the Java memory model. If you want to go beyond that level, check out a good computer architecture book like Hennessy & Patterson and read about cache coherence and memory consistency.

Why is std::atomic<bool> much slower than volatile bool?

Running the benchmark code from "Olaf Dietsche":

USE ATOMIC
real    0m1.958s
user    0m1.957s
sys     0m0.000s

USE VOLATILE
real    0m1.966s
user    0m1.953s
sys     0m0.010s

These numbers show no real difference between atomic and volatile. If you are seeing std::atomic<bool> run much slower, you are probably using a GCC version older than 4.7:

http://gcc.gnu.org/gcc-4.7/changes.html

Support for atomic operations specifying the C++11/C11 memory model has been added. These new __atomic routines replace the existing __sync built-in routines.

Atomic support is also available for memory blocks. Lock-free instructions will be used if a memory block is the same size and alignment as a supported integer type. Atomic operations which do not have lock-free support are left as function calls. A set of library functions is available on the GCC atomic wiki in the "External Atomics Library" section.
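For reference, the new __atomic builtins mentioned above look like this (a minimal sketch; these are GCC-specific intrinsics, available since GCC 4.7):

bool flag = false;

void set_flag()
{
    // Replaces the older __sync_* builtins and takes an explicit memory order.
    __atomic_store_n(&flag, true, __ATOMIC_RELEASE);
}

bool get_flag()
{
    return __atomic_load_n(&flag, __ATOMIC_ACQUIRE);
}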

So yes, the only solution is to upgrade to GCC 4.7 or later, where std::atomic uses lock-free instructions instead of function calls.

Volatile and CreateThread

What volatile does:

  • Prevents the compiler from optimizing out any access. Every read/write will result in a read/write instruction.
  • Prevents the compiler from reordering the access with other volatiles.

What volatile does not do:

  • Make the access atomic.
  • Prevent the compiler from reordering with non-volatile accesses.
  • Make changes from one thread visible in another thread.

Some non-portable behaviors that shouldn't be relied on in cross-platform C++:

  • VC++ has extended volatile to prevent any reordering with other instructions. Other compilers don't, because it negatively affects optimization.
  • x86 makes aligned read/write of pointer-sized and smaller variables atomic, and immediately visible to other threads. Other architectures don't.

Most of the time, what people really want are fences (also called barriers) and atomic instructions, which are usable if you've got a C++11 compiler, or via compiler- and architecture-dependent functions otherwise.

Fences ensure that, at the point of use, all the previous reads/writes will be completed. In C++11, fences are available through std::atomic_thread_fence and through the std::memory_order arguments of atomic operations. In VC++ you can use _ReadBarrier(), _WriteBarrier(), and _ReadWriteBarrier() to do this. I'm not sure about other compilers.

On some architectures like x86, a fence is merely a way to prevent the compiler from reordering instructions. On others they might actually emit an instruction to prevent the CPU itself from reordering things.

Here's an example of improper use:

int res1, res2;
volatile bool finished;

void work_thread(int a, int b)
{
    res1 = a + b;
    res2 = a - b;
    finished = true;
}

void spinning_thread()
{
    while (!finished); // spin wait for res to be set.
}

Here, finished is allowed to be reordered to before either res is set! Well, volatile prevents reordering with other volatile, right? Let's try making each res volatile too:

volatile int res1, res2;
volatile bool finished;

void work_thread(int a, int b)
{
    res1 = a + b;
    res2 = a - b;
    finished = true;
}

void spinning_thread()
{
    while (!finished); // spin wait for res to be set.
}

This trivial example will actually work on x86, but it is going to be inefficient. For one, this forces res1 to be set before res2, even though we don't really care about that... we just want both of them set before finished is. Forcing this ordering between res1 and res2 will only prevent valid optimizations, eating away at performance.

For more complex problems, you'll have to make every write volatile. This would bloat your code, be very error prone, and become slow as it prevents a lot more reordering than you really wanted.

It's not realistic. So we use fences and atomics. They allow full optimization, and only guarantee that the memory access will complete at the point of the fence:

#include <atomic>

int res1, res2;
std::atomic<bool> finished{false};

void work_thread(int a, int b)
{
    res1 = a + b;
    res2 = a - b;
    finished.store(true, std::memory_order_release);
}

void spinning_thread()
{
    while (!finished.load(std::memory_order_acquire));
}

This will work for all architectures. The res1 and res2 operations can be reordered as the compiler sees fit. Performing an atomic release ensures that all non-atomic ops are ordered to complete and be visible to any thread that reads the flag with an atomic acquire.


