Why Is Volatile Not Considered Useful in Multithreaded C or C++ Programming

Why is volatile not considered useful in multithreaded C or C++ programming?

The problem with volatile in a multithreaded context is that it doesn't provide all the guarantees we need. It does have a few properties we need, but not all of them, so we can't rely on volatile alone.

However, the primitives we'd have to use for the remaining properties also provide the ones that volatile does, so it is effectively unnecessary.

For thread-safe accesses to shared data, we need a guarantee that:

  • the read/write actually happens (that the compiler won't just store the value in a register instead and defer updating main memory until much later)
  • that no reordering takes place. Assume that we use a volatile variable as a flag to indicate whether or not some data is ready to be read. In our code, we simply set the flag after preparing the data, so all looks fine. But what if the instructions are reordered so the flag is set first? (See the sketch after this list.)
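
Here is a minimal sketch of that broken pattern. compute() and use() are placeholder functions; the point is only that the payload itself is not volatile:

int compute();               // placeholder: produces the payload
void use(int);               // placeholder: consumes the payload

int data = 0;                // non-volatile shared payload
volatile bool ready = false; // volatile flag

void producer() {            // runs in thread 1
    data = compute();        // (1) prepare the data
    ready = true;            // (2) raise the flag
}

void consumer() {            // runs in thread 2
    while (!ready) {}        // spin on the volatile flag
    use(data);               // may still see a stale 'data': the non-volatile
                             // accesses to 'data' can be reordered relative to
                             // the volatile accesses to 'ready'
}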

volatile does guarantee the first point. It also guarantees that no reordering occurs between different volatile reads/writes: all volatile memory accesses will occur in the order in which they're specified. That is all we need for what volatile is intended for, namely manipulating I/O registers or memory-mapped hardware, but it doesn't help us in multithreaded code, where the volatile object is often only used to synchronize access to non-volatile data. Those non-volatile accesses can still be reordered relative to the volatile ones.

The solution to preventing reordering is to use a memory barrier, which indicates both to the compiler and the CPU that no memory access may be reordered across this point. Placing such barriers around our volatile variable access ensures that even non-volatile accesses won't be reordered across the volatile one, allowing us to write thread-safe code.

However, memory barriers also ensure that all pending reads/writes are executed when the barrier is reached, so it effectively gives us everything we need by itself, making volatile unnecessary. We can just remove the volatile qualifier entirely.
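
One way to express such barriers in standard C++ is std::atomic_thread_fence. Below is a minimal sketch of the flag example with explicit fences; note that for the program to be data-race-free under the C++ memory model, the flag itself must be an atomic here rather than merely volatile, and compute() and use() remain placeholders:

#include <atomic>

int compute();
void use(int);

int data = 0;
std::atomic<bool> ready{false};

void producer() {
    data = compute();
    std::atomic_thread_fence(std::memory_order_release); // barrier: the write to
                                                          // 'data' cannot sink below it
    ready.store(true, std::memory_order_relaxed);
}

void consumer() {
    while (!ready.load(std::memory_order_relaxed)) {}
    std::atomic_thread_fence(std::memory_order_acquire); // barrier: the read of
                                                          // 'data' cannot hoist above it
    use(data);                                            // sees the produced value
}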

Since C++11, atomic variables (std::atomic<T>) give us all of the relevant guarantees.
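
With std::atomic, the broken sketch from above becomes correct simply by changing the flag's type; the default (sequentially consistent) operations already include the barrier semantics. A minimal sketch, with compute() and use() still placeholders:

#include <atomic>

int compute();
void use(int);

int data = 0;
std::atomic<bool> ready{false};  // was: volatile bool ready

void producer() {
    data = compute();
    ready = true;                // seq_cst store: earlier writes can't move past it
}

void consumer() {
    while (!ready) {}            // seq_cst load: later reads can't move before it
    use(data);                   // guaranteed to observe the value from producer()
}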

Does volatile guarantee anything at all in portable C code for multi-core systems?

To summarize the problem, it appears (after reading a lot) that "volatile" guarantees something like: the value will be read/written not just from/to a register, but at least to the core's L1 cache, in the same order that the reads/writes appear in the code.

No, it absolutely does not. And that makes volatile almost useless for the purpose of MT-safe code.

If it did, then volatile would be quite good for variables shared by multiple threads, as ordering the events in the L1 cache is all you need to do on a typical CPU (that is, either multi-core or multi-CPU on one motherboard) capable of cooperating in a way that makes a normal implementation of either C/C++ or Java multithreading possible with the typical expected costs (that is, not a huge cost on most atomic or non-contended mutex operations).

But volatile does not provide any guaranteed ordering (or "memory visibility") in the cache either in theory or in practice.

(Note: the following is based on a sound interpretation of the standard documents, the standard's intent, historical practice, and a deep understanding of the expectations of compiler writers. This approach, based on history, actual practice, and the expectations and understanding of real people in the real world, is much stronger and more reliable than parsing the words of a document that is not known to be a stellar piece of specification writing and which has been revised many times.)

In practice, volatile does guarantee ptrace-ability, that is, the ability to use debug information for the running program at any level of optimization, together with the fact that the debug information makes sense for these volatile objects:

  • you may use ptrace (or a ptrace-like mechanism) to set meaningful breakpoints at the sequence points after operations involving volatile objects: you can really break at exactly these points (note that this works only if you are willing to set many breakpoints, as any C/C++ statement may be compiled to many different assembly start and end points, as in a massively unrolled loop);
  • while a thread of execution is stopped, you may read the value of all volatile objects, as they have their canonical representation (following the ABI for their respective type); a non-volatile local variable could have an atypical representation, e.g. a shifted representation: a variable used for indexing an array might be multiplied by the size of individual objects for easier indexing, or it might be replaced by a pointer to an array element (as long as all uses of the variable are similarly converted) (think changing dx to du in an integral);
  • you can also modify those objects (as long as the memory mappings allow that, as volatile objects with static lifetime that are const-qualified might be in a memory range mapped read-only).

In practice, volatile guarantees a little more than the strict ptrace interpretation: it also guarantees that volatile automatic variables have an address on the stack rather than being allocated to a register, a register allocation that would make ptrace manipulations more delicate (the compiler can output debug information explaining how variables are allocated to registers, but reading and changing register state is slightly more involved than accessing memory addresses).

Note that full program debug-ability, that is, treating all variables as volatile at least at sequence points, is provided by the "zero optimization" mode of the compiler, a mode which still performs trivial optimizations like arithmetic simplifications (there is usually no guaranteed no-optimization-at-all mode). But volatile is stronger than non-optimization: x - x can be simplified to zero for a non-volatile integer x, but not for a volatile object (see the sketch below).
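
A small illustration of that difference (a sketch; whether a particular compiler folds the non-volatile case at its lowest optimization level is up to that compiler):

int x_minus_x(int x) {
    return x - x;          // may legitimately be folded to 'return 0'
}

volatile int v;

int v_minus_v() {
    return v - v;          // must perform two separate reads of 'v';
                           // the two reads may even observe different values
}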

So volatile means guaranteed to be compiled as-is, just as the compiler's translation of a system call from source to binary/assembly isn't reinterpreted, changed, or optimized in any way. Note that library calls may or may not be system calls. Many official system functions are actually library functions that offer a thin layer of interposition and generally defer to the kernel in the end. (In particular, getpid doesn't need to go to the kernel and could well read a memory location provided by the OS containing the information.)

Volatile interactions are interactions with the outside world of the real machine, which must follow the "abstract machine". They aren't internal interactions of program parts with other program parts. The compiler can only reason about what it knows, that is the internal program parts.

The code generation for a volatile access should follow the most natural interaction with that memory location: it should be unsurprising. That means that some volatile accesses are expected to be atomic: if the natural way to read or write the representation of a long on the architecture is atomic, then it's expected that a read or write of a volatile long will be atomic, as the compiler should not generate silly inefficient code to access volatile objects byte by byte, for example.

You should be able to determine that by knowing the architecture. You don't have to know anything about the compiler, as volatile means that the compiler should be transparent.

But volatile does no more than force the emission of the expected assembly, the least-optimized, general-case code for a memory operation: volatile semantics means general-case semantics.

The general case is what the compiler does when it doesn't have any information about a construct: for example, calling a virtual function on an lvalue via dynamic dispatch is a general case; making a direct call to the overrider, after determining at compile time the type of the object designated by the expression, is a particular case. The compiler always has a general-case handling of all constructs, and it follows the ABI.

Volatile does nothing special to synchronize threads or provide "memory visibility": volatile only provides guarantees at the abstract level seen from inside a thread that is executing or stopped, that is, from inside a CPU core:

  • volatile says nothing about which memory operations reach main RAM (you may set specific memory caching types with assembly instructions or system calls to obtain these guarantees);
  • volatile doesn't provide any guarantee about when memory operations will be committed to any level of cache (not even L1).

The second point alone means volatile is not useful for most inter-thread communication problems; the first point is essentially irrelevant in any programming problem that doesn't involve communication with hardware components outside the CPU(s) but still on the memory bus.

The property of volatile providing guaranteed behavior from the point of view of the core running the thread means that asynchronous signals delivered to that thread, which run from the point of view of the execution ordering of that thread, see operations in source-code order.

Unless you plan to send signals to your threads (an extremely useful approach to consolidating information about currently running threads with no previously agreed stopping point), volatile is not for you.
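
For completeness, here is a sketch of the classic signal case where volatile is the right tool: a flag shared between a program and a signal handler that runs in that same thread's execution order.

#include <csignal>

volatile std::sig_atomic_t got_signal = 0;

void handler(int) {
    got_signal = 1;                  // asynchronous, but ordered within the
                                     // interrupted thread's execution
}

int main() {
    std::signal(SIGINT, handler);
    while (!got_signal) {
        // do work; the volatile read forces the flag to be re-checked
    }
    return 0;
}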

Is there any reason I shouldn't use the volatile keyword for all variable and function declarations in C?

The OS will not change your variables unless you ask it to do that (by some function call that involves the OS).

Don't use volatile all over. It will reduce program performance because it forces every access to go through memory: the compiler can no longer keep values in registers, nor apply its normal optimizations around those accesses.
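
A small illustration of what that costs (hypothetical functions, only to show what the compiler is no longer allowed to do):

long sum_plain(const int *a, int n) {
    long s = 0;                      // can live in a register for the whole loop
    for (int i = 0; i < n; ++i)
        s += a[i];
    return s;
}

long sum_volatile(const int *a, int n) {
    volatile long s = 0;             // every '+=' now forces a load and a store to
    for (int i = 0; i < n; ++i)      // the stack slot of 's', defeating register
        s += a[i];                   // allocation and related optimizations
    return s;
}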

Actually, it's pretty seldom that you will use volatile. Read this answer from another question to learn more: https://stackoverflow.com/a/246148/4386427

Is volatile useful at all in a singlethreaded application in C++?

This is an answer for both C and C++.

Yes! When the variable is mapped to a hardware register (an I/O device for instance). The hardware modifies the register, independently of the application.

Example:

extern volatile uint32_t MY_DEVICE_START; // write-only register
extern volatile const uint32_t MY_DEVICE_STATUS; // read-only register
extern volatile uint32_t MY_DEVICE_DATA; // read-write register

...
MY_DEVICE_DATA = 42; // send input to the device
MY_DEVICE_START = 1; // start the device
while (MY_DEVICE_STATUS == 0) {} // busy-wait for the device to finish
uint32_t result = MY_DEVICE_DATA; // read output from the device
...

At least in C/C++, that's the main reason for it. Volatile is not even recommended for multi-threaded use.


