Is 'Volatile' Needed in This Multi-Threaded C++ Code

Is 'volatile' needed in this multi-threaded C++ code?

You should not depend on volatile to guarantee thread safety, this is because even though the compiler will guarantee that the the variable is always read from memory (and not a register cache), in multi-processor environments a memory barrier will also be required.

Rather use the correct lock around the shared memory. Locks like a Critical Section are often extremely lightweight and in a case of no contention will probably be all implemented userside. They will also contain the necessary memory barriers.

Volatile should only be used for memory mapped IO where multiple reads may return different values. Similarly for memory mapped writes.

Why is volatile not considered useful in multithreaded C or C++ programming?

The problem with volatile in a multithreaded context is that it doesn't provide all the guarantees we need. It does have a few properties we need, but not all of them, so we can't rely on volatile alone.

However, the primitives we'd have to use for the remaining properties also provide the ones that volatile does, so it is effectively unnecessary.

For thread-safe accesses to shared data, we need a guarantee that:

the read/write actually happens (that the compiler won't just store the value in a register instead and defer updating main memory until much later)
that no reordering takes place. Assume that we use a volatile variable as a flag to indicate whether or not some data is ready to be read. In our code, we simply set the flag after preparing the data, so all looks fine. But what if the instructions are reordered so the flag is set first?

volatile does guarantee the first point. It also guarantees that no reordering occurs between different volatile reads/writes. All volatile memory accesses will occur in the order in which they're specified. That is all we need for what volatile is intended for: manipulating I/O registers or memory-mapped hardware, but it doesn't help us in multithreaded code where the volatile object is often only used to synchronize access to non-volatile data. Those accesses can still be reordered relative to the volatile ones.

The solution to preventing reordering is to use a memory barrier, which indicates both to the compiler and the CPU that no memory access may be reordered across this point. Placing such barriers around our volatile variable access ensures that even non-volatile accesses won't be reordered across the volatile one, allowing us to write thread-safe code.

However, memory barriers also ensure that all pending reads/writes are executed when the barrier is reached, so it effectively gives us everything we need by itself, making volatile unnecessary. We can just remove the volatile qualifier entirely.

Since C++11, atomic variables (std::atomic<T>) give us all of the relevant guarantees.

Is a global implicitly volatile in C?

Is a global implicitly volatile in C?

No.

You are mixed up about the extern variables and volatile variables because they both can't be fully optimized by the compiler because they both can be modified externally.
But that doesn't mean they have the same semantics.

Consider Link Time Optimzation which is possible for global variables. That means, it's entirely possible for a compiler to have complete information on a global variable and thus optimize it accordingly just like any other variable.
But volatile means compiler just can't have any assumptions about it. Because it could be modified externally at run time.

Thus:

volatile int vol_global_var;

and

int nonvol_global_var;

can't be treated as same. Because, with sufficient information, a compiler could completely optimize away nonvol_global_var. But it can never do that for vol_global_var.

So, when would you make a global variable volatile qualified is not different deciding when you'd want make any variable to be volatile qualified.

Is volatile necessary for the resource used in a critical section?

I am curious about whether volatile is necessary for the resources used in a critical section. Consider I have two threads executed on two CPUs and they are competing on a shared resource. I know I need to a locking mechanism to make sure only one thread is performing operations on that shared resource.

Making sure that only one thread accesses a shared resource at a time is only part of what a locking mechanism adequate for the purpose will do. Among other things, such a mechanism will also ensure that all writes to shared objects performed by thread T_i before it releases lock L are visible to all other threads T_j after they subsequently acquire lock L. And that in terms of the C semantics of the program, notwithstanding any questions of compiler optimization, register usage, CPU instruction reordering, or similar.

When such a locking mechanism is used, volatile does not provide any additional benefit for making threads' writes to shared objects be visible to each other. When such a locking mechanism is not used, volatile does not provide a complete substitute.

C's built-in (since C11) mutexes provide a suitable locking mechanism, at least when using C's built-in threads. So do pthreads mutexes, Sys V and POSIX semaphores, and various other, similar synchronization objects available in various environments, each with respect to corresponding multithreading systems. These semantics are pretty consistent across C-like multithreading implementations, extending at least as far as Java. The semantic requirements for C's built-in multithreading are described in section 5.1.2.4 of the current (C17) language spec.

volatile is for indicating that an object might be accessed outside the scope of the C semantics of the program. That may happen to produce properties that interact with multithreaded execution in a way that is taken to be desirable, but that is not the purpose or intended use of volatile. If it were, or if volatile were sufficient for such purposes, then we would not also need _Atomic objects and operations.

The previous remarks focus on language-level semantics, and that is sufficient to answer the question. However, inasmuch as the question asks specifically about accessing variables' values from registers, I observe that compilers don't actually have to do anything much multithreading-specific in that area as long as acquiring and releasing locks requires calling functions.

In particular, if an execution E of function f writes to an object o that is visible to other functions or other executions of f, then the C implementation must ensure that that write is actually performed on memory before E evaluates any subsequent function call (such as is needed to release a lock). This is necessary because because the value written must be visible to the execution of the called function, regardless of any other threads.

Similarly, if E uses the value of o after return from a function call (such as is needed to acquire a lock) then it must load that value from memory to ensure that it sees the effect of any write that the function may have performed.

The only thing special to multithreading in this regard is that the implementation must ensure that interprocedural analysis optimizations or similar do not subvert the needed memory reads and writes around the lock and unlock functions. In practice, this rarely requires special attention.

Volatile C/C++ on a multithread app

volatile in C++ is not meant for concurrency. It's about whether the compiler is allowed to optimize away reads from a variable or not. It is primarily used for things such as interfacing with hardware via memory mapping.

Unfortunately, this means that even if you do have volatile variables, the reads and writes may still access a thread-local store which is not synchronized. Also, an std::vector is not thread safe.

So, in either case, you need to synchronize, for example using a std::mutex (which you do mention). Now, if this is done, the variables which are protected by that mutexdo not need to be volatile. The mutex itself does the synchronization and protects against the type of issues you worry about.

Does volatile guarantee anything at all in portable C code for multi-core systems?

To summarize the problem, it appears (after reading a lot) that
"volatile" guarantees something like: The value will be read/written
not just from/to a register, but at least to the core's L1 cache, in
the same order that the reads/writes appear in the code.

No, it absolutely does not. And that makes volatile almost useless for the purpose of MT safe code.

If it did, then volatile would be quite good for variables shared by multiple thread as ordering the events in the L1 cache is all you need to do in typical CPU (that is either multi-core or multi-CPU on motherboard) capable of cooperating in a way that makes a normal implementation of either C/C++ or Java multithreading possible with typical expected costs (that is, not a huge cost on most atomic or non-contented mutex operations).

But volatile does not provide any guaranteed ordering (or "memory visibility") in the cache either in theory or in practice.

(Note: the following is based on sound interpretation of the standard documents, the standard's intent, historical practice, and a deep understand of the expectations of compiler writers. This approach based on history, actual practices, and expectations and understanding of real persons in the real world, which is much stronger and more reliable than parsing the words of a document that is not known to be stellar specification writing and which has been revised many times.)

In practice, volatile does guarantees ptrace-ability that is the ability to use debug information for the running program, at any level of optimization, and the fact the debug information makes sense for these volatile objects:

you may use ptrace (a ptrace-like mechanism) to set meaningful break points at the sequence points after operations involving volatile objects: you can really break at exactly these points (note that this works only if you are willing to set many break points as any C/C++ statement may be compiled to many different assembly start and end points, as in a massively unrolled loop);
while a thread of execution of stopped, you may read the value of all volatile objects, as they have their canonical representation (following the ABI for their respective type); a non volatile local variable could have an atypical representation, f.ex. a shifted representation: a variable used for indexing an array might be multiplied by the size of individual objects, for easier indexing; or it might be replaced by a pointer to an array element (as long as all uses of the variable as similarly converted) (think changing dx to du in an integral);
you can also modify those objects (as long as the memory mappings allow that, as volatile object with static lifetime that are const qualified might be in a memory range mapped read only).

Volatile guarantee in practice a little more than the strict ptrace interpretation: it also guarantees that volatile automatic variables have an address on the stack, as they aren't allocated to a register, a register allocation which would make ptrace manipulations more delicate (compiler can output debug information to explain how variables are allocated to registers, but reading and changing register state is slightly more involved than accessing memory addresses).

Note that full program debug-ability, that is considering all variables volatile at least at sequence points, is provided by the "zero optimization" mode of the compiler, a mode which still performs trivial optimizations like arithmetic simplifications (there is usually no guaranteed no optimization at all mode). But volatile is stronger than non optimization: x-x can be simplified for a non volatile integer x but not of a volatile object.

So volatile means guaranteed to be compiled as is, like the translation from source to binary/assembly by the compiler of a system call isn't a reinterpretation, changed, or optimized in any way by a compiler. Note that library calls may or may not be system calls. Many official system functions are actually library function that offer a thin layer of interposition and generally defer to the kernel at the end. (In particular getpid doesn't need to go to the kernel and could well read a memory location provided by the OS containing the information.)

Volatile interactions are interactions with the outside world of the real machine, which must follow the "abstract machine". They aren't internal interactions of program parts with other program parts. The compiler can only reason about what it knows, that is the internal program parts.

The code generation for a volatile access should follow the most natural interaction with that memory location: it should be unsurprising. That means that some volatile accesses are expected to be atomic: if the natural way to read or write the representation of a long on the architecture is atomic, then it's expected that a read or write of a volatile long will be atomic, as the compiler should not generate silly inefficient code to access volatile objects byte by byte, for example.

You should be able to determine that by knowing the architecture. You don't have to know anything about the compiler, as volatile means that the compiler should be transparent.

But volatile does no more than force the emission of expected assembly for the least optimized for particular cases to do a memory operation: volatile semantics means general case semantic.

The general case is what the compiler does when it doesn't have any information about a construct: f.ex. calling a virtual function on an lvalue via dynamic dispatch is a general case, making a direct call to the overrider after determining at compile time the type of the object designated by the expression is a particular case. The compiler always have a general case handling of all constructs, and it follows the ABI.

Volatile does nothing special to synchronize threads or provide "memory visibility": volatile only provides guarantees at the abstract level seen from inside a thread executing or stopped, that is the inside of a CPU core:

volatile says nothing about which memory operations reach main RAM (you may set specific memory caching types with assembly instructions or system calls to obtain these guarantees);
volatile doesn't provide any guarantee about when memory operations will be committed to any level of cache (not even L1).

Only the second point means volatile is not useful in most inter threads communication problems; the first point is essentially irrelevant in any programming problem that doesn't involve communication with hardware components outside the CPU(s) but still on the memory bus.

The property of volatile providing guaranteed behavior from the point of the view of the core running the thread means that asynchronous signals delivered to that thread, which are run from the point of view of the execution ordering of that thread, see operations in source code order.

Unless you plan to send signals to your threads (an extremely useful approach to consolidation of information about currently running threads with no previously agreed point of stopping), volatile is not for you.

Is 'Volatile' Needed in This Multi-Threaded C++ Code