Why Is Volatile Deprecated in C++20

Why is volatile deprecated in C++20?

There's a good talk by the C++ committee language evolution chair on why.

Brief summary: the parts of the language that volatile is being removed from didn't have any well-defined meaning in the standard and just caused confusion.



Motivating (Ambiguous) Examples

  • Volatile bit-fields should be specified by your hardware manual and/or compiler.
  • Is += a single/atomic instruction? How about ++?
  • How many reads/writes are needed for compare_exchange? What if it fails?
  • What does void foo(int volatile n) mean? or int volatile foo()?
  • Should *vp; do a load? (This has changed twice in the standard.)
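To make the += and ++ ambiguity concrete, here is a minimal sketch of the unambiguous rewrite; the function name is illustrative:

```cpp
#include <cassert>
#include <cstdint>

// Rewrites the ambiguous `r += 1` (deprecated on volatile lvalues in
// C++20) as an explicit read-modify-write. The function name is made up
// for illustration.
void increment_once(volatile std::uint32_t& r) {
    // r += 1;             // deprecated: how many accesses does it imply?
    std::uint32_t tmp = r; // exactly one volatile read
    r = tmp + 1;           // exactly one volatile write
}
```

Spelling out the load and the store makes the number of volatile accesses explicit, which is exactly the information the deprecated forms obscured.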


Threading

Historically, people have used volatile to achieve thread safety in C and C++. In C++11, non-UB ways to create synchronization and shared state between threads were added. I recommend Back to Basics: Concurrency as a good introduction.

Is the volatile qualifier deprecated in C++20?

You can't use |= on a volatile lvalue anymore (it's deprecated in C++20), but you can use =, so change this:

(*(volatile uint32_t *)regAddr) |= (1UL << CLK_GATE_ABSTRACT_BITS_SHIFT((uint32_t)name));

To this:

*(volatile uint32_t *)regAddr = *(volatile uint32_t *)regAddr | (1UL << CLK_GATE_ABSTRACT_BITS_SHIFT((uint32_t)name));

Volatile in C++11

Whether it is optimized out depends entirely on compilers and what they choose to optimize away. The C++98/03 memory model does not recognize the possibility that x could change between the setting of it and the retrieval of the value.

The C++11 memory model does recognize that x could be changed. However, it doesn't care. Non-atomic access to variables (ie: not using std::atomics or proper mutexes) yields undefined behavior. So it's perfectly fine for a C++11 compiler to assume that x never changes between the write and reads, since undefined behavior can mean, "the function never sees x change ever."

Now, let's look at what C++11 says about volatile int x;. If you put that in there, and you have some other thread mess with x, you still have undefined behavior. Volatile does not affect threading behavior. C++11's memory model does not define reads or writes from/to x to be atomic, nor does it require the memory barriers needed for non-atomic reads/writes to be properly ordered. volatile has nothing to do with it one way or the other.

Oh, your code might work. But C++11 doesn't guarantee it.

What volatile tells the compiler is that it can't optimize away memory accesses to that variable. However, CPU cores have different caches, and most memory writes do not immediately go out to main memory. They get stored in that core's local cache, and may be written back... eventually.

CPUs have ways to force cache lines out into memory and to synchronize memory access among different cores. These memory barriers allow two threads to communicate effectively. Merely reading from memory in one core that was written in another core isn't enough; the core that wrote the memory needs to issue a barrier, and the core that's reading it needs to have had that barrier complete before reading it to actually get the data.

volatile guarantees none of this. Volatile works with "hardware, mapped memory and stuff" because the hardware that writes that memory makes sure that the cache issue is taken care of. If CPU cores issued a memory barrier after every write, you can basically kiss any hope of performance goodbye. So C++11 has specific language saying when constructs are required to issue a barrier.
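The "specific language" referred to here is C++11's atomic and fence operations. A minimal sketch with explicit fences (the names `data`, `ready`, `publish`, and `consume` are illustrative, not from the original question):

```cpp
#include <atomic>
#include <cassert>

int data = 0;                    // ordinary, non-atomic payload
std::atomic<bool> ready{false};  // flag that carries the ordering

void publish() {
    data = 123;
    std::atomic_thread_fence(std::memory_order_release);  // "issue a barrier"
    ready.store(true, std::memory_order_relaxed);
}

bool consume(int& out) {
    if (!ready.load(std::memory_order_relaxed))
        return false;
    std::atomic_thread_fence(std::memory_order_acquire);  // reader-side barrier
    out = data;  // guaranteed to see 123 once ready was observed true
    return true;
}
```

The release fence before the store and the acquire fence after the load are precisely the points where the compiler and CPU are required to emit whatever barrier the platform needs; marking `data` volatile would add none of this.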

volatile is about memory access (when to read); threading is about memory integrity (what is actually stored there).

The C++11 memory model is specific about what operations will cause writes in one thread to become visible in another. It's about memory integrity, which is not something volatile handles. And memory integrity generally requires both threads to do something.

For example, if thread A locks a mutex, does a write, and then unlocks it, the C++11 memory model only requires that write to become visible to thread B if thread B later locks it. Until it actually acquires that particular lock, it's undefined what value is there. This stuff is laid out in great detail in section 1.10 of the standard.
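That mutex rule can be sketched as follows (the names `m`, `shared_value`, `writer`, and `reader` are illustrative):

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex m;
int shared_value = 0;  // an "ordinary" object guarded by m

void writer() {
    std::lock_guard<std::mutex> lk(m);
    shared_value = 42;  // happens before the unlock (a release operation)
}

int reader() {
    // Acquiring the same mutex orders this thread after the writer's
    // unlock, so the write above is guaranteed to be visible here.
    std::lock_guard<std::mutex> lk(m);
    return shared_value;
}
```

Note that both threads participate: the writer releases the lock and the reader acquires it. Without the reader taking the same lock, the standard gives no visibility guarantee at all.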

Let's look at the code you cite, with respect to the standard. Section 1.10, p8 speaks of the ability of certain library calls to cause a thread to "synchronize with" another thread. Most of the other paragraphs explain how synchronization (and other things) build an order of operations between threads. Of course, your code doesn't invoke any of this. There is no synchronization point, no dependency ordering, nothing.

Without such protection, without some form of synchronization or ordering, 1.10 p21 comes in:

The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

Your program contains two conflicting actions (reading from x and writing to x). Neither is atomic, and neither is ordered by synchronization to happen before the other.

Thus, you have achieved undefined behavior.

So the only case where you get guaranteed multithreaded behavior by the C++11 memory model is if you use a proper mutex or std::atomic<int> x with the proper atomic load/store calls.
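A minimal sketch of that guaranteed alternative with std::atomic (the function names are illustrative):

```cpp
#include <atomic>
#include <cassert>

// std::atomic<int> gives both atomicity and the ordering guarantees
// that volatile lacks.
std::atomic<int> x{0};

void producer() {
    x.store(1, std::memory_order_release);  // publish
}

int consumer() {
    // Spin until the producer's store becomes visible; the acquire load
    // also orders everything the producer wrote before its release store.
    while (x.load(std::memory_order_acquire) == 0) {
    }
    return x.load();
}
```

This is the well-defined version of the volatile-flag polling loop the answer warns about.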

Oh, and you don't need to make x volatile too. Any time you call a (non-inline) function, that function or something it calls could modify a global variable, so the compiler cannot optimize away the read of x in the while loop. And every C++11 mechanism to synchronize requires calling a function, which just so happens to invoke a memory barrier.

Why is volatile int convertible to int but volatile T is not convertible to T?

The implicitly-declared copy constructor for type has this signature

type::type(type const&);

The reference cannot bind to a volatile object, since that would discard qualifiers.

[class.copy.ctor]

7 The implicitly-declared copy constructor for a class X will
have the form

X::X(const X&)

if each potentially constructed subobject of a class type M (or array
thereof) has a copy constructor whose first parameter is of type
const M& or const volatile M&. Otherwise, the implicitly-declared
copy constructor will have the form

X::X(X&)

Either way, the compiler isn't going to implicitly-declare a constructor that takes a reference to a volatile object.

The fact that the target object in the conversion is volatile too makes no difference. Such a conversion requires a copy constructor capable of binding to a volatile source.

Fundamental types aren't copied by constructors, so this behavior doesn't restrict them.

Is there a way to make a volatile class copyable?

If it's required, you'd need a user-declared copy constructor that accepts a reference to a const volatile object.
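A minimal sketch of such a class (the name `Widget` is illustrative):

```cpp
#include <cassert>

struct Widget {
    int value = 0;

    Widget() = default;

    // User-declared copy constructor that can bind to a volatile source;
    // the implicitly-declared one (const Widget&) cannot.
    Widget(const volatile Widget& other) : value(other.value) {}
};
```

With this in place, `Widget copy = vw;` compiles for a `volatile Widget vw`, because overload resolution finds a constructor whose parameter does not discard the volatile qualifier.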

Understanding volatile keyword in c++

The volatile keyword in C++ was inherited from C, where it was intended as a general catch-all to indicate places where a compiler should allow for the possibility that reading or writing an object might have side-effects it doesn't know about. Because the kinds of side-effects that could be induced would vary among different platforms, the Standard leaves the question of what allowances to make up to compiler writers' judgments as to how they should best serve their customers.

Microsoft's compilers for the 8088/8086 and later x86 have for decades been designed to support the practice of using volatile objects to build a mutex which guards "ordinary" objects. As a simple example: if thread 1 does something like:

ordinaryObject = 23;
volatileFlag = 1;
while (volatileFlag)
    doOtherStuffWhileWaiting();
useValue(ordinaryObject);

and thread 2 periodically does something like:

if (volatileFlag)
{
    ordinaryObject++;
    volatileFlag = 0;
}

then the accesses to volatileFlag would serve as a warning to Microsoft's compilers that they should refrain from making assumptions about how any preceding actions on any objects would interact with later actions. This pattern has been followed with the volatile qualifiers in other languages like C#.

Unfortunately, neither clang nor gcc includes any option to treat volatile in such a fashion; instead, they require programmers to use compiler-specific intrinsics to get the semantics that Microsoft could achieve using only the Standard keyword volatile, which was intended to be suitable for such purposes. [According to the authors of the Standard, "A volatile object is also an appropriate model for a variable shared among multiple processes." -- see http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf, p. 76, ll. 25-26.]

set a flag in int variable from different context on bare metal controller

The rationale is that compound assignments and pre/post increments and decrements are not atomic even on a volatile variable, while a programmer could see them as a single operation. Moreover, the standard says that E1 op= E2 is the same as E1 = E1 op E2 except that E1 is evaluated only once.

That means that incautious programmers could use

volatile uint8_t some_flags;
...
some_flags|= new_set_flags;

with the expectation that it will be atomic, even in the presence of hardware interrupts, while it is not required to be.

At the machine level it looks like 3 operations:

  • load the value from memory
  • update the accumulator register
  • store the value back to memory

That means that without further precautions, a race condition occurs if two execution threads (here, normal processing and an ISR) are interleaved:

normal code loads
! ISR takes the processor
ISR loads, updates, and stores
! return from ISR
normal code updates and stores, erasing the change from the ISR

When the program uses a temp variable it is evident that race conditions could occur.

What is bad for you is that the C++ committee has deprecated that use, with the intention of later removing it entirely.

So you can:

  • add to the specification of your code that it depends on allowing compound assignment on volatile variables and just hope that compilers will offer options for it (feels reasonable even if not very nice)
  • add to the specifications of your code that it is C++17 compatible but will not support C++20 and later
  • change it to be compiled as C code (the C++ standard still supports cross C/C++ linking)
  • just write it as some_flags = some_flags | new_set_flags;

I prefer the last way because, for a volatile byte, there is no reason for the compiler to produce less efficient code, and it is conformant from the earliest C versions to the latest C++ one.
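The last option, as a self-contained sketch (the names follow the snippet above):

```cpp
#include <cassert>
#include <cstdint>

volatile std::uint8_t some_flags = 0;

// Conformant from early C through C++20 and later: one explicit
// volatile read, an OR, and one explicit volatile write.
void set_flags(std::uint8_t new_set_flags) {
    some_flags = some_flags | new_set_flags;
}
```

Any reasonable compiler emits the same load/or/store sequence here as it would for `|=`, so nothing is lost by being explicit.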


References:

  • P1152R4 Deprecating volatile Rev4: current version
  • P1152R0 Deprecating volatile Rev0: original version with extensive background information, mainly the "Why the proposed changes?" part

Does volatile guarantee anything at all in portable C code for multi-core systems?

To summarize the problem, it appears (after reading a lot) that "volatile" guarantees something like: the value will be read/written not just from/to a register, but at least to the core's L1 cache, in the same order that the reads/writes appear in the code.

No, it absolutely does not. And that makes volatile almost useless for the purpose of MT safe code.

If it did, then volatile would be quite good for variables shared by multiple threads: ordering the events in the L1 cache is all you need on a typical CPU (whether multi-core or multi-CPU on a motherboard) to cooperate in a way that makes a normal implementation of C/C++ or Java multithreading possible at the typical expected cost (that is, without a huge cost on most atomic or uncontended mutex operations).

But volatile does not provide any guaranteed ordering (or "memory visibility") in the cache either in theory or in practice.

(Note: the following is based on a sound interpretation of the standard documents, the standard's intent, historical practice, and a deep understanding of the expectations of compiler writers. This approach, grounded in history, actual practice, and the expectations of real people in the real world, is much stronger and more reliable than parsing the words of a document that is not known to be stellar specification writing and which has been revised many times.)

In practice, volatile does guarantee ptrace-ability, that is, the ability to use debug information on the running program at any level of optimization, and the fact that the debug information makes sense for these volatile objects:

  • you may use ptrace (or a ptrace-like mechanism) to set meaningful break points at the sequence points after operations involving volatile objects: you can really break at exactly these points (note that this works only if you are willing to set many break points, as any C/C++ statement may be compiled to many different assembly start and end points, as in a massively unrolled loop);
  • while a thread of execution is stopped, you may read the value of all volatile objects, as they have their canonical representation (following the ABI for their respective type); a non-volatile local variable could have an atypical representation, e.g. a shifted one: a variable used for indexing an array might be multiplied by the size of individual objects for easier indexing, or it might be replaced by a pointer to an array element (as long as all uses of the variable are similarly converted) (think of changing dx to du in an integral);
  • you can also modify those objects (as long as the memory mappings allow it, as a volatile object with static lifetime that is const-qualified might be in a memory range mapped read-only).

Volatile guarantees in practice a little more than the strict ptrace interpretation: it also guarantees that volatile automatic variables have an address on the stack, as they aren't allocated to a register; register allocation would make ptrace manipulations more delicate (the compiler can output debug information explaining how variables are allocated to registers, but reading and changing register state is slightly more involved than accessing memory addresses).

Note that full program debug-ability, that is, treating all variables as volatile at least at sequence points, is provided by the "zero optimization" mode of the compiler, a mode which still performs trivial optimizations like arithmetic simplifications (there is usually no guaranteed no-optimization-at-all mode). But volatile is stronger than non-optimization: x - x can be simplified for a non-volatile integer x but not for a volatile object.

So volatile means guaranteed to be compiled as is, just as the translation of a system call by the compiler isn't reinterpreted, changed, or optimized in any way. Note that library calls may or may not be system calls. Many official system functions are actually library functions that offer a thin layer of interposition and generally defer to the kernel in the end. (In particular, getpid doesn't need to go to the kernel and could well read a memory location provided by the OS containing the information.)

Volatile interactions are interactions with the outside world of the real machine, which must follow the "abstract machine". They aren't internal interactions of program parts with other program parts. The compiler can only reason about what it knows, that is the internal program parts.

The code generation for a volatile access should follow the most natural interaction with that memory location: it should be unsurprising. That means that some volatile accesses are expected to be atomic: if the natural way to read or write the representation of a long on the architecture is atomic, then it's expected that a read or write of a volatile long will be atomic, as the compiler should not generate silly inefficient code to access volatile objects byte by byte, for example.

You should be able to determine that by knowing the architecture. You don't have to know anything about the compiler, as volatile means that the compiler should be transparent.

But volatile does no more than force the emission of the expected, least-optimized assembly for a memory operation: volatile semantics means general-case semantics.

The general case is what the compiler does when it doesn't have any information about a construct: e.g. calling a virtual function on an lvalue via dynamic dispatch is a general case, while making a direct call to the overrider after determining at compile time the type of the object designated by the expression is a particular case. The compiler always has a general-case handling of all constructs, and it follows the ABI.

Volatile does nothing special to synchronize threads or provide "memory visibility": volatile only provides guarantees at the abstract level seen from inside a thread executing or stopped, that is the inside of a CPU core:

  • volatile says nothing about which memory operations reach main RAM (you may set specific memory caching types with assembly instructions or system calls to obtain these guarantees);
  • volatile doesn't provide any guarantee about when memory operations will be committed to any level of cache (not even L1).

The second point alone means volatile is not useful for most inter-thread communication problems; the first point is essentially irrelevant in any programming problem that doesn't involve communication with hardware components outside the CPU(s) but still on the memory bus.

The property of volatile providing guaranteed behavior from the point of view of the core running the thread means that asynchronous signals delivered to that thread, which run within that thread's execution ordering, see operations in source-code order.

Unless you plan to send signals to your threads (an extremely useful approach to consolidation of information about currently running threads with no previously agreed point of stopping), volatile is not for you.
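For completeness, here is the classic sketch of the signal-flag pattern that this guarantee does support (the handler name is illustrative):

```cpp
#include <cassert>
#include <csignal>

// volatile std::sig_atomic_t is the one type the standard blesses for
// communication between a signal handler and the thread it interrupts.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int) {
    got_signal = 1;  // the interrupted thread sees this in source order
}

// Typical setup in main: std::signal(SIGINT, on_signal);
// then poll got_signal in the main loop.
```

This works precisely because the handler runs on the same core, within the same thread's execution ordering; no cache coherence or barriers are involved.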

Struct with a volatile member no longer a POD according to MSVC

From the Microsoft documentation:

When a class or struct is both trivial and standard-layout, it is a POD (Plain Old Data) type.

It later describes literal types, including the following condition:

Additionally, all its non-static data members and base classes must be literal types and not volatile.

There is no mention of "volatile" anywhere else on the page.

This all matches what we find in the standard.

Therefore, I conclude it's a compiler bug.


getStuff() generates an error because it attempts to return a type that is not compatible with the C calling convention.

Actually, this is just a warning (C4190), which you could disable, if you wanted. Visual Studio on x86_64 only has one calling convention (described here). Your code will still work fine. VS is just warning you that the type won't work if you actually try to use it in C. extern "C" does not mean compiling as C.

However, it is true that getting this warning suggests the bug is indeed in the compiler, rather than simply in the implementation of std::is_pod.


Also, I would recommend avoiding POD terminology and the std::is_pod trait in new code, since they are deprecated in C++20.
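A sketch of the replacement traits, using a struct with a volatile member as in the question (the name S is illustrative):

```cpp
#include <type_traits>

struct S {
    volatile int x;
};

// std::is_pod bundled two independent questions; since C++20, ask them
// separately. Per the standard, a volatile member changes neither answer.
static_assert(std::is_trivial_v<S>, "trivial despite the volatile member");
static_assert(std::is_standard_layout_v<S>, "standard-layout despite the volatile member");
```

On conforming compilers both assertions hold, which is exactly why the MSVC result discussed above looks like a compiler bug.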


The extern "C" part is in a C header that needs to stay as is. How could I use it from C++?

Anything that doesn't actually require the type to fit VS's definition of "POD", type trait notwithstanding, should be fine.


