"Volatile" Qualifier and Compiler Reorderings

The C++ standard says (1.9/6):

The observable behavior of the
abstract machine is its sequence of
reads and writes to volatile data and
calls to library I/O functions.

In scenario 1, either of the changes you propose changes the sequence of writes to volatile data.

In scenario 2, neither change you propose changes the sequence. So they're allowed under the "as-if" rule (1.9/1):

... conforming implementations are
required to emulate (only) the
observable behavior of the abstract
machine ...

In order to tell that this has happened, you would need to examine the machine code, use a debugger, or provoke undefined or unspecified behavior whose result you happen to know on your implementation. For example, an implementation might make guarantees about the view that concurrently-executing threads have of the same memory, but that's outside the scope of the C++ standard. So while the standard might permit a particular code transformation, a particular implementation could rule it out, on grounds that it doesn't know whether or not your code is going to run in a multi-threaded program.

If you were to use observable behavior to test whether the re-ordering has happened or not (for example, printing the values of variables in the above code), then of course it would not be allowed by the standard.

May accesses to volatiles be reordered?

IMO Chisnall's interpretation (as presented by you) is clearly wrong. The simpler case is C++98: the sequence of reads and writes to volatile data must be preserved, and that applies to the ordered sequence of reads and writes of any volatile data, not to each single variable in isolation.

This becomes obvious if you consider the original motivation for volatile: memory-mapped I/O. In MMIO you typically have several related registers at different memory locations, and the protocol of an I/O device requires a specific sequence of reads and writes to its set of registers - the order between registers is important.

The C++11 wording avoids talking about an absolute sequence of reads and writes, because in multi-threaded environments there is not one single well-defined sequence of such events across threads - and that is not a problem, if these accesses go to independent memory locations. But I believe the intent is that for any sequence of volatile data accesses with a well-defined order the rules remain the same as for C++98 - the order must be preserved, no matter how many different locations are accessed in that sequence.

It is an entirely separate issue what that entails for an implementation. How (and even if) a volatile data access is observable from outside the program and how the access order of the program maps to externally observable events is unspecified. An implementation should probably give you a reasonable interpretation and reasonable guarantees, but what is reasonable depends on the context.

The C++11 standard leaves room for data races between unsynchronized volatile accesses, so there is nothing that requires surrounding these by full memory fences or similar constructs. If there are parts of memory that are truly used as external interface - for memory-mapped I/O or DMA - then it may be reasonable for the implementation to give you guarantees for how volatile accesses to these parts are exposed to consuming devices.

One guarantee can probably be inferred from the standard (see [intro.execution]): objects of type volatile std::sig_atomic_t must have values compatible with the order of writes to them even in a signal handler - at least in a single-threaded program.

How compiler enforces C++ volatile in ARM assembly

So, in the example below, the second store could be executed before the first store, since the destinations are disjoint and the stores could therefore be freely reordered.

The volatile keyword limits the reordering (and elision) of instructions by the compiler, but its semantics don't say anything about visibility from other threads or processors.

When you see

        str     r1, [r3]
        str     r2, [r3, #4]

then volatile has done everything required. If the addresses of x and y are I/O mapped to a hardware device, it will have received the x store first. If an interrupt pauses operation of this thread between the two instructions, the interrupt handler will see the x store and not the y. That's all that is guaranteed.


The memory ordering model only describes the order in which effects are observable from other processors. It doesn't alter the sequence in which instructions are issued (which is the order they appear in the assembly code), but rather the order in which they are committed (i.e., when a store becomes externally visible).

It is certainly possible that a different processor could see the result of the y store before the x - but volatile is not and never has been relevant to that problem. The cross-platform solution to this is std::atomic.


There is unfortunately a load of obsolete C code available on the internet that does use volatile for synchronization - but this is always a platform-specific extension, and was never a great idea anyway. Even less fortunately the keyword was given exactly those semantics in Java (which isn't really used for writing interrupt handlers), increasing the confusion.

If you do see something using volatile like this, it's either obsolete or was incompetently translated from Java. Use std::atomic, and for anything more complex than simple atomic load/store, it's probably better (and is certainly easier) to use std::mutex.

Why is discarding the volatile qualifier in a function call a warning?

The warning here is because the compiler assumes that when you have a pointer to a volatile object, you honestly believe that the pointee's value might change from an outside source. When you pass this pointer into a function that asks for a pointer to a non-volatile object, the compiler warns you that the function call might be optimized in a way that doesn't correctly account for the possibility that the object might change.

The fact that you know for certain that it's okay to do this means that you might want to put in an explicit cast that removes volatile, such as this one:

awesome_function((uint8_t*) &thingy);

This explicitly tells the compiler "I know that I'm removing volatile here, so don't warn me about it." After all, the whole point of the warning is that you might not have noticed this.

A good analogue would be to think about const. If you have a pointer to a const object, you are promising not to modify that object through the pointer. If you tried passing this pointer into a function that took a pointer to a non-const object, you would get a warning because the compiler notices that you might accidentally end up changing the value through the function. Putting in an explicit cast would be a way to tell the compiler "yes, I know this pointer shouldn't be used to modify things, but I promise I know what I'm doing."

Hope this helps!

Is it necessary to use the volatile qualifier even in case the GCC optimisations are turned off?

With optimizations disabled it seems unlikely that you'd need volatile. However, the compiler can perform trivial optimizations even at -O0. For example, it might remove parts of the code that it can deduce won't be used. So not using volatile is a gamble. I see no reason why you shouldn't be using volatile, particularly if you run with no optimizations anyway.

Also, regardless of optimization level, variables may be prefetched or cached on high-end MCUs with a data cache. Whether volatile solves, or should solve, this is debatable, however.

Does the C++ volatile keyword introduce a memory fence?

Rather than explaining what volatile does, allow me to explain when you should use volatile.

  • When inside a signal handler. Writing to a volatile std::sig_atomic_t variable is pretty much the only thing the standard allows you to do from within a signal handler. Since C++11 you can use std::atomic for that purpose, but only if the atomic is lock-free.
  • When dealing with setjmp, according to Intel.
  • When dealing directly with hardware and you want to ensure that the compiler does not optimize your reads or writes away.

For example:

volatile int *foo = some_memory_mapped_device;
while (*foo)
    ;  // wait until *foo turns false

Without the volatile specifier, the compiler is allowed to optimize the loop away entirely. The volatile specifier tells the compiler that it may not assume that two subsequent reads return the same value.

Note that volatile has nothing to do with threads. The above example does not work if there is a different thread writing to *foo, because there is no acquire operation involved.

In all other cases, usage of volatile should be considered non-portable and should not pass code review anymore, except when dealing with pre-C++11 compilers and compiler extensions (such as MSVC's /volatile:ms switch, which is enabled by default under x86/x64).

ARM Compiler 5 does not fully respect the volatile qualifier

The main rule governing volatile objects is this, from C11 6.7.3/7:

any expression referring to such an object shall be evaluated strictly
according to the rules of the abstract machine, as described in
5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine,
except as modified by the unknown factors mentioned previously.

It goes on to say:

What constitutes an access to an object that has volatile-qualified
type is implementation-defined.

This latitude applies to how other rules (e.g. in 5.1.2.3) are to be interpreted. Your compiler's Users' Guide discusses the details of volatile accesses, but there doesn't seem to be anything surprising there. Section 5.1.2.3 itself mainly talks about sequencing rules; the rules for evaluating expressions are elsewhere (but must still be followed as given with regard to accesses to your volatile object).

Here are the relevant details of the behavior of the abstract machine:

  1. the assignment operation has a side effect of storing a value in the object identified by status. There is a sequence point at the end of that statement, so

    • the side effect is applied before any evaluations appearing in subsequent statements are performed, and
    • because status is volatile, the assignment expressed by that line is the last write to status performed by the program before the sequence point.
  2. the conditional expression in the if statement is evaluated next, with

    • the sub-expression (status) == (SUCCESS_CONST) being evaluated first, before any of the other sub-expressions.
    • Evaluation of status happens before evaluation of the == operation, and
    • takes the form of converting that identifier to the value stored in the object it identifies (lvalue conversion, per paragraph 6.3.2.1/2).
    • In order to do anything with the value stored in status at that time, that value must first be read.

The standard does not require a volatile object to reside in addressable storage, so in principle, your volatile automatic variable could be assigned exclusively to a register. In that event, as long as machine instructions using that object either read its value directly from its register or make updates directly to its register, no separate loads or stores would be required to achieve proper volatile semantics. Your particular object does not appear to fall into this category, however, because the store instruction in your generated assembly seems to indicate that it is, indeed, associated with a location in memory.

Moreover, if the program correctly implemented volatile semantics for an object assigned to a register, then that register would have to be r0. I'm not familiar with the specifics of this assembly language and the processor on which the code runs, but it certainly does not look like r0 is a viable locus for such storage.

With that being the case I agree that status should have been read back from memory, and it should be read back from memory again if its second appearance in the conditional expression needs to be evaluated. This is the behavior of the abstract machine, which conforming implementations exhibit with respect to all volatile accesses. My analysis, then, is that your implementation is non-conforming in this regard, and I would be inclined to report that as a bug.

As for a workaround, I think your best bet is to write the important bits in assembly - inline assembly if your implementation supports that, or as a complete function implemented in assembly if necessary.


