What Rules Does Compiler Have to Follow When Dealing with Volatile Memory Locations

What rules does the compiler have to follow when dealing with volatile memory locations?

A particular and very common optimization that is ruled out by volatile is to cache a value from memory into a register, and use the register for repeated access (because this is much faster than going back to memory every time).

Instead the compiler must fetch the value from memory every time (taking a hint from Zach, I should say that "every time" is bounded by sequence points).

Nor can a sequence of writes make use of a register and only write the final value back later on: every write must be pushed out to memory.

Why is this useful? On some architectures certain IO devices map their inputs or outputs to a memory location (i.e. a byte written to that location actually goes out on the serial line). If the compiler redirects some of those writes to a register that is only flushed occasionally then most of the bytes won't go onto the serial line. Not good. Using volatile prevents this situation.
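A minimal sketch of the serial-line case, assuming a hypothetical transmit register. The function name uart_send and the parameter tx_reg are invented for illustration; on real hardware the pointer would be a fixed address from the device datasheet, whereas here the caller passes it in so the code can be tried against an ordinary variable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pushes each byte of `text` out through a transmit
// register. `tx_reg` stands in for a memory-mapped device register.
void uart_send(volatile std::uint8_t* tx_reg, const char* text, std::size_t len) {
    assert(tx_reg != nullptr && text != nullptr);
    for (std::size_t i = 0; i < len; ++i) {
        // Each assignment is a volatile access, so the compiler must emit one
        // store per byte instead of keeping only the final value in a register.
        *tx_reg = static_cast<std::uint8_t>(text[i]);
    }
}
```

If tx_reg pointed to a plain std::uint8_t, the compiler would be free to collapse the loop into a single store of the last byte, which is exactly the failure described above.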

Does the C++ volatile keyword introduce a memory fence?

Rather than explaining what volatile does, allow me to explain when you should use volatile.

When inside a signal handler, because writing to a volatile variable is pretty much the only thing the standard allows you to do from within a signal handler. Since C++11 you can use std::atomic for that purpose, but only if the atomic is lock-free.
When dealing with setjmp (according to Intel).
When dealing directly with hardware and you want to ensure that the compiler does not optimize your reads or writes away.

For example:

volatile int *foo = some_memory_mapped_device;
while (*foo)
    ; // wait until *foo turns false

Without the volatile specifier, the compiler is allowed to read *foo once and optimize the repeated check away entirely. The volatile specifier tells the compiler that it may not assume that two successive reads return the same value.

Note that volatile has nothing to do with threads. The above example would not work if a different thread were writing to *foo, because there is no acquire operation involved.
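For the threaded case, a sketch of what the flag would look like with std::atomic instead of volatile. The function name publish_and_consume and the variables payload and ready are made up for this illustration; the release store paired with the acquire load is what provides the ordering that volatile lacks.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: one thread publishes a value, another waits for it.
// Returns the value the consumer observed.
int publish_and_consume() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // synchronization flag
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();  // spin until the flag is set
        }
        seen = payload;  // ordered after the producer's write by acquire/release
    });

    producer.join();
    consumer.join();
    assert(seen != -1);  // the consumer always assigns before exiting
    return seen;
}
```

Replacing ready with a volatile bool would compile, but the read of payload would then be a data race, because nothing orders it after the producer's write.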

In all other cases, usage of volatile should be considered non-portable and should no longer pass code review, except when dealing with pre-C++11 compilers or compiler extensions (such as msvc's /volatile:ms switch, which is enabled by default under x86/x64).
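The signal-handler case from the list above can be sketched as follows. The names got_signal, on_signal and handler_sets_flag are hypothetical, and SIGINT is chosen only because it is available on every hosted implementation; std::raise is used so that the handler runs synchronously and the sketch stays deterministic.

```cpp
#include <cassert>
#include <csignal>

// Flag shared between the handler and the normal flow of control.
volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int) {
    got_signal = 1;  // writing a volatile sig_atomic_t is permitted in a handler
}

// Installs the handler, raises SIGINT, and reports whether the flag was set.
bool handler_sets_flag() {
    got_signal = 0;
    auto previous = std::signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);
    std::raise(SIGINT);            // the handler runs before raise returns
    std::signal(SIGINT, previous); // restore the earlier disposition
    return got_signal == 1;
}
```

Because got_signal is volatile, the compiler cannot assume the value is still 0 at the final comparison even though nothing in handler_sets_flag visibly writes to it after the reset.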

Is volatile the correct way to inform the compiler about concurrent access to a variable?

Congratulations on figuring so many details out for yourself. Yes, volatile is not particularly useful for multithreaded programming, and constructs provided by your platform-specific multithreading library (e.g. pthreads) should always be preferred.

Specifically, you should use a read-write lock: an object which can be locked by one writer at a time to the exclusion of readers and other writers, or locked by multiple readers at once to the exclusion of any writer. Most threading APIs provide one.

C++0x atomic<T> does solve the problem, and you should never need volatile unless you are writing a device driver. However, atomic is a lower-level tool, and you'll probably be better off with the read-write lock abstraction.
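As an illustration of the read-write lock abstraction, here is a sketch using std::shared_mutex. That is an assumption on my part: it requires C++17, and the answer itself only mentions platform libraries such as pthreads. The class SharedCounter and the function run_writers are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical value guarded by a read-write lock.
class SharedCounter {
public:
    void increment() {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive: one writer
        ++value_;
    }
    int get() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // shared: many readers
        return value_;
    }

private:
    mutable std::shared_mutex mutex_;
    int value_ = 0;  // plain int; no volatile needed under the lock
};

// Runs several threads that both write and read, then returns the final count.
int run_writers(int threads, int increments_each) {
    assert(threads > 0 && increments_each >= 0);
    SharedCounter counter;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, increments_each] {
            for (int i = 0; i < increments_each; ++i) {
                counter.increment();
                (void)counter.get();  // concurrent reads are safe as well
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.get();
}
```

The lock supplies both the mutual exclusion and the memory ordering, which is why the guarded member is an ordinary int.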

What kinds of optimizations does 'volatile' prevent in C++?

Basically, volatile announces that a value might change behind your program's back. That prevents compilers from caching the value (in a CPU register) and from optimizing away accesses to that value when they seem unnecessary from the POV of your program.

What should trigger usage of volatile is when a value changes despite the fact that your program hasn't written to it, and when no other memory barriers (like mutexes as used for multi-threaded programs) are present.

Does the ARM procedure call standard allow volatile optimisations that contradict the C standard?

So I reached out to the ARM toolchain's support group, and according to them the ARM PCS is an independent standard that is not bound to the C standard, such that a compiler can choose to comply with one or both of them. In their own words:

In a way it's not really a contradiction
the APCS permits a compiler to respect or ignore local volatile
the C standard requires a compiler to respect local volatile
so a compiler that is compatible with both will respect local volatile.
Armclang has elected to follow the C standard which makes it compatible with both

So if a compiler chooses to perform this non-C-conforming optimization, it is still an ARM PCS conforming implementation, but not a C-conforming compiler.

To conclude, a C-conforming compiler for the ARM architecture which implements the ARM PCS will never perform this optimization.

Is `volatile` enough to allow the compiler to handle machine registers with side-effects on read?

Is there enough information in volatile to get the compiler to understand that there might be side effects from a read?

Yes.

The C language formal definition of a side effect actually targets this very scenario. C11 5.1.2.3:

Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects, which are changes in the state of
the execution environment.

Regarding what the compiler is allowed to optimize, C11 5.1.2.3, paragraph 4:

In the abstract machine, all expressions are evaluated as specified by the semantics. An
actual implementation need not evaluate part of an expression if it can deduce that its
value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

In plain English, this means that any form of access, read or write, to a volatile object, is considered a side effect and a compiler is not allowed to optimize away side effects.

...then my intuition is that these three functions have different behavior

Indeed they have. This is why coding standards such as MISRA-C forbid mixing volatile variable access with other things in the same expression. In the UART scenario, doing so might cause loss of status flags, which would be a severe bug.

Robust programs read/write to volatile variables on a single line and do all other necessary arithmetic in separate expressions.
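A sketch of that style for a UART status register, under assumed details: the bit masks kRxReady and kOverrun, the struct UartFlags and the function read_flags are all hypothetical, and the register is passed in as a pointer so the code can be tried against an ordinary variable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status bits; real positions come from the device datasheet.
constexpr std::uint8_t kRxReady = 0x01;
constexpr std::uint8_t kOverrun = 0x02;

struct UartFlags {
    bool rx_ready;
    bool overrun;
};

UartFlags read_flags(volatile const std::uint8_t* status_reg) {
    assert(status_reg != nullptr);
    // One volatile read, on its own line, into a plain local copy.
    const std::uint8_t status = *status_reg;
    // All further arithmetic uses the copy, so the register is read exactly
    // once even if reading it clears the flags in hardware.
    return UartFlags{(status & kRxReady) != 0, (status & kOverrun) != 0};
}
```

Writing (*status_reg & kRxReady) and (*status_reg & kOverrun) as two separate expressions would read the register twice, and on a read-to-clear register the second read could miss a flag that the first one consumed.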
