What rules does the compiler have to follow when dealing with volatile memory locations?
A particular and very common optimization that is ruled out by volatile is to cache a value from memory in a register and use the register for repeated accesses (because this is much faster than going back to memory every time).
Instead the compiler must fetch the value from memory every time (taking a hint from Zach, I should say that "every time" is bounded by sequence points).
Nor can a sequence of writes make use of a register and only write the final value back later on: every write must be pushed out to memory.
Why is this useful? On some architectures certain I/O devices map their inputs or outputs to memory locations (e.g. a byte written to that location actually goes out on the serial line). If the compiler redirects some of those writes to a register that is only flushed occasionally, most of the bytes won't reach the serial line. Not good. Using volatile prevents this situation.
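As a minimal sketch of the serial-line case, the following writes a string one byte at a time through a volatile pointer. The names `fake_uart_tx` and `uart_send` are hypothetical, and an ordinary variable stands in for the device register so the code can run without hardware; on a real target the address would come from the device's datasheet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a memory-mapped UART transmit register.
// An ordinary variable is used here so the sketch runs anywhere.
static volatile std::uint8_t fake_uart_tx = 0;

// Write each byte of text to the register and return the number written.
// Because the target is volatile, the compiler must emit one store per
// byte instead of keeping only the final value.
std::size_t uart_send(volatile std::uint8_t* tx, const char* text) {
    std::size_t sent = 0;
    for (; text[sent] != '\0'; ++sent) {
        *tx = static_cast<std::uint8_t>(text[sent]);
    }
    return sent;
}
```

Calling `uart_send(&fake_uart_tx, "hi")` performs two separate stores. Without the volatile qualifier the compiler would be free to keep only the last one.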
Does the C++ volatile keyword introduce a memory fence?
Rather than explaining what volatile does, allow me to explain when you should use volatile.
- When inside a signal handler, because writing to a volatile variable is pretty much the only thing the standard allows you to do from within a signal handler. Since C++11 you can use std::atomic for that purpose, but only if the atomic is lock-free.
- When dealing with setjmp, according to Intel.
- When dealing directly with hardware and you want to ensure that the compiler does not optimize your reads or writes away.
For example:
volatile int *foo = some_memory_mapped_device;
while (*foo)
    ; // wait until *foo turns false
Without the volatile specifier, the compiler is allowed to completely optimize the loop away. The volatile specifier tells the compiler that it may not assume that two consecutive reads return the same value.
Note that volatile has nothing to do with threads. The above example does not work if a different thread is writing to *foo, because no acquire operation is involved.
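For the cross-thread case, a hedged sketch of what an acquire operation looks like with std::atomic follows. The function name `wait_for_published_value` and the payload value are illustrative only.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Starts a writer thread that publishes a value, waits for it on the
// calling thread, and returns what the waiting thread observed.
int wait_for_published_value() {
    int payload = 0;                 // plain data, published via the flag
    std::atomic<bool> ready{false};  // synchronization flag

    std::thread writer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });

    // The acquire load pairs with the release store above: once it
    // observes true, the earlier write to payload is visible here.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    const int seen = payload;

    writer.join();
    return seen;
}
```

The release/acquire pair is what makes reading `payload` well defined. A volatile flag alone would give no such ordering guarantee in standard C++.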
In all other cases, use of volatile should be considered non-portable and should no longer pass code review, except when dealing with pre-C++11 compilers and compiler extensions (such as MSVC's /volatile:ms switch, which is enabled by default on x86/x64).
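Returning to the signal-handler case from the list above, here is a small sketch of the pattern. The names `got_signal`, `on_signal`, and `signal_sets_flag` are made up for illustration, and SIGINT is raised in-process only so the example can run on its own.

```cpp
#include <cassert>
#include <csignal>

// Flag shared between the handler and the rest of the program.
// volatile std::sig_atomic_t is the type the standard guarantees can be
// written safely from a signal handler.
static volatile std::sig_atomic_t got_signal = 0;

extern "C" void on_signal(int) {
    got_signal = 1;  // the only thing the handler does
}

// Installs the handler, raises SIGINT in-process, and reports whether the
// flag was set. std::raise returns only after the handler has run.
bool signal_sets_flag() {
    got_signal = 0;
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);
    std::signal(SIGINT, SIG_DFL);  // restore default handling
    return got_signal == 1;
}
```

In a real program the main loop would poll `got_signal` and do the actual work outside the handler.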
Is volatile the correct way to inform the compiler about concurrent access to a variable?
Congratulations on figuring so many details out for yourself. Yes, volatile is not particularly useful for multithreaded programming, and constructs provided by your platform-specific multithreading library (e.g. pthreads) should always be preferred.
Specifically, you should use a read-write lock: an object which can be unlocked for one writer at a time to the exclusion of readers and other writers, or unlocked by multiple readers to the exclusion of any writer. This will be included in any threading API.
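A read-write lock could look roughly like the sketch below. It assumes a C++17 compiler and uses std::shared_mutex rather than a platform library (pthreads offers pthread_rwlock_t for the same purpose); the class `SharedCounter` and the function `run_writers` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

class SharedCounter {
public:
    // Many readers may hold the shared lock at the same time.
    int get() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return value_;
    }
    // A writer takes the exclusive lock, shutting out readers and writers.
    void increment() {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        ++value_;
    }
private:
    mutable std::shared_mutex mutex_;
    int value_ = 0;
};

// Spawns the given number of threads, each incrementing and reading the
// counter, and returns the final value after all threads have joined.
int run_writers(int threads, int increments_each) {
    SharedCounter counter;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, increments_each] {
            for (int i = 0; i < increments_each; ++i) {
                counter.increment();
                (void)counter.get();  // reads interleave with writes
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter.get();
}
```

No volatile appears anywhere: the lock provides both mutual exclusion and the memory ordering that makes the final count reliable.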
C++0x (now C++11) atomic<T> does solve the problem, so you should never need volatile unless you are writing a device driver. However, atomic is a lower-level tool, and you'll probably be better off with the read-write lock abstraction.
What kinds of optimizations does 'volatile' prevent in C++?
Basically, volatile announces that a value might change behind your program's back. That prevents compilers from caching the value (in a CPU register) and from optimizing away accesses to that value when they seem unnecessary from the POV of your program.
What should trigger usage of volatile is when a value changes despite the fact that your program hasn't written to it, and when no other memory barriers (like mutexes as used for multi-threaded programs) are present.
Does the ARM procedure call standard allow volatile optimisations that contradict the C standard?
I contacted the ARM toolchain's support group, and according to them the ARM PCS is an independent standard that is not bound to the C standard, so a compiler can choose to comply with either one or with both. In their own words:
In a way it's not really a contradiction
- the APCS permits a compiler to respect or ignore local volatile
- the C standard requires a compiler to respect local volatile
so a compiler that is compatible with both will respect local volatile.
Armclang has elected to follow the C standard which makes it compatible with both
So if a compiler chooses to perform this non-C-conforming optimization, it is still an ARM PCS-conforming implementation, but not a C-conforming compiler.
To conclude, a C-conforming compiler for the ARM architecture that implements the ARM PCS will never perform this optimization.
Is `volatile` enough to allow the compiler to handle machine registers with side-effects on read?
Is there enough information in volatile to get the compiler to understand that there might be side effects from a read?
Yes.
The C language formal definition of a side effect actually targets this very scenario. C11 5.1.2.3:
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
Regarding what the compiler is allowed to optimize, C11 5.1.2.3/4:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
In plain English, this means that any form of access, read or write, to a volatile object, is considered a side effect and a compiler is not allowed to optimize away side effects.
...then my intuition is that these three functions have different behavior
Indeed they do. This is why coding standards such as MISRA-C forbid mixing volatile variable accesses with other things in the same expression. In the UART scenario, doing so might cause loss of status flags, which would be a severe bug.
Robust programs read/write to volatile variables on a single line and do all other necessary arithmetic in separate expressions.
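The single-line rule can be sketched as follows. The bit layout (`kRxReady`, `kOverrun`), the `UartStatus` struct, and the `fake_status` variable are all assumptions made for illustration; a real UART's status register would be defined by its datasheet, and an ordinary variable cannot imitate clear-on-read behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit layout for a UART status register.
constexpr std::uint32_t kRxReady = 1u << 0;
constexpr std::uint32_t kOverrun = 1u << 2;

struct UartStatus {
    bool rx_ready;
    bool overrun;
};

// Stand-in for the memory-mapped register so the sketch runs anywhere.
static volatile std::uint32_t fake_status = 0;

UartStatus read_status(const volatile std::uint32_t* status_reg) {
    // Exactly one volatile access, alone in its own statement.
    const std::uint32_t snapshot = *status_reg;

    // All further work uses the local copy, so the register is not
    // read a second time while decoding the flags.
    UartStatus result;
    result.rx_ready = (snapshot & kRxReady) != 0;
    result.overrun = (snapshot & kOverrun) != 0;
    return result;
}
```

If reading the real register cleared its flags, testing `*status_reg & kRxReady` and `*status_reg & kOverrun` in one expression would read it twice, and the second read could miss a flag the first one cleared.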