What Kinds of Optimizations Does 'Volatile' Prevent in C++

What kinds of optimizations does 'volatile' prevent in C++?

Basically, volatile announces that a value might change behind your program's back. That prevents compilers from caching the value (in a CPU register) and from optimizing away accesses to that value when they seem unnecessary from the POV of your program.

What should trigger usage of volatile is when a value changes despite the fact that your program hasn't written to it, and when no other memory barriers (like mutexes as used for multi-threaded programs) are present.

Does volatile guarantee anything at all in portable C code for multi-core systems?

To summarize the problem, it appears (after reading a lot) that
"volatile" guarantees something like: The value will be read/written
not just from/to a register, but at least to the core's L1 cache, in
the same order that the reads/writes appear in the code
.

No, it absolutely does not. And that makes volatile almost useless for the purpose of MT safe code.

If it did, then volatile would be quite good for variables shared by multiple thread as ordering the events in the L1 cache is all you need to do in typical CPU (that is either multi-core or multi-CPU on motherboard) capable of cooperating in a way that makes a normal implementation of either C/C++ or Java multithreading possible with typical expected costs (that is, not a huge cost on most atomic or non-contented mutex operations).

But volatile does not provide any guaranteed ordering (or "memory visibility") in the cache either in theory or in practice.

(Note: the following is based on sound interpretation of the standard documents, the standard's intent, historical practice, and a deep understand of the expectations of compiler writers. This approach based on history, actual practices, and expectations and understanding of real persons in the real world, which is much stronger and more reliable than parsing the words of a document that is not known to be stellar specification writing and which has been revised many times.)

In practice, volatile does guarantees ptrace-ability that is the ability to use debug information for the running program, at any level of optimization, and the fact the debug information makes sense for these volatile objects:

  • you may use ptrace (a ptrace-like mechanism) to set meaningful break points at the sequence points after operations involving volatile objects: you can really break at exactly these points (note that this works only if you are willing to set many break points as any C/C++ statement may be compiled to many different assembly start and end points, as in a massively unrolled loop);
  • while a thread of execution of stopped, you may read the value of all volatile objects, as they have their canonical representation (following the ABI for their respective type); a non volatile local variable could have an atypical representation, f.ex. a shifted representation: a variable used for indexing an array might be multiplied by the size of individual objects, for easier indexing; or it might be replaced by a pointer to an array element (as long as all uses of the variable as similarly converted) (think changing dx to du in an integral);
  • you can also modify those objects (as long as the memory mappings allow that, as volatile object with static lifetime that are const qualified might be in a memory range mapped read only).

Volatile guarantee in practice a little more than the strict ptrace interpretation: it also guarantees that volatile automatic variables have an address on the stack, as they aren't allocated to a register, a register allocation which would make ptrace manipulations more delicate (compiler can output debug information to explain how variables are allocated to registers, but reading and changing register state is slightly more involved than accessing memory addresses).

Note that full program debug-ability, that is considering all variables volatile at least at sequence points, is provided by the "zero optimization" mode of the compiler, a mode which still performs trivial optimizations like arithmetic simplifications (there is usually no guaranteed no optimization at all mode). But volatile is stronger than non optimization: x-x can be simplified for a non volatile integer x but not of a volatile object.

So volatile means guaranteed to be compiled as is, like the translation from source to binary/assembly by the compiler of a system call isn't a reinterpretation, changed, or optimized in any way by a compiler. Note that library calls may or may not be system calls. Many official system functions are actually library function that offer a thin layer of interposition and generally defer to the kernel at the end. (In particular getpid doesn't need to go to the kernel and could well read a memory location provided by the OS containing the information.)

Volatile interactions are interactions with the outside world of the real machine, which must follow the "abstract machine". They aren't internal interactions of program parts with other program parts. The compiler can only reason about what it knows, that is the internal program parts.

The code generation for a volatile access should follow the most natural interaction with that memory location: it should be unsurprising. That means that some volatile accesses are expected to be atomic: if the natural way to read or write the representation of a long on the architecture is atomic, then it's expected that a read or write of a volatile long will be atomic, as the compiler should not generate silly inefficient code to access volatile objects byte by byte, for example.

You should be able to determine that by knowing the architecture. You don't have to know anything about the compiler, as volatile means that the compiler should be transparent.

But volatile does no more than force the emission of expected assembly for the least optimized for particular cases to do a memory operation: volatile semantics means general case semantic.

The general case is what the compiler does when it doesn't have any information about a construct: f.ex. calling a virtual function on an lvalue via dynamic dispatch is a general case, making a direct call to the overrider after determining at compile time the type of the object designated by the expression is a particular case. The compiler always have a general case handling of all constructs, and it follows the ABI.

Volatile does nothing special to synchronize threads or provide "memory visibility": volatile only provides guarantees at the abstract level seen from inside a thread executing or stopped, that is the inside of a CPU core:

  • volatile says nothing about which memory operations reach main RAM (you may set specific memory caching types with assembly instructions or system calls to obtain these guarantees);
  • volatile doesn't provide any guarantee about when memory operations will be committed to any level of cache (not even L1).

Only the second point means volatile is not useful in most inter threads communication problems; the first point is essentially irrelevant in any programming problem that doesn't involve communication with hardware components outside the CPU(s) but still on the memory bus.

The property of volatile providing guaranteed behavior from the point of the view of the core running the thread means that asynchronous signals delivered to that thread, which are run from the point of view of the execution ordering of that thread, see operations in source code order.

Unless you plan to send signals to your threads (an extremely useful approach to consolidation of information about currently running threads with no previously agreed point of stopping), volatile is not for you.

When can a volatile variable be optimized away completely?

Writes to volatile variables (even automatic ones) count as observable behaviour.

C11 (N1570) 5.1.2.3/6:

The least requirements on a conforming implementation are:

— Accesses to volatile objects are evaluated strictly according to the rules of the abstract
machine.

— At program termination, all data written into files shall be identical to the result that
execution of the program according to the abstract semantics would have produced.

— The input and output dynamics of interactive devices shall take place as specified in
7.21.3. The intent of these requirements is that unbuffered or line-buffered output
appear as soon as possible, to ensure that prompting messages actually appear prior to
a program waiting for input.

This is the observable behavior of the program.

The question is: does initialization (e, f) count as an "access"? As pointed out by Sander de Dycker, 6.7.3 says:

What constitutes an access to an object that has volatile-qualified type is implementation-defined.

which means it's up to the compiler whether or not e and f can be optimized away - but this must be documented!

Understanding volatile keyword in c++

The volatile keyword in C++ was inherited it from C, where it was intended as a general catch-all to indicate places where a compiler should allow for the possibility that reading or writing an object might have side-effects it doesn't know about. Because the kinds of side-effects that could be induced would vary among different platforms, the Standard leaves the question of what allowances to make up to compiler writers' judgments as to how they should best serve their customers.

Microsoft's compilers for the 8088/8086 and later x86 have for decades been designed to support the practice of using volatile objects to build a mutex which guards "ordinary" objects. As a simple example: if thread 1 does something like:

ordinaryObject = 23;
volatileFlag = 1;
while(volatileFlag)
doOtherStuffWhileWaiting();
useValue(ordinaryObject);

and thread 2 periodically does something like:

if (volatileFlag)
{
ordinaryObject++;
volatileFlag=0;
}

then the accesses to volatileFlag would serve as a warning to Microsoft's compilers that they should refrain from making assumptions about how any preceding actions on any objects would interact with later actions. This pattern has been followed with the volatile qualifiers in other languages like C#.

Unfortunately, neither clang nor gcc includes any option to treat volatile in such a fashion, opting instead to require that programmers use compiler-specific intrinsics to yield the same semantics that Microsoft could achieve using only the Standard keyword volatile that was intended to be suitable for such purposes [according to the authors of the Standard, "A volatile object is also an appropriate model for a variable shared among multiple processes."--see http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf p. 76 ll. 25-26]

How to prevent gcc optimizing some statements in C?

Turning off optimization fixes the problem, but it is unnecessary. A safer alternative is to make it illegal for the compiler to optimize out the store by using the volatile type qualifier.

// Assuming pageptr is unsigned char * already...
unsigned char *pageptr = ...;
((unsigned char volatile *)pageptr)[0] = pageptr[0];

The volatile type qualifier instructs the compiler to be strict about memory stores and loads. One purpose of volatile is to let the compiler know that the memory access has side effects, and therefore must be preserved. In this case, the store has the side effect of causing a page fault, and you want the compiler to preserve the page fault.

This way, the surrounding code can still be optimized, and your code is portable to other compilers which don't understand GCC's #pragma or __attribute__ syntax.

Is 'volatile' sufficient to prevent C++ compilers from optimizing out a silent write?

The standard (§1.9/8) requires that:

The least requirements on a conforming implementation are:

— Access to volatile objects are evaluated strictly according to the rules of the abstract machine.

"Access" is defined as either reading or writing, so yes, your *vaddr = *vaddr; must read a value from *vaddr, and then write the same value back to *vaddr.

Does the -O0 compiler flag have the same effect as the volatile keyword in C?

The short answer is: the volatile keyword does not mean "do not optimize". It is something completely different. It informs the compiler that the variable may be changed by something which is not visible for the compiler in the normal program flow. For example:

  1. It can be changed by the hardware - usually registers mapped in the memory address space
  2. Can be changed by the function which is never called - for example the interrupt routine
  3. Variable can be changed by another process or hardware - for example shared memory in the multiprocessor / multicore systems

The volatile variable has to be read from its storage location every time it is used, and saved every time it was changed.

Here you have an example:

int foo(volatile int z)
{
return z + z + z + z;
}

int foo1(int z)
{
return z + z + z + z;
}

and the resulting code (-O0 optimization option)

foo(int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov edx, DWORD PTR [rbp-4]
mov eax, DWORD PTR [rbp-4]
add edx, eax
mov eax, DWORD PTR [rbp-4]
add edx, eax
mov eax, DWORD PTR [rbp-4]
add eax, edx
pop rbp
ret
foo1(int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov eax, DWORD PTR [rbp-4]
sal eax, 2
pop rbp
ret

The difference is obvious I think. The volatile variable is read 4 times, non volatile is read once, then multiplied by 4.

You can play yourself here: https://godbolt.org/g/RiTU4g

In the most cases if the program does not run when you turn on the compiler optimization, you have some hidden UBs in your code. You should debug as long as needed to discover all of them. The correctly written program must run at any optimization level.

Bear in mind that `volatile' does not mean or guarantee the coherency & atomicity.



Related Topics



Leave a reply



Submit