Undefined Behavior Causing Time Travel

There is a flaw in the reasoning.

When a compiler writer says: we use Undefined Behavior to optimize a program, there are two different interpretations:

  • most people hear: we identify Undefined Behavior and decide we can do whatever we want (*)
  • the compiler writer meant: we assume Undefined Behavior does not occur

Thus, in your case:

  • dereferencing a nullptr is Undefined Behavior
  • thus executing value_or_fallback(nullptr) is Undefined Behavior
  • thus executing the else branch is Undefined Behavior
  • thus door_is_open being false is Undefined Behavior

And since Undefined Behavior does not occur (the programmer swears she will follow the terms of use), door_is_open is necessarily true and the compiler can elide the else branch.

(*) I am slightly annoyed that Raymond Chen actually formulated it this way...
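
To make the discussion concrete, the code in question looks roughly like this (a sketch based on Raymond Chen's example rather than a verbatim quote; the helper functions are stand-ins):

#include <cstdio>

int value_or_fallback(int *p) {
    std::printf("The value of *p is %d\n", *p);  // unconditional dereference: UB whenever p == nullptr
    return p ? *p : 42;
}

void walk_on_in() {}
void ring_bell() {}
void wait_for_door_to_open(int) {}

void unwitting(bool door_is_open) {
    if (door_is_open) {
        walk_on_in();
    } else {
        ring_bell();
        // this branch unavoidably evaluates value_or_fallback(nullptr), so
        // reaching it at all would be undefined behaviour; the compiler may
        // therefore assume door_is_open is always true and elide the branch
        int fallback = value_or_fallback(nullptr);
        wait_for_door_to_open(fallback);
    }
}

int main() { unwitting(true); }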

Why is the phrase "undefined behavior means the compiler can do anything it wants" true?

Nothing "causes" this to occur. Undefined behaviour cannot "occur". There is no mystical force that descends upon your computer and suddenly makes it create black holes inside of cats.

That anything can happen when you run a program whose behaviour is undefined is stated as fact by the C++ standard. It's a statement of leeway, a handy excuse used by compilers to make assumptions about your code so as to provide useful optimisations.

For example, if we say that dereferencing nullptr is undefined (which it is) then no compiler needs to ever check that a pointer is not nullptr: it can just assume that a dereferenced pointer is never nullptr, and if it is, then any consequences are the programmer's problem.

Due to the astounding complexity of compilers, some of those consequences can be rather unexpected.
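
For instance (a sketch of my own, not part of the original answer), once a pointer has been dereferenced the compiler is entitled to assume it was not null and to drop a later check:

int read_twice(int *p) {
    int first = *p;          // if p were nullptr, the behaviour is undefined
    if (p == nullptr) {      // the compiler may treat this test as always false...
        return -1;           // ...and silently delete this "safety net"
    }
    return first + *p;
}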

Of course it is not actually true that "anything can happen". Your computer has neither the necessary physical power nor the necessary legal authority to instantiate a black hole inside of a cat. But since C++ is an abstraction, it seems only fitting that we use abstractions to teach people not to write programs with undefined behaviour. If you program rigorously, assuming that "anything can happen" if your program has undefined behaviour, then you will not be surprised by said rather unexpected consequences, and you will not be tempted to try to "control" the outcome in any way.

Does an expression with undefined behaviour that is never actually executed make a program erroneous?

If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined (1.9/15).

Side effects are changes in the state of the execution environment (1.9/12). A change is a change, not an expression that, if evaluated, would potentially produce a change. If there is no change, there is no side effect. If there is no side effect, then no side effect is unsequenced relative to anything else.
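
A minimal sketch of the distinction (my own example): the unsequenced modifications below would be undefined behaviour if they were ever evaluated, but the branch is never taken, so no side effect occurs and no rule is violated.

int main() {
    int i = 0;
    if (false) {
        i = ++i + i++;   // two unsequenced modifications of i: UB if this were ever evaluated
    }
    return i;            // the branch is dead on every execution, so no side effect and no UB occurs
}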

This does not mean that any code which is never executed is UB-free, though I'm pretty sure most of it is; strictly, each occurrence of UB in the standard would need to be examined separately. That caution is probably excessive, however, in light of the passage quoted below.

The standard also says that

  A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

(emphasis mine)

This, as far as I can tell, is the only normative reference that says what the phrase "undefined behavior" means: an undefined operation in a program execution. No execution, no UB.

At what point in the loop does integer overflow become undefined behavior?

If you're interested in a purely theoretical answer, the C++ standard allows undefined behaviour to "time travel":

[intro.execution]/5:

  A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

As such, if your program contains undefined behaviour, then the behaviour of your whole program is undefined.
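
The loop under discussion has roughly the following shape (reconstructed here as an assumption, not quoted from the original question): the overflow only happens on a later iteration, yet because the execution as a whole contains an undefined operation, even the earlier, apparently harmless iterations carry no guarantees.

#include <iostream>

int main() {
    // with 32-bit int, i * 1000000000 overflows once i reaches 2, so this
    // execution contains an undefined operation; even the i == 0 and i == 1
    // iterations therefore carry no guarantees, and optimizing compilers have
    // been observed to transform such loops in surprising ways
    for (int i = 0; i < 4; ++i)
        std::cout << i * 1000000000 << '\n';
}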

Why doesn't the compiler warn you if there is possible Undefined Behaviour?

The job of a compiler is to translate code from a high-level language to a lower-level one. If you get a descriptive error or warning message, thank the compiler for doing extra work for you. To get more thorough warnings, use a static code analysis tool.

And anything not well defined in the spec is undefined, and it is not possible to prepare a comprehensive list of undefined behaviours, so emitting a warning for every one of them is not feasible.

In practice, compilers do warn about many undefined behaviours, especially with proper warning flags such as -W -Wall -Wextra on gcc. (With optimization flags like -O2 the compiler performs deeper analysis of the code and may generate more warnings.)
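
As a small illustration of that last point (my own sketch; -Wall, -Wextra and -Woverflow are real GCC options, but the exact diagnostics vary between compilers and versions): overflow in a constant expression is easy to diagnose at compile time, while the same overflow hidden behind a run-time value usually produces no warning at all.

// compile with e.g.:  g++ -Wall -Wextra -O2 warn_demo.cpp
int constant_overflow() {
    return 2000000000 + 2000000000;   // typically diagnosed (-Woverflow): the
                                      // overflow is visible at compile time
}

int runtime_overflow(int x) {
    return x + 2000000000;            // same undefined behaviour when x is large,
                                      // but usually no warning: the compiler
                                      // cannot see the value of x
}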

Is it undefined behaviour to access an array beyond its end, if that area is allocated?

What you are describing is affectionately called "the struct hack". It's not clear if it's completely okay, but it was and is widely used.

More recently (since C99), it has started to be replaced by the "flexible array member", where you're allowed to declare an int data[]; field, provided it is the last field in the struct.
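
A sketch of what is being described (illustrative names; the flexible-array form mentioned in the comment is C99, and only a common compiler extension in C++, so the code shows the classic hack):

#include <cstdlib>

struct Packet {
    std::size_t len;
    int data[1];          // nominally one element; extra space is allocated below
                          // (the C99 form would instead declare `int data[];` last)
};

Packet *make_packet(std::size_t n) {
    // over-allocate so that data[0] .. data[n - 1] are backed by real storage;
    // whether indexing past data[0] is strictly conforming is exactly the
    // question discussed above
    Packet *p = static_cast<Packet *>(std::malloc(sizeof(Packet) + (n - 1) * sizeof(int)));
    if (p) p->len = n;
    return p;
}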

Does integer overflow cause undefined behavior because of memory corruption?

You misunderstand the reason for undefined behavior. The reason is not memory corruption around the integer (it will always occupy the same number of bytes an int occupies) but the underlying arithmetic.

Since signed integers are not required to be encoded in two's complement, there can be no specific guidance on what will happen when they overflow. Different encodings or CPU behavior can cause different outcomes of overflow, including, for example, the program being killed by a trap.

And as with all undefined behavior, even if your hardware uses 2's complement for its arithmetic and has defined rules for overflow, compilers are not bound by them. For example, for a long time GCC optimized away any checks that would only come true in a 2's-complement environment: if (x > x + 1) f() is removed from optimized code, because signed overflow is undefined behavior, meaning it never happens (from the compiler's point of view, a program never contains code producing undefined behavior), meaning x can never be greater than x + 1.
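
A sketch of that check and of a well-defined alternative (the function names are my own):

#include <limits>

void handle_overflow() {}

// relies on wraparound: because signed overflow is undefined, the compiler may
// assume x + 1 never wraps, treat the condition as always false, and remove
// the call entirely
void naive_check(int x) {
    if (x > x + 1)
        handle_overflow();
}

// a well-defined way to ask the same question ("would x + 1 overflow?")
void portable_check(int x) {
    if (x == std::numeric_limits<int>::max())
        handle_overflow();
}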

Is conditionally not modifying constant data undefined behavior in C?

If flag is false then strcpy(ptr, "Hello World"); is not evaluated, and the fact that ptr points to the data of a string literal is irrelevant.
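
For concreteness, the situation looks roughly like this (a reconstruction under my own assumptions, written in C++ for consistency with the other sketches even though the question is about C, where no cast would be needed):

#include <cstring>

void maybe_write(char *ptr, bool flag) {
    if (flag)
        std::strcpy(ptr, "Hello World");   // writing into a string literal would be UB,
                                           // but only if this line is actually executed
}

int main() {
    // ptr points at read-only string-literal data, yet flag is false, so the
    // strcpy is never evaluated and no undefined behaviour occurs
    maybe_write(const_cast<char *>("this literal is never modified"), false);
}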

If code on unexecuted paths could cause undefined behavior (run-time undefined behavior, that is, as opposed to a constraint violation detected during translation), then C would break thoroughly, as tests for null pointers would not work:

if (p)
{
    /* use pointer p to do something */
}

