Undefined Behavior, Or: Does Swift Have Sequence Points

Undefined behavior, or: Does Swift have sequence points?

The question was answered by Apple developer and Swift designer Chris
Lattner on the Apple Developer Forums, https://forums.developer.apple.com/thread/20001#63783:

Yes, the result of that expression will always be 4. Swift evaluates
expressions left to right, it isn't undefined or implementation
defined behavior like C.

Chris also added:

That said, if you write code like that, someone trying to maintain it
will probably not be very happy with you

Agreed! It was meant as an extreme example to demonstrate the problem.
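
The exact Swift expression from the thread isn't quoted above, but for contrast, here is a hypothetical C analogue of the same flavour: an expression that would evaluate to 4 under guaranteed left-to-right evaluation, yet whose behaviour is undefined in C. The variable name and the expression itself are assumptions chosen purely for illustration.

#include <stdio.h>

int main(void)
{
    int x = 1;
    /* With strict left-to-right evaluation this would be 2 + 2 == 4, but in
     * C the side effect of ++x is unsequenced relative to the read of x on
     * the right, so the behaviour is undefined. */
    int r = ++x + x;
    printf("%d\n", r);  /* no guarantee what, if anything, gets printed */
    return 0;
}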

Sequence points when calling functions in C and undefined/unspecified behaviour

No. Per 6.5.2.2/10, there is no sequence point between the evaluations of the argument subexpressions; the only sequence point is after the arguments have been evaluated, just before the actual call.

One way of looking at it is that it is unspecified whether the behaviour is undefined: if the implementation sequences the two ++i subexpressions before any call to g or h, the behaviour is undefined, but if the ++i subexpressions are evaluated as late as possible (immediately before calling g and h respectively), the behaviour is merely unspecified. However, because the implementation is always at liberty to choose any of the allowed unspecified orderings, the overall result is undefined.
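
To make that concrete, here is a minimal sketch of the kind of call being discussed; the function names f, g and h and their bodies are assumptions, not quoted from the original question.

#include <stdio.h>

static int g(int v) { return v * 10; }
static int h(int v) { return v + 100; }
static int f(int a, int b) { return a + b; }

int main(void)
{
    int i = 0;
    /* As explained above, the implementation may evaluate both ++i
     * subexpressions before calling either g or h, with no sequence point
     * between them, so overall the behaviour is undefined. */
    int result = f(g(++i), h(++i));
    printf("%d\n", result);
    return 0;
}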

Is one side of an assignment sequenced before the other in C++?


a[i] = foo();

Here it is unspecified whether foo or a[i] is evaluated first. In the new C++11 wording, the two evaluations are unsequenced. That alone doesn't cause undefined behaviour, though; undefined behaviour arises when there are two unsequenced accesses to the same scalar object, at least one of which is a write. That's why a[i] = i++; is UB.

The difference between these two statements is that a call to foo() does introduce a sequence point. The C++11 wording is different: executions inside a called function are indeterminately sequenced with respect to other evaluations inside the calling function.

This means there's a partial ordering between a[i] and i++ inside foo. As a result, either a[0] or a[1] will get set to 0, but the program is well defined.
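
For reference, the question being answered presumably involved code along the following lines; this is a reconstruction, and the initial array values are assumptions chosen so the effect is observable. It is written as C here for consistency with the other examples, but the same reasoning applies in C++11.

#include <stdio.h>

static int i = 0;
static int a[2] = {-1, -1};

/* Returns the old value of i and increments it as a side effect. */
static int foo(void) { return i++; }

int main(void)
{
    /* The i++ inside foo() is indeterminately sequenced with the evaluation
     * of a[i] in the caller, not unsequenced, so this is well defined; it is
     * merely unspecified whether a[0] or a[1] receives the value 0. */
    a[i] = foo();
    printf("a[0]=%d a[1]=%d\n", a[0], a[1]);
    return 0;
}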

Preincrement vs postincrement in terms of sequence points

What happens in the expression i = i++ + 1? Is it well-defined, undefined, implementation-defined or unspecified behaviour?

This exact example is given in the standard; how lucky are we?

N4296 1.9.15 [intro.execution]

i = i++ + 1; // the behavior is undefined

Of course, we'd like to know why too. The following standard quote appears to be relevant here:

N4296 1.9.15 [intro.execution]

[ ... ] The value computations of the operands of an operator are sequenced
before the value computation of the result of the operator. [ ... ]

This tells us that the sum will occur before the assignment (duh, how else would it know what to assign!), but it doesn't guarantee whether the increment will occur before or after the assignment; now we're in murky water...

N4296 1.9.15 [intro.execution]

[ ... ] If a side effect on a scalar object is unsequenced relative to either
another side effect on the same scalar object or a value computation
using the value of the same scalar object, and they are not
potentially concurrent (1.10), the behavior is undefined. [ ... ]

The assignment operator has a side effect on the value of i, which means we have two side effects on the same scalar object (the other being the modification performed by i++). They are unsequenced relative to each other, so the behavior is undefined.

Why does Visual Studio show behavior that differs from what the standard describes?

It doesn't. The standard says the behaviour is undefined, which means it can do anything from what you wanted to something completely different; it just so happens that this is the behaviour that got spat out by the compiler!
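
If the intent behind i = i++ + 1 needs to be expressed at all, it can be written without any unsequenced side effects. A minimal sketch of two well-defined alternatives, depending on what was meant:

#include <stdio.h>

int main(void)
{
    int i = 1;

    /* i = i++ + 1;   -- undefined: two unsequenced modifications of i */

    /* If the intent was "store i + 1 back into i", say exactly that: */
    i = i + 1;
    printf("%d\n", i);   /* 2 */

    /* If two separate updates were intended, put a sequence point (the end
     * of a full expression) between them: */
    i = 1;
    i++;
    i = i + 1;
    printf("%d\n", i);   /* 3 */

    return 0;
}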

Sequence points in a language with left to right evaluation order?

A sequence point is what the language standard defines to be a sequence point. The answers I'm about to give apply to C, but another "C-like" language might very well define different sequence points and thus have different answers to those questions.

int z = x-- + x; // z = 1 + 0 or z = 1 + 1?

Since + does not introduce a sequence point in C, the behaviour of the above statement is undefined.

my_func(x++); // x incremented before or after my_func execution?

x is incremented before my_func runs, because there is a sequence point after the evaluation of the arguments and before the actual call; however, my_func is called with the old value of x as its argument.

my_func(x++ + --x); // combining those above

Undefined for the same reason as the first one.
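
A small runnable sketch of the well-defined case, with the undefined ones left commented out; my_func here is just a placeholder that prints its argument:

#include <stdio.h>

static void my_func(int v)
{
    /* By the time the body runs, the caller's x has already been
     * incremented; v still holds the old value of x. */
    printf("argument: %d\n", v);
}

int main(void)
{
    int x = 1;

    /* int z = x-- + x;      -- undefined: unsequenced write and read of x */
    /* my_func(x++ + --x);   -- undefined for the same reason              */

    my_func(x++);            /* well defined: sequence point before the call */
    printf("x after call: %d\n", x);   /* 2 */
    return 0;
}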

Undefined behavior with: c = (b=a+2) - (a=1) ;

Points 1 and 2 are perfectly correct. The order of evaluation of operands is unspecified for most operators in C, meaning that either (b=a+2) or (a=1) may be evaluated first, and you cannot know which order applies in any given case.

In addition, if a variable is modified between two sequence points, any other access to that variable between those same sequence points is not allowed, except to determine the value to be stored in it.

C99 states this in 6.5 (emphasis mine):

Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value to be stored.

So code like a = a+1 is perfectly well-defined, while code like a = a++ leads to undefined behavior.

It all boils down to the "abstract machine", the set of rules that determines the execution order of your program. Writing a value to a variable is a side effect, and the C standard states that all side effects must have occurred before the next sequence point. Now, if you have several side effects related to the same variable, there are no guarantees about the order in which they occur relative to each other before the next sequence point is reached.

The practical advice for avoiding bugs caused by sequencing and order of evaluation is to keep expressions simple, with as few operators and as few side effects per line as possible. In the case of your original example, a better way to write the code would be:

b = a + 2;
a = 1;
c = b - a;

The above code cannot be misinterpreted by either the compiler or the human reader.
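
As a self-contained sketch of the same point (the initial value of a is an assumption, chosen purely for illustration):

#include <stdio.h>

int main(void)
{
    int a = 5, b, c;

    /* c = (b = a + 2) - (a = 1);  -- undefined: a is written by (a = 1) and
     *                                read by (a + 2) with no sequencing    */

    /* Well-defined rewrite: each statement is a full expression, so there is
     * a sequence point between the accesses to a. */
    b = a + 2;
    a = 1;
    c = b - a;

    printf("b=%d a=%d c=%d\n", b, a, c);   /* b=7 a=1 c=6 */
    return 0;
}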


Just for the record, C11 has different text, but the very same meaning:

If a side effect on a scalar object is unsequenced relative to either
a different side effect on the same scalar object or a value
computation using the value of the same scalar object, the behavior is
undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an
unsequenced side effect occurs in any of the orderings.

Sequence Points vs Operator Precedence

The first answer in the question you linked to explains exactly what's going on. I'll try to rephrase it to make it clearer.

Operator precedence defines how the value of an expression is computed from its subexpressions. The value of the expression (a++) is well understood.

However, the modification of the variable a is not part of computing that value. Yes, really. This is the part you're having trouble with, but that's simply how C and C++ define it.

Expressions result in values, but some expressions can have side effects. The expression a = 1 has a value of 1, but it also has the side effect of setting the variable a to 1. As far as how C and C++ define things, these are two different steps. Similarly, a++ has a value and a side-effect.

Sequence points define when side effects become visible to evaluations that occur after those sequence points. Operator precedence has nothing to do with sequence points; that's just how C and C++ define things.
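
A short sketch of the value/side-effect distinction; the surrounding expression is an assumption made up for illustration:

#include <stdio.h>

int main(void)
{
    int a = 5;

    /* Precedence only says the grouping is (a++) * 2.  The value of (a++)
     * is 5, so 5 * 2 is what gets stored in b.  The side effect (a becoming
     * 6) is a separate step, guaranteed to have happened only by the end of
     * the full expression, i.e. the sequence point at the ';'. */
    int b = (a++) * 2;

    printf("b=%d a=%d\n", b, a);   /* b=10 a=6 */
    return 0;
}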

When can undefined behavior be considered well-known and accepted?

First of all, any compiler implementation is free to define any behavior it likes in any situation which would, from the point of view of the standard, produce Undefined Behavior.

Secondly, code which is written for a particular compiler implementation is free to make use of any behaviors which are documented by that implementation; code which does so, however, may not be usable on other implementations.

One of the longstanding shortcomings of C is that while there are many situations where constructs which would produce Undefined Behavior on some implementations are handled usefully by others, only a tiny minority of such situations provide any means by which code can specify that a compiler which won't handle them a certain way should refuse compilation. Further, there are many cases in which the Standards Committee allows full-on UB even though on most implementations the "natural" consequences would be much more constrained. Consider, for example (assume int is 32 bits):

#include <stdint.h>

int weird(uint16_t x, int64_t y, int64_t z)
{
    int r = 0;
    if (y > 0) return 1;
    if (z < 0x80000000L) return 2;
    if (x > 50000) r |= 31;
    /* x is promoted to (32-bit) int, so x*x overflows int for x > 46340. */
    if (x*x > z) r |= 8;
    if (x*x < y) r |= 16;
    return r;
}

If the above code were run on a machine that simply ignores integer overflow, passing 50001,0,0x80000000L should result in the code returning 31; passing 50000,0,0x80000000L could result in it returning 0, 8, 16, or 24 depending upon how the code handles the comparison operations. The C standard, however, would allow the code to do anything whatsoever in any of those cases; because of that, some compilers might determine that none of the if statements beyond the first two could ever be true in any situation which hadn't invoked Undefined Behavior, and might thus assume that r is always zero. Note that one of those inferences would affect the behavior of a statement which precedes the Undefined Behavior.

One thing I'd really like to see would be a concept of "Implementation Constrained" behavior, which would be something of a cross between Undefined Behavior and Implementation-Defined Behavior: compilers would be required to document all possible consequences of certain constructs which under the old rules would be Undefined Behavior, but, unlike Implementation-Defined Behavior, an implementation would not be required to specify one specific thing that would happen. Implementations would be allowed to specify that a certain construct may have arbitrary unconstrained consequences (full UB), but would be discouraged from doing so. In the case of something like integer overflow, a reasonable compromise would be to say that the result of an expression that overflows may be a "magic" value which, if explicitly typecast, will yield an arbitrary (and "ordinary") value of the indicated type, but which may otherwise appear to have arbitrarily changing values which may or may not be representable. Compilers would be allowed to assume that the result of an operation will not be a result of overflow, but would refrain from making inferences about the operands. To use a vague analogy, the behavior would be similar to how floating-point would behave if explicitly typecasting a NaN could yield any arbitrary non-NaN result.

IMHO, C would greatly benefit from combining the above concept of "implementation-constrained" behaviors with some standard predefined macros which would allow code to test whether an implementation makes any particular promises about its behavior in various situations. Additionally, it would be helpful if there were a standard means by which a section of code could request a particular "dialect" [a combination of int size, implementation-constrained behaviors, etc.]. It would be possible to write a compiler for any platform which could, upon request, have promotion rules behave as though int were exactly 32 bits. For example, given code like:

uint64_t l1, l2;
uint32_t w1, w2;
uint16_t h1, h2;
...
l1 += (h1 + h2);
l2 += (w2 - w1);

A 16-bit compiler might be fastest if it performed the math on h1 and h2 using 16 bits, and a 64-bit compiler might be fastest if it added to l2 the 64-bit result of subtracting w1 from w2. But if the code was written for a 32-bit system, having compilers for the other two systems generate code which behaves as it did on the 32-bit system would be more helpful than having them generate code which performs some different computation, no matter how much faster the latter code might be.

Unfortunately, there is not at present any standard means by which code can ask for such semantics [a fact which will likely limit the efficiency of 64-bit code in many cases]; the best one can do is probably to expressly document the code's environmental requirements somewhere and hope that whoever is using the code sees them.
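
Short of a true "dialect" mechanism, one thing that is possible today (in C11 and later) is to make such environmental requirements machine-checked rather than merely documented, so that an unsuitable implementation refuses to compile the code instead of silently computing something different. A minimal sketch; the helper function is a hypothetical stand-in for the l1 += (h1 + h2) example above:

#include <limits.h>
#include <stdint.h>

/* The arithmetic below was written assuming 32-bit int promotion rules;
 * fail loudly on implementations where that assumption does not hold. */
_Static_assert(INT_MAX == 0x7FFFFFFF, "this code assumes a 32-bit int");

uint64_t add_halves(uint64_t l1, uint16_t h1, uint16_t h2)
{
    /* With a 32-bit int, h1 + h2 is computed in int and cannot overflow. */
    return l1 + (h1 + h2);
}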


