Undefined Behavior and Sequence Points Reloaded

It looks like the code

i.operator+=(i.operator ++());

works perfectly fine with regard to sequence points. §1.9/17 of the C++ ISO standard says this about sequence points and function evaluation:

When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all function arguments (if any) which takes place before execution of any expressions or statements in the function body. There is also a sequence point after the copying of a returned value and before the execution of any expressions outside the function.

This would indicate, for example, that the i.operator ++() as the parameter to operator += has a sequence point after its evaluation. In short, because overloaded operators are functions, the normal sequencing rules apply.
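
To make that concrete, here is a minimal sketch. It assumes a hypothetical class type Counter (my own invention, not from the question) whose operator++ and operator+= are ordinary member functions. Because the argument is fully evaluated before the body of operator+= runs, the result is predictable.

```cpp
#include <cassert>  // for the checks mentioned in the usage note

// Hypothetical class type used only for illustration.
struct Counter {
    int value;

    // Prefix increment: modifies *this inside a function body.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Compound assignment: its body runs only after the argument is evaluated.
    Counter& operator+=(const Counter& other) {
        value += other.value;
        return *this;
    }
};

// Evaluates i += ++i written as explicit member function calls.
int add_incremented(int start) {
    Counter i{start};
    i.operator+=(i.operator++());  // same as: i += ++i;
    return i.value;
}
```

With this sketch, assert(add_incremented(1) == 4) holds: the increment makes the value 2, and the compound assignment then adds the object to itself.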

Great question, by the way! I really like how you're forcing me to understand all the nuances of a language that I already thought I knew (and thought that I thought that I knew). :-)

Undefined behavior and sequence points

C++98 and C++03

This answer is for the older versions of the C++ standard. The C++11 and C++14 versions of the standard do not formally contain 'sequence points'; operations are 'sequenced before' or 'unsequenced' or 'indeterminately sequenced' instead. The net effect is essentially the same, but the terminology is different.


Disclaimer : Okay. This answer is a bit long. So have patience while reading it. If you already know these things, reading them again won't make you crazy.

Pre-requisites : An elementary knowledge of C++ Standard



What are Sequence Points?

The Standard says

At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations
shall be complete and no side effects of subsequent evaluations shall have taken place. (§1.9/7)

Side effects? What are side effects?

Evaluation of an expression produces something and if in addition there is a change in the state of the execution environment it is said that the expression (its evaluation) has some side effect(s).

For example:

int x = y++; //where y is also an int

In addition to the initialization operation the value of y gets changed due to the side effect of ++ operator.

So far so good. Moving on to sequence points. An alternative definition of seq-points given by the comp.lang.c author Steve Summit:

Sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete.



What are the common sequence points listed in the C++ Standard?

Those are:

  • at the end of the evaluation of full expression (§1.9/16) (A full-expression is an expression that is not a subexpression of another expression.)1

    Example :

    int a = 5; // ; is a sequence point here
  • in the evaluation of each of the following expressions after the evaluation of the first expression (§1.9/18) 2

    • a && b (§5.14)
    • a || b (§5.15)
    • a ? b : c (§5.16)
    • a , b (§5.18) (here a , b is a comma operator; in func(a,a++) , is not a comma operator, it's merely a separator between the arguments a and a++. Thus the behaviour is undefined in that case (if a is considered to be a primitive type))
  • at a function call (whether or not the function is inline), after the evaluation of all function arguments (if any) which
    takes place before execution of any expressions or statements in the function body (§1.9/17).

1 : Note : the evaluation of a full-expression can include the evaluation of subexpressions that are not lexically
part of the full-expression. For example, subexpressions involved in evaluating default argument expressions (8.3.6) are considered to be created in the expression that calls the function, not the expression that defines the default argument

2 : The operators indicated are the built-in operators, as described in clause 5. When one of these operators is overloaded (clause 13) in a valid context, thus designating a user-defined operator function, the expression designates a function invocation and the operands form an argument list, without an implied sequence point between them.
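
Footnote 2 can be illustrated with a small sketch, assuming C++11 or later. The Flag type, the counter and the helper functions are hypothetical names made up for this example. The built-in && skips its right operand when the left one is false, whereas an overloaded operator&& is a plain function call, so both operands are evaluated.

```cpp
#include <cassert>  // for the checks mentioned in the usage note

// Hypothetical wrapper type; overloading && for it is for illustration only.
struct Flag {
    bool value;
};

// User-defined operator&&: an ordinary function taking two arguments.
bool operator&&(Flag lhs, Flag rhs) {
    return lhs.value && rhs.value;
}

int calls = 0;  // counts how often a right-hand operand was evaluated

bool touch_bool() { ++calls; return true; }
Flag touch_flag() { ++calls; return Flag{true}; }

// Built-in &&: the right operand is skipped when lhs is false.
int builtin_rhs_evaluations(bool lhs) {
    calls = 0;
    bool result = lhs && touch_bool();
    (void)result;
    return calls;
}

// Overloaded &&: both operands are function arguments, so both are evaluated.
int overloaded_rhs_evaluations(bool lhs) {
    calls = 0;
    bool result = Flag{lhs} && touch_flag();
    (void)result;
    return calls;
}
```

Here builtin_rhs_evaluations(false) returns 0, while overloaded_rhs_evaluations(false) returns 1. The overloaded form also leaves the order of the two operand evaluations unspecified under these older rules.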



What is Undefined Behaviour?

The Standard defines Undefined Behaviour in Section §1.3.12 as

behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements 3.

Undefined behavior may also be expected when this
International Standard omits the description of any explicit definition of behavior.

3 : permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

In short, undefined behaviour means anything can happen from daemons flying out of your nose to your girlfriend getting pregnant.



What is the relation between Undefined Behaviour and Sequence Points?

Before I get into that you must know the difference(s) between Undefined Behaviour, Unspecified Behaviour and Implementation Defined Behaviour.

You must also know that the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.

For example:

int x = 5, y = 6;

int z = x++ + y++; //it is unspecified whether x++ or y++ will be evaluated first.
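
A rough way to observe unspecified (as opposed to undefined) order is to give the operands a visible side effect. In the sketch below, the functions left and right and the global order_log are hypothetical names of my own. The sum is the same either way, but the log may legitimately read "LR" or "RL".

```cpp
#include <cassert>  // for the checks mentioned in the usage note
#include <string>

std::string order_log;  // records the order in which the operands ran

int left()  { order_log += 'L'; return 5; }
int right() { order_log += 'R'; return 6; }

// The sum is always 11; whether the log reads "LR" or "RL" is unspecified.
int sum_of_calls() {
    order_log.clear();
    return left() + right();
}
```

Since the two calls cannot interleave, the program is well defined. Only the order recorded in order_log is left to the implementation.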

Another example here.


Now the Standard in §5/4 says


    1. Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.

What does it mean?

Informally it means that between two sequence points a variable must not be modified more than once.
In an expression statement, the next sequence point is usually at the terminating semicolon, and the previous sequence point is at the end of the previous statement. An expression may also contain intermediate sequence points.

From the above sentence the following expressions invoke Undefined Behaviour:

i++ * ++i;   // UB, i is modified more than once btw two SPs
i = ++i; // UB, same as above
++i = 2; // UB, same as above
i = ++i + 1; // UB, same as above
++++++i; // UB, parsed as (++(++(++i)))

i = (i, ++i, ++i); // UB, there's no SP between `++i` (right most) and assignment to `i` (`i` is modified more than once btw two SPs)

But the following expressions are fine:

i = (i, ++i, 1) + 1; // well defined (AFAIK)
i = (++i, i++, i); // well defined
int j = i;
j = (++i, i++, j*i); // well defined
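
If you find yourself writing one of the undefined forms, the usual fix is to split it so that each statement modifies i once. Below is a sketch of two such rewrites. The function names are mine, and the chosen meaning is only one plausible reading of what the original expression was meant to do.

```cpp
#include <cassert>  // for the checks mentioned in the usage note

// One possible reading of `i = ++i + 1;`: increment, then add one more.
int increment_then_add_one(int i) {
    ++i;        // first modification, complete at the semicolon
    i = i + 1;  // second modification, in its own full expression
    return i;
}

// One possible reading of `++++++i;`: three separate increments.
int increment_three_times(int i) {
    ++i;
    ++i;
    ++i;
    return i;
}
```

Each semicolon ends a full expression, so every modification is complete before the next one starts.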




    2. Furthermore, the prior value shall be accessed only to determine the value to be stored.

What does it mean? It means if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.

For example in i = i + 1 all the access of i (in L.H.S and in R.H.S) are directly involved in computation of the value to be written. So it is fine.

This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification.

Example 1:

std::printf("%d %d", i,++i); // invokes Undefined Behaviour because of Rule no 2

Example 2:

a[i] = i++ // or a[++i] = i or a[i++] = ++i etc

is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored. So the behaviour is undefined.

Example 3 :

int x = i + i++ ;// Similar to above
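
For the examples above, the well-defined alternative is to move the modification into its own statement. This is a sketch assuming the intent was "use the old value, then advance". The helper names are hypothetical, and a different intent would need a different ordering of the statements.

```cpp
#include <cassert>  // for the checks mentioned in the usage note
#include <cstdio>
#include <string>

// One possible intent of `a[i] = i++;`: store the old index, then advance.
int store_then_advance(int (&a)[5], int i) {
    a[i] = i;  // read-only uses of i
    ++i;       // the modification lives in its own statement
    return i;
}

// One possible intent of printf("%d %d", i, ++i): old value, then new value.
std::string format_old_and_new(int i) {
    const int old_value = i;
    ++i;
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%d %d", old_value, i);
    return buffer;
}
```

With i starting at 1, format_old_and_new(1) yields "1 2" on every conforming compiler, which the original printf call could not promise.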

Follow up answer for C++11 here.

Sequence Points and Method Chaining reloaded

The internal order of execution within an expression is not defined. Only the obvious behavior of operator precedence is defined.

So, in this case, the compiler is obliged to call stream.seek(4) twice [unless the compiler figures out that it's "the same result either way"] and stream.read_integer() twice. But the order of those calls is unspecified. In other words, the compiler can order those four calls any way it likes, as long as each seek still runs before the read_integer that is chained onto its result.

Your code would be even more risky if you did something like:

int x = stream.seek(4).read_integer()
      - stream.read_integer();

since it's no longer well defined in which order the two reads happen. It could call the second read_integer first (at offset 0), or after the seek and first read (at offset 8). Nobody knows which, and the compiler may even re-arrange them if you make subtle changes to the code (e.g. it decides to do things in a different order because you added another variable that uses another register, so it re-arranges the code to use registers better).

The solution is to introduce intermediate variables:

int a = stream.seek(4).read_integer();
int b = stream.seek(4).read_integer();

int should_be_zero = a - b; // Or b - a, if that's what you want... :)

This should be done in every piece of code where the order of execution is important for the correctness of the code - and bear in mind that "side-effects" (such as reading input, writing output, modifying state) are definitely dependent on order of execution.
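
Here is a self-contained sketch of the intermediate-variable approach. The Stream class below is a hypothetical stand-in, since the real one from the question isn't shown. I'm assuming seek returns a reference to the stream so calls can be chained, and that positions are element indices rather than byte offsets.

```cpp
#include <cassert>  // for the checks mentioned in the usage note
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the stream class in the question.
class Stream {
public:
    explicit Stream(const std::vector<int>& data) : data_(data), pos_(0) {}

    // Moves the read position and returns *this so calls can be chained.
    Stream& seek(std::size_t pos) {
        pos_ = pos;
        return *this;
    }

    // Reads the integer at the current position and advances by one.
    int read_integer() { return data_.at(pos_++); }

private:
    std::vector<int> data_;
    std::size_t pos_;
};

// Each statement is a full expression, so the reads happen in source order.
int difference_of_two_reads(Stream& stream) {
    int a = stream.seek(4).read_integer();  // complete before the next line starts
    int b = stream.seek(4).read_integer();
    return a - b;
}
```

Because both reads start from the same position, the difference is zero no matter how the compiler schedules the work inside each statement.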

Are there sequence points in the expression a^=b^=a^=b, or is it undefined?

a ^= b ^= a ^= b; /*Here*/

It is undefined behavior.

You are modifying an object (a) more than once between two sequence points.

(C99, 6.5p2) "Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression."

Neither simple nor compound assignment introduces a sequence point. The only sequence points here are the one before the expression statement and the one at its end.

Sequence points are listed in Annex C (informative) of the C99 and C11 Standards.
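
If the XOR swap is really what you want, splitting it into three statements gives each modification its own sequence point. The sketch is written in C++ (the function name xor_swap is mine), but the three statements in its body read the same in C. It assumes the two arguments are distinct objects.

```cpp
#include <cassert>  // for the checks mentioned in the usage note

// XOR swap as three full expressions; each one modifies a single object once.
// Assumption of this sketch: a and b refer to distinct objects.
void xor_swap(unsigned& a, unsigned& b) {
    a ^= b;
    b ^= a;
    a ^= b;
}
```

Starting from a == 3 and b == 5, the call leaves a == 5 and b == 3. In practice a temporary variable or std::swap is clearer and needs no distinctness assumption.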

Sequence points and side effects in C

Finally I got an explanation on SO about this point. After reading it and the FAQ, I concluded the following:

1. The last sentence

Furthermore, the prior value shall be accessed only to determine the value to be stored

would read like this:

Furthermore, the prior value of an object shall be accessed only to determine the modified/new value( of same object ) to be stored.

As it is clear by the example

int i = 1, j, a[5];
i = i + 1;
j = i + 1;
a[i] = i;

in case of expression i = i + 1 prior value (which is 1 here) of i (in R.H.S) is accessed to determine the value of i to be stored and this is what the statement

if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.

says.

In the case of j = i + 1 and a[i] = i, however, the accessed value of i is just a value, not a prior value, because i is not modified anywhere in these statements.

2. The second question can be explained as follows:

In case of expression a[i] = i++ or a[i++] = i, first sentence of above statement

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression.

does not catch the problem, because i is modified only once between two consecutive sequence points. That is why we need the second sentence.

Both of these examples are disallowed in C because the prior value of i is accessed twice: i++ itself accesses the prior value of i in order to modify it, so the other access of i is not made to determine the value to be stored.

Does this C code result in Undefined Behavior?

There is a gigantic difference between the expressions

b++ + ++c - --d - e--

(which is fine), and

x++ + ++x - --x - x--

(which is rampantly undefined).

It's not using ++ or -- that makes an expression undefined. It's not even using ++ or -- twice in the same expression. No, the problem is when you use ++ or -- to modify a variable inside an expression, and you also try to use the value of that same variable elsewhere in the same expression, and without an intervening sequence point.

Consider the simpler expression

++z + z;

Now, obviously the subexpression ++z will increment z. So the question is, does the + z part use the old or the new value of z? And the answer is that there is no answer, which is why this expression is undefined.

Remember, expressions like ++z do not just mean, "take z's value and add 1". They mean, "take z's value and add 1, and store the result back into z". These expressions have side effects. And the side effects are at the root of the undefinedness issue.
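
To see that the first expression really is fine, here is a small sketch with concrete starting values (the values and the function name are my own choice). It is written as C++, though the expression itself is equally valid C. Each variable is modified once and used nowhere else, so the order of evaluation cannot affect the result.

```cpp
#include <cassert>

// Each variable is modified exactly once and read nowhere else,
// so the result does not depend on the order of evaluation.
int four_distinct_variables() {
    int b = 1, c = 2, d = 3, e = 4;
    int result = b++ + ++c - --d - e--;  // 1 + 3 - 2 - 4
    assert(b == 2 && c == 3 && d == 2 && e == 3);
    return result;
}
```

The function returns -2 on any conforming compiler. Replace all four variables with a single x and no such statement can be made.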

Differences in C and C++ with sequence points and UB

There are two parts to this question. We can tackle a comparison of the sequence point rules without much trouble, but this does not get us too far. C and C++ are different languages with different standards (the latest C++ standard is almost twice as large as the latest C standard), and even though C++ uses C as a normative reference, it would be incorrect to quote the C++ standard for C and vice versa, regardless of how similar certain sections may be. The C++ standard does explicitly reference the C standard, but only for small sections.

The second part is a comparison of undefined behavior between C and C++. There can be some big differences, and enumerating all of them may not be possible, but we can give some indicative examples.

Sequence Points

Since we are talking about sequence points, this covers pre-C++11 and pre-C11. As far as I can tell, the sequence point rules do not differ greatly between C99 and the pre-C++11 draft standards. As we will see, sequence point rules play no part in some of the examples of differing undefined behavior that I give.

The sequence points rules are covered in the closest draft C++ standard to C++03 section 1.9 Program execution which says:

  • There is a sequence point at the completion of evaluation of each full-expression12).
  • When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all
    function arguments (if any) which takes place before execution of any expressions or statements in the function body.
  • There is also a sequence point after the copying of a returned value and before the execution of any expressions outside
    the function13). Several contexts in C++ cause evaluation of a function call, even though no corresponding function call
    syntax appears in the translation unit. [ Example: evaluation of a new expression invokes one or more allocation and
    constructor functions; see 5.3.4. For another example, invocation of a conversion function (12.3.2) can arise in contexts
    in which no function call syntax appears. —end example ] The sequence points at function-entry and function-exit
    (as described above) are features of the function calls as evaluated, whatever the syntax of the expression that calls the
    function might be.
  • In the evaluation of each of the expressions

    a && b
    a || b
    a ? b : c
    a , b

    using the built-in meaning of the operators in these expressions (5.14, 5.15, 5.16, 5.18), there is a sequence point after
    the evaluation of the first expression14).

I will use the sequence point list from the draft C99 standard Annex C which although it is not normative I can find no disagreement with the normative sections it references. It says:

The following are the sequence points described in 5.1.2.3:

  • The call to a function, after the arguments have been evaluated (6.5.2.2).
  • The end of the first operand of the following operators: logical AND && (6.5.13);
    logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17).
  • The end of a full declarator: declarators (6.7.5);
  • The end of a full expression: an initializer (6.7.8); the expression in an expression
    statement (6.8.3); the controlling expression of a selection statement (if or switch)
    (6.8.4); the controlling expression of a while or do statement (6.8.5); each of the
    expressions of a for statement (6.8.5.3); the expression in a return statement
    (6.8.6.4).

The following entries do not seem to have equivalents in the draft C++ standard but these come from the C standard library which C++ incorporates by reference:

  • Immediately before a library function returns (7.1.4).
  • After the actions associated with each formatted input/output function conversion
    specifier (7.19.6, 7.24.2).
  • Immediately before and immediately after each call to a comparison function, and
    also between any call to a comparison function and any movement of the objects
    passed as arguments to that call (7.20.5).

So there is not much of a difference between C and C++ here.

Undefined Behavior

When it comes to the typical examples of sequence points and undefined behavior, for example those covered in Section 5 Expressions dealing with modifying a variable more than once between sequence points, I cannot come up with an example that is undefined in one but not the other. In C99 it says:

Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an
expression.72) Furthermore, the prior value shall be read only to
determine the value to be stored.73)

and it provides these examples:

i = ++i + 1;
a[i++] = i;

and in C++ it says:

Except where noted, the order of evaluation of operands of individual
operators and subexpressions of individual expressions, and the order
in which side effects take place, is unspecified.57) Between the
previous and next sequence point a scalar object shall have its stored
value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed only to determine the
value to be stored. The requirements of this paragraph shall be met
for each allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined

and provides these examples:

i = v[i++]; // the behavior is undefined
i = ++i + 1; // the behavior is undefined

In C++11 and C11 we do have one major difference, which is covered in Assignment operator sequencing in C11 expressions. It concerns the following:

i = ++i + 1;

This is well defined in C++11 but undefined in C11. The reason is that the result of pre-increment is an lvalue in C++11 but not in C11, even though the sequencing rules are the same.

We do have major difference in areas that have nothing to do with sequence points:

  • In C, which uses of an indeterminate value are undefined has always been well specified, while in C++ this was not well specified until the recent draft C++1y standard. This is covered in my answer to Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++1y?
  • Type punning through a union has always been well defined in C but not in C++ or at least it is hotly debatable whether it is undefined behavior or not. I have several references to this in my answer to Why does optimisation kill this function?
  • In C++ simply falling off the end of a value-returning function is undefined behavior, while in C it is only undefined behavior if you use the value.

There are probably plenty more examples but these are ones I have written about before.

Assignment and sequence points: how is this ambiguous?

The rules of undefined behavior for sequence point violations do not make an exception for situations when "the value cannot change". Nobody cares whether the value changes or not. What matters is that when you are making any sort of write access to the variable, you are modifying that variable. Even if you are assigning the variable a value that it already holds, you are still performing a modification of that variable. And if multiple modifications are not separated by sequence points, the behavior is undefined.

One can probably argue that such "non-modifying modifications" should not cause any problems. But the language specification does not concern itself with such details. In language terminology, again, every time you are writing something into a variable, you are modifying it.

Moreover, the fact that you use the word "ambiguous" in your question seems to imply that you believe the behavior is unspecified. I.e. as in "the resultant value of the variable is (or isn't) ambiguous". However, in sequence point violations the language specification does not restrict itself to stating that the result is unspecified. It goes much further and declares the behavior undefined. This means that the rationale behind these rules takes into consideration more than just an unpredictable final value of some variable. For example, on some imaginary hardware platform non-sequenced modification might result in invalid code being generated by the compiler, or something like that.

Preincrement vs postincrement in terms of sequence points

What happens in the expression i = i++ + 1? Is it well-defined, undefined, implementation defined or unspecified behaviour?

This exact example is given in the standard, how lucky are we?

N4296 1.9.15 [intro.execution]

i = i++ + 1; // the behavior is undefined

Of course, we'd like to know why too. The following standard quote appears to be relevant here:

N4296 1.9.15 [intro.execution]

[ ... ] The value computations of the operands of an operator are sequenced
before the value computation of the result of the operator. [ ... ]

This tells us that the sum will occur before the assignment (duh, how else does it know what to assign!), but it doesn't guarantee that the increment will occur before or after the assignment. Now we're in murky water...

N4296 1.9.15 [intro.execution]

[ ... ] If a side effect on a scalar object is unsequenced relative to either
another side effect on the same scalar object or a value computation
using the value of the same scalar object, and they are not
potentially concurrent (1.10), the behavior is undefined. [ ... ]

The assignment operator has a side effect on the value of i, which means we have two side effects (the other is the assignment performed by i++) on the same scalar object, which are unsequenced, which is undefined.
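
If the intent was "increment i, then store the old value plus one", a well-defined way to say so under the rules quoted above is to use two statements. This is only a sketch of one plausible intent, and the function name is made up. As far as I know, C++17 later sequenced the right operand of an assignment before the left, which changes the verdict for the original one-liner, but the split version is fine under every revision.

```cpp
#include <cassert>  // for the checks mentioned in the usage note

// One possible intent: take the old value, increment, then assign old + 1.
int assign_old_plus_one(int i) {
    const int old_value = i++;  // side effect complete at the semicolon
    i = old_value + 1;          // single modification in this full expression
    return i;
}
```

For a starting value of 0 this returns 1, with no reliance on how a particular compiler happens to order the two side effects.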

Why does Visual Studio show the behavior which is different from written in standard?

It doesn't. The standard says it's undefined, which means the compiler can do anything from what you wanted to something completely different. It just so happens that this is the behaviour that got spat out by the compiler!


