In C++11, Does 'I += ++I + 1' Exhibit Undefined Behavior

In C++11, does `i += ++i + 1` exhibit undefined behavior?

There is no clear case for Undefined Behavior here

Sure, an argument leading to UB can be given, as I indicated in the question, and which has been repeated in the answers given so far. However this involves a strict reading of 5.17:7 that is both self-contradictory and in contradiction with explicit statements in 5.17:1 about compound assignment. With a weaker reading of 5.17:7 the contradictions disappear, as does the argument for UB. Whence my conclusion is neither that there is UB here, nor that there is clearly defined behaviour, but the the text of the standard is inconsistent, and should be modified to make clear which reading prevails (and I suppose this means a defect report should be written). Of course one might invoke here the fall-back clause in the standard (the note in 1.3.24) that evaluations for which the standard fails to define the behavior [unambiguously and self-consistently] are Undefined Behavior, but that would make any use of compound assignments (including prefix increment/decrement operators) into UB, something that might appeal to certain implementors, but certainly not to programmers.

Instead of arguing for the given problem, let me present a slightly modified example that brings out the inconsistency more clearly. Assume one has defined

int& f (int& a) { return a; }

a function that does nothing and returns its (lvalue) argument. Now modify the example to

n += f(++n) + 1;

Note that while some extra conditions about sequencing of function calls are given in the standard, this would at first glance not seem to effect the example, since there are no side effect at all from the function call (not even locally inside the function), as the incrementation happens in the argument expression for f, whose evaluation is not subject to those extra conditions. Indeed, let us apply the Crucial Argument for Undefined Behavior (CAUB), namely 5.17:7 which says that the behavior of such a compound assignment is equivalent to that of (in this case)

n = n + f(++n) + 1;

except that n is evaluated only once (an exception that makes no difference here). The evaluation of the statement I just wrote clearly has UB (the value computation of the first (prvalue) n in the RHS is unsequenced w.r.t. the side effect of the ++ operation, which involves the same scalar object (1.9:15) and you're dead).

So the evaluation of n += f(++n) + 1 has undefined behavior, right? Wrong! Read in 5.17:1 that

With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single compound assignment operator. — end note ]

This language is far from as precise as I would like it to be, but I don't think it is a stretch to assume that "indeterminately-sequenced" should mean "with respect to that operation of a compound assignment". The (non normative, I know) note makes it clear that the lvalue-to-rvalue conversion is part of the operation of the compound assignment. Now is the call of f indeterminately-sequenced with respect to the operation of the compound assignment of +=? I'm unsure, because the 'sequenced' relation is defined for individual value computations and side effects, not complete evaluations of operators, which may involve both. In fact the evaluation of a compound assignment operator involves three items: the lvalue-to-rvalue conversion of its left operand, the side effect (the assignment proper), and the value computation of the compound assignment (which is sequenced after the side effect, and returns the original left operand as lvalue). Note that the existence of the lvalue-to-rvalue conversion is never explicitly mentioned in the standard except in the note cited above; in particular, the standard makes no (other) statement at all regarding its sequencing relative to other evaluations. It is pretty clear that in the example the call of f is sequenced before the side effect and value computation of += (since the call occurs in the value computation of the right operand to +=), but it might be indeterminately-sequenced with respect to the lvalue-to-rvalue conversion part. I recall from my question that since the left operand of += is an lvalue (and necessarily so), one cannot construe the lvalue-to-rvalue conversion to have occurred as part of the value computation of the left operand.

However, by the principle of the excluded middle, the call to f must either be indeterminately-sequenced with respect to the operation of the compound assignment of +=, or not indeterminately-sequenced with respect to it; in the latter case it must be sequenced before it because it cannot possibly be sequenced after it (the call of f being sequenced before the side effect of +=, and the relation being anti-symmetric). So first assume it is indeterminately-sequenced with respect to the operation. Then the cited clause says that w.r.t. the call of f the evaluation of += is a single operation, and the note explains that it means the call should not intervene between the lvalue-to-rvalue conversion and the side effect associated with +=; it should either be sequenced before both, or after both. But being sequenced after the side effect is not possible, so it should be before both. This makes (by transitivity) the side effect of ++ sequenced before the lvalue-to-rvalue conversion, exit UB. Next assume the call of f is sequenced before the operation of +=. Then it is in particular sequenced before the lvalue-to-rvalue conversion, and again by transitivity so is the side effect of ++; no UB in this branch either.

Conclusion: 5.17:1 contradicts 5.17:7 if the latter is taken (CAUB) to be normative for questions of UB resulting from unsequenced evaluations by 1.9:15. As I said CAUB is self-contradictory as well (by arguments indicated in the question), but this answer is getting to long, so I'll leave it at this for now.

Three problems, and two proposals for resolving them

Trying to understand what the standard writes about these matters, I distinguish three aspects in which the text is hard to interpret; they all are of a nature that the text is insufficiently clear about what model its statements are referring to. (I cite the texts at the end of the numbered items, since I do not know the markup to resume a numbered item after a quote)

  1. The text of 5.17:7 is of an apparent simplicity that, although the intention is easy to grasp, gives us little hold when applied to difficult situations. It makes a sweeping claim (equivalent behavior, apparently in all aspects) but whose application is thwarted by the exception clause. What if the behavior of E1 = E1 op E2 is undefined? Well then that of E1 op = E2 should be as well. But what if the UB was due to E1 being evaluated twice in E1 = E1 op E2? Then evaluating E1 op = E2 should presumably not be UB, but if so, then defined as what? This is like saying "the youth of the second twin was exactly like that of the first, except that he did not die at childbirth." Frankly, I think this text, which has little evolved since the C version "A compound assignment of the the form E1 op = E2 differs from the simple assignment expression E1 = E1 op E2 only in that the lvalue E1 is evaluated only once." might be adapted to match the changes in the standard.

    (5.17) 7 The behavior of an expression of the form E1 op = E2 is equivalent to
    E1 = E1 op E2 except that E1 is evaluated only once.[...]

  2. It is not so clear what precisely the actions (evaluations) are between which the 'sequenced' relation is defined. It is said (1.9:12) that evaluation of an expression includes value computations and initiation of side effects. Though this appears to say that an evaluation may have multiple (atomic) components, the sequenced relation is actually mostly defined (e.g. in 1.9:14,15) for individual components, so that it might be better to read this as that the notion of "evaluation" encompasses both value computations and (initiation of) side effects. However in some cases the 'sequenced' relation is defined for the (entire) execution of an expression of statement (1.9:15) or for a function call (5.17:1), even though a passage in 1.9:15 avoids the latter by referring directly to executions in the body of a called function.

    (1.9) 12 Evaluation of an expression (or a sub-expression) in general includes
    both value computations (...) and initiation of side effects. [...] 13 Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread [...] 14 Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated. [...] 15 When calling a function (whether or not the function is inline), every value computation and side effect
    associated with any argument expression, or with the postfix expression designating the called function, is
    sequenced before execution of every expression or statement in the body of the called function. [...] Every evaluation in the calling function (including other function calls) ... is indeterminately sequenced with
    respect to the execution of the called function [...] (5.2.6, 5.17) 1 ... With respect to an indeterminately-sequenced function call, ...

  3. The text should more clearly acknowledge that a compound assignment involves, in contrast to a simple assignment, the action of fetching the value previously assigned to its left operand; this action is like lvalue-to-rvalue conversion, but does not happen as part of the value computation of that left operand, since it is not a prvalue; indeed it is a problem that 1.9:12 only acknowledges such action for prvalue evaluation. In particular the text should be more clear about which 'sequenced' relations are given for that action, if any.

    (1.9) 12 Evaluation of an expression... includes... value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation)

The second point is the least directly related to our concrete question, and I think it can be solved simply by choosing a clear point of view and reformulating pasages that seem to indicate a different point of view. Given that one of the main purposes of the old sequence points, and now the 'sequenced' relation, was to make clear that the side effect of postfix-increment operators is unsequenced w.r.t. to actions sequenced after the value computation of that operator (thus giving e.g. i = i++ UB), the point of view must be that individual value computations and (initiation of) individual side effects are "evaluations" for which "sequenced before" may be defined. For pragmatic reasons I would also include two more kinds of (trivial) "evaluations": function entry (so that the language of 1.9:15 may be simplified to: "When calling a function..., every value computation and side effect associated with any of its argument expressions, or with the postfix expression designating the called function, is sequenced before entry of that function") and function exit (so that any action in the function body gets by transitivity sequenced before anything that requires the function value; this used to be guaranteed by a sequence point, but the C++11 standard seems to have lost such guarantee; this might make calling a function ending with return i++; potentially UB where this is not intended, and used to be safe). Then one can also be clear about the "indeterminately sequenced" relation of functions calls: for every function call, and every evaluation that is not (directly or indirectly) part of evaluating that call, that evaluation shall be sequenced (either before or after) w.r.t. both entry and exit of that function call, and it shall have the same relation in both cases (so that in particular such external actions cannot be sequenced after function entry but before function exit, as is clearly desirable within a single thread).

Now to resolve points 1. and 3., I can see two paths (each affecting both points), which have different consequences for the defined or not behavior of our example:

Compound assignments with two operands, and three evaluations

Compound operations have thier two usual operands, an lvalue left operand and a prvalue right operand. To settle the unclarity of 3., it is included in 1.9:12 that fetching the value previously assigned to an object also may occur in compound assignments (rather than only for prvalue evaluation). The semantics of compount assignments are defined by changing 5.17:7 to

In a compound assignment op=, the value previously assigned to the object referred to by the left operand is fetched, the operator op is applied with this value as left operand and the right operand of op= as right operand, and the resulting value replaces that of the object referred to by the left operand.

(That gives two evaluations, the fetch and the side effect; a third evaluation is the trivial value computation of the compound operator, sequenced after both other evaluations.)

For clarity, state clearly in 1.9:15 that value computations in operands are sequenced before all value computations associated with the operator (rather than just those for the result of the operator), which ensures that evaluating the lvalue left operand is sequenced before fetching its value (one can hardly imagine otherwise), and also sequences the value computation of the right operand before that fetch, thus excluding UB in our example. While at it, I see no reason not to also sequence value computations in operands before any side effects associated with the operator (as they clearly must); this would make mentioning this explicitly for (compound) assignments in 5.17:1 superfluous. On the other hand do mention there that the value fetching in a compound assignment is sequenced before its side effect.

Compound assignments with three operands, and two evaluations

In order to obtain that the fetch in a compount assignment will be unsequenced with respect to the value computation of the right operand, making our example UB, the clearest way seems to be to give compound operators an implicit third (middle) operand, a prvalue, not represented by a separate expression, but obtained by lvalue-to-rvalue conversion from the left operand (this three-operand nature corresponds to the expanded form of compound assignments, but by obtaining the middle operand from the left operand, it is ensured that the value is fetched from the same object to which the result will be stored, a crucial guarantee that is only vaguely and implicitly given in the current formulation through the "except that E1 is evaluated only once" clause). The difference with the previous solution is that the fetch is now a genuine lvalue-to-rvalue conversion (since the middle operand is a prvalue) and is performed as part of the value computation of the operands to the compound assignment, which makes it naturally unsequenced with the value computation of the right operand. It should be stated somewhere (in a new clause that describes this implicit operand) that the value computation of the left operand is sequenced before this lvalue-to-rvalue conversion (it clearly must). Now 1.9:12 can be left as it is, and in place of 5.17:7 I propose

In a compound assignment op= with left operand a (an lvalue), and midlle and right operands brespectively c (both prvalues), the operator op is applied with b as left operand and c as right operand, and the resulting value replaces that of the object referred to by a.

(That gives one evaluation, the side effect, with as second evaluation the trivial value computation of the compound operator, sequenced after it.)

The still applicable changes to 1.9:15 and 5.17:1 suggested in the previous solution could still apply, but would not give our original example defined behavior. However the modified example at the top of this answer would still have defined behavior, unless the part 5.17:1 "compound assignment is a single operation" is scrapped or modified (there is a similar passage in 5.2.6 for postfix increment/decrement). The existence of those passages would suggest that detaching the fecth and store operations within a single compound assignement or postfix increment/decrement was not the intention of those who wrote the current standard (and by extension making our example UB), but this of course is mere guesswork.

Why is `i = i++ + 1` undefined behavior in C++11?

It's "undefined behavior," not "unspecified." Undefined means the machine is allowed to do anything including output an empty program, terminate randomly, or explode. Of course, a subtly unexpected value when porting to another platform is a more likely outcome.

Undefined behavior applies to any case where two side effects apply to the same scalar without being sequenced relative to each other. In this case, the side effects happen to be identical (both increment i from its original value before the expression), but by the letter of the standard, they combine to produce UB.

The side effects are unsequenced because aside from ,, ?:, ||, and &&, operators do not define sequencing rules in terms such as C++11 §5.15/2:

If the second expression is evaluated, every value computation and side effect associated with the first expression is sequenced before every value computation and side effect associated with the second expression.

The assignment operators do define a special sequencing rule, §5.17/1:

In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.

This does not help i = i ++ + 1 because the side effect of i ++ is not part of any value computation.

c++11 Order of evaluation (undefined behavior)

Yes, your first example has undefined behavior because we have an unsequenced modification and access of a memory location, which is undefined behavior. This is covered in draft C++ standard [intro.execution]p10 (emphasis mine):

Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced
.
[ Note: In an expression that is evaluated more than once during the
execution of a program, unsequenced and indeterminately sequenced
evaluations of its subexpressions need not be performed consistently
in different evaluations. — end note  ] The value computations of the
operands of an operator are sequenced before the value computation of
the result of the operator. If a side effect on a memory location
([intro.memory]) is unsequenced relative to either another side effect
on the same memory location or a value computation using the value of
any object in the same memory location, and they are not potentially
concurrent ([intro.multithread]), the behavior is undefined
. [ Note:
The next subclause imposes similar, but more complex restrictions on
potentially concurrent computations. — end note  ] [ Example:

void g(int i) {
i = 7, i++, i++; // i becomes 9

i = i++ + 1; // the value of i is incremented
i = i++ + i; // the behavior is undefined
i = i + 1; // the value of i is incremented
}

— end example  ]

If we check out the section of relational operators which covers <= [expr.rel] it does not specify an order of evaluation, so we are covered by intro.exection and thus we have undefined behavior.

Having unspecified order of evaluation is not sufficient for undefined behavior as the example in Order of evaluation of assignment statement in C++ demonstrates.

Your second example avoids the undefined behavior since you are not introducing a side effect to ival, you are just reading the memory location twice.

I believe that is a reasonable way to solve the problem, it is readable and not surprising. An alternate method could include introducing a second variable, e.g. index and prev_index. It is hard to come with a fast rule given such a small code snippet.

If a part of the program exhibits undefined behavior, would it affect the remainder of the program?

Once a statement with undefined behaviour is reached, then the behaviour of the entire program is undefined.

Paradoxically, the behaviour of statements that have ran prior to that are undefined too.

With C++11, is it undefined behavior to write f(x++), g(x++)?

No, the behavior is defined. To quote C++11 (n3337) [expr.comma/1]:

A pair of expressions separated by a comma is evaluated left-to-right;
the left expression is a discarded-value expression (Clause [expr]).
Every value computation and side effect associated with the left
expression is sequenced before every value computation and side effect
associated with the right expression.

And I take "every" to mean "every"1. The evaluation of the second x++ cannot happen before the call sequence to f is completed and f returns.2



1 Destructor calls aren't associated with sub-expressions, only with full expressions. So you'll see those executed in reverse order to temporary object creation at the end of the full expression.

2 This paragraph only applies to the comma when used as an operator. When the comma has a special meaning (such when designating a function call argument sequence) this does not apply.

So why is i = ++i + 1 well-defined in C++11?

... or a value computation using the value of the same scalar object ...

The important part is bolded here. The left hand side does not use the value of i for the value computation. What is being computed is a glvalue. Only afterwards (sequenced after), the value of the object is touched and replaced.

Unfortunately this is a very subtle point :)



Related Topics



Leave a reply



Submit