So Why Is i = ++i + 1 Well-Defined in C++11?

So why is i = ++i + 1 well-defined in C++11?

... or a value computation using the value of the same scalar object ...

The important part is the quoted phrase. The left-hand side does not use the value of i for the value computation; what is being computed is a glvalue. Only afterwards (sequenced after) is the value of the object touched and replaced.

Unfortunately this is a very subtle point :)

Why is `i = i++ + 1` undefined behavior in C++11?

It's "undefined behavior," not "unspecified." Undefined means the implementation is allowed to do anything, including producing an empty program, terminating randomly, or exploding. Of course, a subtly unexpected value when porting to another platform is the more likely outcome.

Undefined behavior applies to any case where two side effects apply to the same scalar without being sequenced relative to each other. In this case, the side effects happen to be identical (both increment i from its original value before the expression), but by the letter of the standard, they combine to produce UB.

The side effects are unsequenced because, aside from the comma (,), conditional (?:), logical OR (||), and logical AND (&&) operators, operators do not define sequencing rules in terms such as C++11 §5.15/2:

If the second expression is evaluated, every value computation and side effect associated with the first expression is sequenced before every value computation and side effect associated with the second expression.

The assignment operators do define a special sequencing rule, §5.17/1:

In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.

This does not help i = i++ + 1, because the side effect of i++ is not part of any value computation.

In C++11, does `i += ++i + 1` exhibit undefined behavior?

There is no clear case for Undefined Behavior here

Sure, an argument leading to UB can be given, as I indicated in the question and as has been repeated in the answers given so far. However, this involves a strict reading of 5.17:7 that is both self-contradictory and in contradiction with explicit statements in 5.17:1 about compound assignment. With a weaker reading of 5.17:7 the contradictions disappear, as does the argument for UB. Whence my conclusion is neither that there is UB here, nor that there is clearly defined behaviour, but that the text of the standard is inconsistent and should be modified to make clear which reading prevails (and I suppose this means a defect report should be written).

Of course one might invoke the fall-back clause in the standard (the note in 1.3.24) that evaluations for which the standard fails to define the behavior [unambiguously and self-consistently] are Undefined Behavior, but that would make any use of compound assignments (including prefix increment/decrement operators) into UB, something that might appeal to certain implementors, but certainly not to programmers.

Instead of arguing for the given problem, let me present a slightly modified example that brings out the inconsistency more clearly. Assume one has defined

int& f (int& a) { return a; }

a function that does nothing and returns its (lvalue) argument. Now modify the example to

n += f(++n) + 1;

Note that while some extra conditions about the sequencing of function calls are given in the standard, this would at first glance not seem to affect the example, since there are no side effects at all from the function call (not even locally inside the function): the incrementation happens in the argument expression for f, whose evaluation is not subject to those extra conditions. Indeed, let us apply the Crucial Argument for Undefined Behavior (CAUB), namely 5.17:7, which says that the behavior of such a compound assignment is equivalent to that of (in this case)

n = n + f(++n) + 1;

except that n is evaluated only once (an exception that makes no difference here). The evaluation of the statement I just wrote clearly has UB (the value computation of the first (prvalue) n in the RHS is unsequenced w.r.t. the side effect of the ++ operation, which involves the same scalar object (1.9:15) and you're dead).

So the evaluation of n += f(++n) + 1 has undefined behavior, right? Wrong! Read in 5.17:1 that

With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single compound assignment operator. — end note ]

This language is far from as precise as I would like it to be, but I don't think it is a stretch to assume that "indeterminately-sequenced" should mean "with respect to that operation of a compound assignment". The (non normative, I know) note makes it clear that the lvalue-to-rvalue conversion is part of the operation of the compound assignment. Now is the call of f indeterminately-sequenced with respect to the operation of the compound assignment of +=? I'm unsure, because the 'sequenced' relation is defined for individual value computations and side effects, not complete evaluations of operators, which may involve both.

In fact the evaluation of a compound assignment operator involves three items: the lvalue-to-rvalue conversion of its left operand, the side effect (the assignment proper), and the value computation of the compound assignment (which is sequenced after the side effect, and returns the original left operand as lvalue). Note that the existence of the lvalue-to-rvalue conversion is never explicitly mentioned in the standard except in the note cited above; in particular, the standard makes no (other) statement at all regarding its sequencing relative to other evaluations.

It is pretty clear that in the example the call of f is sequenced before the side effect and value computation of += (since the call occurs in the value computation of the right operand to +=), but it might be indeterminately-sequenced with respect to the lvalue-to-rvalue conversion part. I recall from my question that since the left operand of += is an lvalue (and necessarily so), one cannot construe the lvalue-to-rvalue conversion to have occurred as part of the value computation of the left operand.

However, by the principle of the excluded middle, the call to f must either be indeterminately-sequenced with respect to the operation of the compound assignment of +=, or not indeterminately-sequenced with respect to it; in the latter case it must be sequenced before it, because it cannot possibly be sequenced after it (the call of f being sequenced before the side effect of +=, and the relation being anti-symmetric).

So first assume it is indeterminately-sequenced with respect to the operation. Then the cited clause says that w.r.t. the call of f the evaluation of += is a single operation, and the note explains that it means the call should not intervene between the lvalue-to-rvalue conversion and the side effect associated with +=; it should either be sequenced before both, or after both. But being sequenced after the side effect is not possible, so it should be before both. This makes (by transitivity) the side effect of ++ sequenced before the lvalue-to-rvalue conversion, and the UB vanishes.

Next assume the call of f is sequenced before the operation of +=. Then it is in particular sequenced before the lvalue-to-rvalue conversion, and again by transitivity so is the side effect of ++; no UB in this branch either.

Conclusion: 5.17:1 contradicts 5.17:7 if the latter is taken (CAUB) to be normative for questions of UB resulting from unsequenced evaluations by 1.9:15. As I said, CAUB is self-contradictory as well (by arguments indicated in the question), but this answer is getting too long, so I'll leave it at this for now.

Three problems, and two proposals for resolving them

Trying to understand what the standard writes about these matters, I distinguish three aspects in which the text is hard to interpret; they all are of a nature that the text is insufficiently clear about what model its statements are referring to. (I cite the texts at the end of the numbered items, since I do not know the markup to resume a numbered item after a quote)

  1. The text of 5.17:7 is of an apparent simplicity that, although the intention is easy to grasp, gives us little hold when applied to difficult situations. It makes a sweeping claim (equivalent behavior, apparently in all aspects) whose application is thwarted by the exception clause. What if the behavior of E1 = E1 op E2 is undefined? Well, then that of E1 op = E2 should be as well. But what if the UB was due to E1 being evaluated twice in E1 = E1 op E2? Then evaluating E1 op = E2 should presumably not be UB, but if so, then defined as what? This is like saying "the youth of the second twin was exactly like that of the first, except that he did not die at childbirth." Frankly, I think this text, which has little evolved since the C version ("A compound assignment of the form E1 op = E2 differs from the simple assignment expression E1 = E1 op E2 only in that the lvalue E1 is evaluated only once."), might be adapted to match the changes in the standard.

    (5.17) 7 The behavior of an expression of the form E1 op = E2 is equivalent to
    E1 = E1 op E2 except that E1 is evaluated only once.[...]

  2. It is not so clear what precisely the actions (evaluations) are between which the 'sequenced' relation is defined. It is said (1.9:12) that evaluation of an expression includes value computations and initiation of side effects. Though this appears to say that an evaluation may have multiple (atomic) components, the sequenced relation is actually mostly defined (e.g. in 1.9:14,15) for individual components, so that it might be better to read this as saying that the notion of "evaluation" encompasses both value computations and (initiation of) side effects. However, in some cases the 'sequenced' relation is defined for the (entire) execution of an expression or statement (1.9:15) or for a function call (5.17:1), even though a passage in 1.9:15 avoids the latter by referring directly to executions in the body of a called function.

    (1.9) 12 Evaluation of an expression (or a sub-expression) in general includes
    both value computations (...) and initiation of side effects. [...]
    13 Sequenced before is an asymmetric, transitive, pair-wise relation between
    evaluations executed by a single thread [...]
    14 Every value computation and side effect associated with a full-expression
    is sequenced before every value computation and side effect associated with
    the next full-expression to be evaluated. [...]
    15 When calling a function (whether or not the function is inline), every
    value computation and side effect associated with any argument expression,
    or with the postfix expression designating the called function, is sequenced
    before execution of every expression or statement in the body of the called
    function. [...] Every evaluation in the calling function (including other
    function calls) ... is indeterminately sequenced with respect to the
    execution of the called function [...]
    (5.2.6, 5.17) 1 ... With respect to an indeterminately-sequenced function call, ...

  3. The text should more clearly acknowledge that a compound assignment involves, in contrast to a simple assignment, the action of fetching the value previously assigned to its left operand; this action is like lvalue-to-rvalue conversion, but does not happen as part of the value computation of that left operand, since it is not a prvalue; indeed it is a problem that 1.9:12 only acknowledges such action for prvalue evaluation. In particular the text should be more clear about which 'sequenced' relations are given for that action, if any.

    (1.9) 12 Evaluation of an expression... includes... value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation)

The second point is the least directly related to our concrete question, and I think it can be solved simply by choosing a clear point of view and reformulating passages that seem to indicate a different point of view. Given that one of the main purposes of the old sequence points, and now the 'sequenced' relation, was to make clear that the side effect of postfix-increment operators is unsequenced w.r.t. actions sequenced after the value computation of that operator (thus giving e.g. i = i++ UB), the point of view must be that individual value computations and (initiations of) individual side effects are "evaluations" for which "sequenced before" may be defined.

For pragmatic reasons I would also include two more kinds of (trivial) "evaluations": function entry (so that the language of 1.9:15 may be simplified to: "When calling a function..., every value computation and side effect associated with any of its argument expressions, or with the postfix expression designating the called function, is sequenced before entry of that function") and function exit (so that any action in the function body gets by transitivity sequenced before anything that requires the function value; this used to be guaranteed by a sequence point, but the C++11 standard seems to have lost such a guarantee; this might make calling a function ending with return i++; potentially UB where this is not intended, and used to be safe).

Then one can also be clear about the "indeterminately sequenced" relation of function calls: for every function call, and every evaluation that is not (directly or indirectly) part of evaluating that call, that evaluation shall be sequenced (either before or after) w.r.t. both entry and exit of that function call, and it shall have the same relation in both cases (so that in particular such external actions cannot be sequenced after function entry but before function exit, as is clearly desirable within a single thread).

Now to resolve points 1. and 3., I can see two paths (each affecting both points), which have different consequences for the defined or not behavior of our example:

Compound assignments with two operands, and three evaluations

Compound assignments have their two usual operands, an lvalue left operand and a prvalue right operand. To settle the unclarity of 3., it is included in 1.9:12 that fetching the value previously assigned to an object may also occur in compound assignments (rather than only in prvalue evaluation). The semantics of compound assignments are then defined by changing 5.17:7 to

In a compound assignment op=, the value previously assigned to the object referred to by the left operand is fetched, the operator op is applied with this value as left operand and the right operand of op= as right operand, and the resulting value replaces that of the object referred to by the left operand.

(That gives two evaluations, the fetch and the side effect; a third evaluation is the trivial value computation of the compound operator, sequenced after both other evaluations.)

For clarity, state clearly in 1.9:15 that value computations in operands are sequenced before all value computations associated with the operator (rather than just those for the result of the operator), which ensures that evaluating the lvalue left operand is sequenced before fetching its value (one can hardly imagine otherwise), and also sequences the value computation of the right operand before that fetch, thus excluding UB in our example. While at it, I see no reason not to also sequence value computations in operands before any side effects associated with the operator (as they clearly must); this would make mentioning this explicitly for (compound) assignments in 5.17:1 superfluous. On the other hand do mention there that the value fetching in a compound assignment is sequenced before its side effect.

Compound assignments with three operands, and two evaluations

In order to obtain that the fetch in a compound assignment will be unsequenced with respect to the value computation of the right operand, making our example UB, the clearest way seems to be to give compound operators an implicit third (middle) operand, a prvalue, not represented by a separate expression, but obtained by lvalue-to-rvalue conversion from the left operand (this three-operand nature corresponds to the expanded form of compound assignments, but by obtaining the middle operand from the left operand, it is ensured that the value is fetched from the same object to which the result will be stored, a crucial guarantee that is only vaguely and implicitly given in the current formulation through the "except that E1 is evaluated only once" clause). The difference with the previous solution is that the fetch is now a genuine lvalue-to-rvalue conversion (since the middle operand is a prvalue) and is performed as part of the value computation of the operands to the compound assignment, which makes it naturally unsequenced with the value computation of the right operand. It should be stated somewhere (in a new clause that describes this implicit operand) that the value computation of the left operand is sequenced before this lvalue-to-rvalue conversion (it clearly must be). Now 1.9:12 can be left as it is, and in place of 5.17:7 I propose

In a compound assignment op= with left operand a (an lvalue), and middle and right operands b and c respectively (both prvalues), the operator op is applied with b as left operand and c as right operand, and the resulting value replaces that of the object referred to by a.

(That gives one evaluation, the side effect, with as second evaluation the trivial value computation of the compound operator, sequenced after it.)

The changes to 1.9:15 and 5.17:1 suggested in the previous solution would still apply here, but would not give our original example defined behavior. However, the modified example at the top of this answer would still have defined behavior, unless the part of 5.17:1 saying "compound assignment is a single operation" is scrapped or modified (there is a similar passage in 5.2.6 for postfix increment/decrement). The existence of those passages suggests that detaching the fetch and store operations within a single compound assignment or postfix increment/decrement was not the intention of those who wrote the current standard (and, by extension, that making our example UB was not intended either), but this of course is mere guesswork.

Will i=i++ be newly well-defined in C17?

The passage you highlighted only says that the expressions i++ and i are evaluated before the evaluation of the full expression i = i++. It is still undefined behavior because i is being modified more than once in an expression without a sequence point.

That passage first appeared in C11, so there's no change from that version to C17.

Is comparing an underflowed, unsigned integer to -1 well-defined?

size_t r = 0;
r--;
const bool result = (r == -1);

Strictly speaking, the value of result is implementation-defined. In practice, it's almost certain to be true; I'd be surprised if there were an implementation where it's false.

The value of r after r-- is SIZE_MAX, a macro defined in <stdint.h> / <cstdint> (size_t itself comes from <stddef.h> / <cstddef>).

For the comparison r == -1, the usual arithmetic conversions are performed on both operands. The first step in the usual arithmetic conversions is to apply the integral promotions to both operands.

r is of type size_t, an implementation-defined unsigned integer type. -1 is an expression of type int.

On most systems, size_t is at least as wide as int. On such systems, the integral promotions cause the value of r either to be converted to unsigned int or to keep its existing type (the former can happen if size_t has the same width as int, but a lower conversion rank). Now the left operand (which is unsigned) has at least the rank of the right operand (which is signed). The right operand is converted to the type of the left operand. This conversion yields the same value as r, and so the equality comparison yields true.

That's the "normal" case.

Suppose we have an implementation where size_t is 16 bits (let's say it's a typedef for unsigned short) and int is 32 bits. So SIZE_MAX == 65535 and INT_MAX == 2147483647. Or we could have a 32-bit size_t and a 64-bit int. I doubt that any such implementation exists, but nothing in the standard forbids it (see below).

Now the left side of the comparison has type size_t and value 65535. Since signed int can represent all the values of type size_t, the integral promotions convert the value to 65535 of type int. Both sides of the == operator have type int, so the usual arithmetic conversions have nothing to do. The expression is equivalent to 65535 == -1, which is clearly false.

As I mentioned, this kind of thing is unlikely to happen with an expression of type size_t -- but it can easily happen with narrower unsigned types. For example, if r is declared as an unsigned short, or an unsigned char, or even a plain char on a system where that type is signed, the value of result will probably be false. (I say probably because short or even unsigned char can have the same width as int, in which case result will be true.)

In practice, you can avoid the potential problem by doing the conversion explicitly rather than relying on the implementation-defined usual arithmetic conversions:

const bool result = (r == (size_t)-1);

or

const bool result = (r == SIZE_MAX);

C++11 standard references:

  • 5.10 [expr.eq] Equality operators
  • 5.9 [expr.rel] Relational operators (specifies that the usual arithmetic conversions are performed)
  • 5 [expr] Expressions, paragraph 9: Usual arithmetic conversions
  • 4.5 [conv.prom] Integral promotions
  • 18.2 [support.types] size_t

18.2 paragraphs 6-7:

6 The type size_t is an implementation-defined unsigned integer type
that is large enough to contain the size in bytes of any object.

7 [ Note: It is recommended that implementations choose types for
ptrdiff_t and size_t whose integer conversion ranks (4.13) are no
greater than that of signed long int unless a larger size is
necessary to contain all the possible values. — end note ]

So there's no prohibition on making size_t narrower than int. I can almost plausibly imagine a system where int is 64 bits, but no single object can be bigger than 2^32 - 1 bytes, so size_t is 32 bits.

Is this code well-defined?

This depends on how Sun is defined. The following is well-defined:

struct A {
    A &Fun(int);
    A &Gun(int);
    A &Sun(int&);
    A &Tun();
};

void g() {
    A someInstance;
    int k = 0;
    someInstance.Fun(++k).Gun(10).Sun(k).Tun();
}

If you change the parameter type of Sun to int, it becomes undefined. Let's draw a tree of the version taking an int (E denotes an evaluation, S a side effect, V a value computation, and % a sequence point).

                        <eval body of Fun>
                                |
                                %   // pre-call sequence point
                                |
            { S(increment, k) } <-- E(++k)
                                |
                   E(Fun(++k).Gun(10))
                        |
           .------------+-----------.      .-- V(k) --%-- <eval body of Sun>
          /                          \    /
         E(Fun(++k).Gun(10).Sun(k))
                        |
           .------------+-----------.
          /                          \
         E(Fun(++k).Gun(10).Sun(k).Tun())
                        |
                        %   // full-expression sequence point
As can be seen, we have a read of k (designated by V(k)) and a side effect on k (at the very top) that are not separated by a sequence point: relative to each other, no sequence point intervenes at all. The % at the very bottom signifies the full-expression sequence point.

Is modifying an object more than once in an expression through its name and through its reference well-defined?

All of them have undefined behavior before C++17 and all have well-defined behavior since C++17.

Note that you are not modifying i more than once in either example. You are modifying it only with the increment.

However, it is also undefined behavior to have a side effect on one scalar (here the increment of i) be unsequenced with a value computation (here the left-hand use of i). Whether the side effect is produced by directly acting on the variable or through a reference or pointer does not matter.

Before C++17, the << operator did not imply any sequencing of its operands, so the behavior is undefined in all your examples.

Since C++17, the << operator is guaranteed to evaluate its operands from left-to-right. C++17 also extended the sequencing rules for operators to overloaded operators when called with the operator notation. So in all your examples the behavior is well-defined and the left-hand use of i is evaluated first, before i's value is incremented.

Note, however, that some compilers didn't implement these changes to the evaluation rules in a timely manner, so even if you use the -std=c++17 flag, older and even some current compiler versions might unfortunately still not follow the expected behavior.

In addition, at least in the case of GCC, the -Wsequence-point warning is explicitly documented to warn even for behavior that became well-defined in C++17, to help the user to avoid writing code that would have undefined behavior in C and earlier C++ versions, see GCC documentation.

The compiler is not required (and not able) to diagnose all cases of undefined behavior. In some simple situations it will be able to give you a warning (which you can turn into an error using -Werror or similar), but in more complex cases it will not. Still, your program will lose any guarantee on its behavior if you have undefined behavior, whether diagnosed or not.

Is &*NULL well-defined in C?

While attempts to dereference a null pointer cause undefined behavior, so *nullPtr is illegal, &*nullPtr is perfectly well-defined. According to footnote 102 in the C11 Draft Standard:

Thus, &*E is equivalent to E (even if E is a null pointer),....

This is a result of the fact that, for the unary & operator (§6.5.3.2 ¶3):

If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted,....

The C99 Standard has the same language, but this does not appear in the C90 Standard, and my reading of that standard is that &*nullPtr would indeed cause undefined behavior in pre-C99 implementations.

From the C90 Standard (§6.3.2.3):

The result of the unary & (address-of) operator is a pointer to the object or function designated by its operand....

and:

The unary * operator denotes indirection.... If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

Curiously, I don't see any discussion of this change in the C99 Rationale, though I may just be not finding it.


