Why Is 'I = ++I + 1' Unspecified Behavior

Why is `i = ++i + 1` unspecified behavior?

You make the mistake of thinking of operator= as a two-argument function, where the side effects of the arguments must be completely evaluated before the function begins. If that were the case, then the expression i = ++i + 1 would have multiple sequence points, and ++i would be fully evaluated before the assignment began. That's not the case, though. What's being evaluated in the intrinsic assignment operator, not a user-defined operator. There's only one sequence point in that expression.

The result of ++i is evaluated before the assignment (and before the addition operator), but the side effect is not necessarily applied right away. The result of ++i + 1 is always the same as i + 2, so that's the value that gets assigned to i as part of the assignment operator. The result of ++i is always i + 1, so that's what gets assigned to i as part of the increment operator. There is no sequence point to control which value should get assigned first.

Since the code is violating the rule that "between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression," the behavior is undefined. Practically, though, it's likely that either i + 1 or i + 2 will be assigned first, then the other value will be assigned, and finally the program will continue running as usual — no nasal demons or exploding toilets, and no i + 3, either.

Why is `i = i++ + 1` undefined behavior in C++11?

It's "undefined behavior," not "unspecified." Undefined means the machine is allowed to do anything including output an empty program, terminate randomly, or explode. Of course, a subtly unexpected value when porting to another platform is a more likely outcome.

Undefined behavior applies to any case where two side effects apply to the same scalar without being sequenced relative to each other. In this case, the side effects happen to be identical (both increment i from its original value before the expression), but by the letter of the standard, they combine to produce UB.

The side effects are unsequenced because aside from ,, ?:, ||, and &&, operators do not define sequencing rules in terms such as C++11 §5.15/2:

If the second expression is evaluated, every value computation and side effect associated with the first expression is sequenced before every value computation and side effect associated with the second expression.

The assignment operators do define a special sequencing rule, §5.17/1:

In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.

This does not help i = i ++ + 1 because the side effect of i ++ is not part of any value computation.

Is i=i+1 an undefined behaviour?

There is no undefined behavior in this code. i=i+1; is well-defined behavior, not to be confused with i=i++; which gives undefined behavior.

The only thing that could cause different outputs here would be floating point inaccuracy.

Try value += 4 * (int)nearbyint(pow(10,i)); and see if it makes any difference.

Why is f(i = -1, i = -1) undefined behavior?

Since the operations are unsequenced, there is nothing to say that the instructions performing the assignment cannot be interleaved. It might be optimal to do so, depending on CPU architecture. The referenced page states this:

If A is not sequenced before B and B is not sequenced before A, then
two possibilities exist:

  • evaluations of A and B are unsequenced: they may be performed in any order and may overlap (within a single thread of execution, the
    compiler may interleave the CPU instructions that comprise A and B)

  • evaluations of A and B are indeterminately-sequenced: they may be performed in any order but may not overlap: either A will be complete
    before B, or B will be complete before A. The order may be the
    opposite the next time the same expression is evaluated.

That by itself doesn't seem like it would cause a problem - assuming that the operation being performed is storing the value -1 into a memory location. But there is also nothing to say that the compiler cannot optimize that into a separate set of instructions that has the same effect, but which could fail if the operation was interleaved with another operation on the same memory location.

For example, imagine that it was more efficient to zero the memory, then decrement it, compared with loading the value -1 in. Then this:

f(i=-1, i=-1)

might become:

clear i
clear i
decr i
decr i

Now i is -2.

It is probably a bogus example, but it is possible.

In C++11, does `i += ++i + 1` exhibit undefined behavior?

There is no clear case for Undefined Behavior here

Sure, an argument leading to UB can be given, as I indicated in the question, and which has been repeated in the answers given so far. However this involves a strict reading of 5.17:7 that is both self-contradictory and in contradiction with explicit statements in 5.17:1 about compound assignment. With a weaker reading of 5.17:7 the contradictions disappear, as does the argument for UB. Whence my conclusion is neither that there is UB here, nor that there is clearly defined behaviour, but the the text of the standard is inconsistent, and should be modified to make clear which reading prevails (and I suppose this means a defect report should be written). Of course one might invoke here the fall-back clause in the standard (the note in 1.3.24) that evaluations for which the standard fails to define the behavior [unambiguously and self-consistently] are Undefined Behavior, but that would make any use of compound assignments (including prefix increment/decrement operators) into UB, something that might appeal to certain implementors, but certainly not to programmers.

Instead of arguing for the given problem, let me present a slightly modified example that brings out the inconsistency more clearly. Assume one has defined

int& f (int& a) { return a; }

a function that does nothing and returns its (lvalue) argument. Now modify the example to

n += f(++n) + 1;

Note that while some extra conditions about sequencing of function calls are given in the standard, this would at first glance not seem to effect the example, since there are no side effect at all from the function call (not even locally inside the function), as the incrementation happens in the argument expression for f, whose evaluation is not subject to those extra conditions. Indeed, let us apply the Crucial Argument for Undefined Behavior (CAUB), namely 5.17:7 which says that the behavior of such a compound assignment is equivalent to that of (in this case)

n = n + f(++n) + 1;

except that n is evaluated only once (an exception that makes no difference here). The evaluation of the statement I just wrote clearly has UB (the value computation of the first (prvalue) n in the RHS is unsequenced w.r.t. the side effect of the ++ operation, which involves the same scalar object (1.9:15) and you're dead).

So the evaluation of n += f(++n) + 1 has undefined behavior, right? Wrong! Read in 5.17:1 that

With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single compound assignment operator. — end note ]

This language is far from as precise as I would like it to be, but I don't think it is a stretch to assume that "indeterminately-sequenced" should mean "with respect to that operation of a compound assignment". The (non normative, I know) note makes it clear that the lvalue-to-rvalue conversion is part of the operation of the compound assignment. Now is the call of f indeterminately-sequenced with respect to the operation of the compound assignment of +=? I'm unsure, because the 'sequenced' relation is defined for individual value computations and side effects, not complete evaluations of operators, which may involve both. In fact the evaluation of a compound assignment operator involves three items: the lvalue-to-rvalue conversion of its left operand, the side effect (the assignment proper), and the value computation of the compound assignment (which is sequenced after the side effect, and returns the original left operand as lvalue). Note that the existence of the lvalue-to-rvalue conversion is never explicitly mentioned in the standard except in the note cited above; in particular, the standard makes no (other) statement at all regarding its sequencing relative to other evaluations. It is pretty clear that in the example the call of f is sequenced before the side effect and value computation of += (since the call occurs in the value computation of the right operand to +=), but it might be indeterminately-sequenced with respect to the lvalue-to-rvalue conversion part. I recall from my question that since the left operand of += is an lvalue (and necessarily so), one cannot construe the lvalue-to-rvalue conversion to have occurred as part of the value computation of the left operand.

However, by the principle of the excluded middle, the call to f must either be indeterminately-sequenced with respect to the operation of the compound assignment of +=, or not indeterminately-sequenced with respect to it; in the latter case it must be sequenced before it because it cannot possibly be sequenced after it (the call of f being sequenced before the side effect of +=, and the relation being anti-symmetric). So first assume it is indeterminately-sequenced with respect to the operation. Then the cited clause says that w.r.t. the call of f the evaluation of += is a single operation, and the note explains that it means the call should not intervene between the lvalue-to-rvalue conversion and the side effect associated with +=; it should either be sequenced before both, or after both. But being sequenced after the side effect is not possible, so it should be before both. This makes (by transitivity) the side effect of ++ sequenced before the lvalue-to-rvalue conversion, exit UB. Next assume the call of f is sequenced before the operation of +=. Then it is in particular sequenced before the lvalue-to-rvalue conversion, and again by transitivity so is the side effect of ++; no UB in this branch either.

Conclusion: 5.17:1 contradicts 5.17:7 if the latter is taken (CAUB) to be normative for questions of UB resulting from unsequenced evaluations by 1.9:15. As I said CAUB is self-contradictory as well (by arguments indicated in the question), but this answer is getting to long, so I'll leave it at this for now.

Three problems, and two proposals for resolving them

Trying to understand what the standard writes about these matters, I distinguish three aspects in which the text is hard to interpret; they all are of a nature that the text is insufficiently clear about what model its statements are referring to. (I cite the texts at the end of the numbered items, since I do not know the markup to resume a numbered item after a quote)

  1. The text of 5.17:7 is of an apparent simplicity that, although the intention is easy to grasp, gives us little hold when applied to difficult situations. It makes a sweeping claim (equivalent behavior, apparently in all aspects) but whose application is thwarted by the exception clause. What if the behavior of E1 = E1 op E2 is undefined? Well then that of E1 op = E2 should be as well. But what if the UB was due to E1 being evaluated twice in E1 = E1 op E2? Then evaluating E1 op = E2 should presumably not be UB, but if so, then defined as what? This is like saying "the youth of the second twin was exactly like that of the first, except that he did not die at childbirth." Frankly, I think this text, which has little evolved since the C version "A compound assignment of the the form E1 op = E2 differs from the simple assignment expression E1 = E1 op E2 only in that the lvalue E1 is evaluated only once." might be adapted to match the changes in the standard.

    (5.17) 7 The behavior of an expression of the form E1 op = E2 is equivalent to
    E1 = E1 op E2 except that E1 is evaluated only once.[...]

  2. It is not so clear what precisely the actions (evaluations) are between which the 'sequenced' relation is defined. It is said (1.9:12) that evaluation of an expression includes value computations and initiation of side effects. Though this appears to say that an evaluation may have multiple (atomic) components, the sequenced relation is actually mostly defined (e.g. in 1.9:14,15) for individual components, so that it might be better to read this as that the notion of "evaluation" encompasses both value computations and (initiation of) side effects. However in some cases the 'sequenced' relation is defined for the (entire) execution of an expression of statement (1.9:15) or for a function call (5.17:1), even though a passage in 1.9:15 avoids the latter by referring directly to executions in the body of a called function.

    (1.9) 12 Evaluation of an expression (or a sub-expression) in general includes
    both value computations (...) and initiation of side effects. [...] 13 Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread [...] 14 Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated. [...] 15 When calling a function (whether or not the function is inline), every value computation and side effect
    associated with any argument expression, or with the postfix expression designating the called function, is
    sequenced before execution of every expression or statement in the body of the called function. [...] Every evaluation in the calling function (including other function calls) ... is indeterminately sequenced with
    respect to the execution of the called function [...] (5.2.6, 5.17) 1 ... With respect to an indeterminately-sequenced function call, ...

  3. The text should more clearly acknowledge that a compound assignment involves, in contrast to a simple assignment, the action of fetching the value previously assigned to its left operand; this action is like lvalue-to-rvalue conversion, but does not happen as part of the value computation of that left operand, since it is not a prvalue; indeed it is a problem that 1.9:12 only acknowledges such action for prvalue evaluation. In particular the text should be more clear about which 'sequenced' relations are given for that action, if any.

    (1.9) 12 Evaluation of an expression... includes... value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation)

The second point is the least directly related to our concrete question, and I think it can be solved simply by choosing a clear point of view and reformulating pasages that seem to indicate a different point of view. Given that one of the main purposes of the old sequence points, and now the 'sequenced' relation, was to make clear that the side effect of postfix-increment operators is unsequenced w.r.t. to actions sequenced after the value computation of that operator (thus giving e.g. i = i++ UB), the point of view must be that individual value computations and (initiation of) individual side effects are "evaluations" for which "sequenced before" may be defined. For pragmatic reasons I would also include two more kinds of (trivial) "evaluations": function entry (so that the language of 1.9:15 may be simplified to: "When calling a function..., every value computation and side effect associated with any of its argument expressions, or with the postfix expression designating the called function, is sequenced before entry of that function") and function exit (so that any action in the function body gets by transitivity sequenced before anything that requires the function value; this used to be guaranteed by a sequence point, but the C++11 standard seems to have lost such guarantee; this might make calling a function ending with return i++; potentially UB where this is not intended, and used to be safe). Then one can also be clear about the "indeterminately sequenced" relation of functions calls: for every function call, and every evaluation that is not (directly or indirectly) part of evaluating that call, that evaluation shall be sequenced (either before or after) w.r.t. both entry and exit of that function call, and it shall have the same relation in both cases (so that in particular such external actions cannot be sequenced after function entry but before function exit, as is clearly desirable within a single thread).

Now to resolve points 1. and 3., I can see two paths (each affecting both points), which have different consequences for the defined or not behavior of our example:

Compound assignments with two operands, and three evaluations

Compound operations have thier two usual operands, an lvalue left operand and a prvalue right operand. To settle the unclarity of 3., it is included in 1.9:12 that fetching the value previously assigned to an object also may occur in compound assignments (rather than only for prvalue evaluation). The semantics of compount assignments are defined by changing 5.17:7 to

In a compound assignment op=, the value previously assigned to the object referred to by the left operand is fetched, the operator op is applied with this value as left operand and the right operand of op= as right operand, and the resulting value replaces that of the object referred to by the left operand.

(That gives two evaluations, the fetch and the side effect; a third evaluation is the trivial value computation of the compound operator, sequenced after both other evaluations.)

For clarity, state clearly in 1.9:15 that value computations in operands are sequenced before all value computations associated with the operator (rather than just those for the result of the operator), which ensures that evaluating the lvalue left operand is sequenced before fetching its value (one can hardly imagine otherwise), and also sequences the value computation of the right operand before that fetch, thus excluding UB in our example. While at it, I see no reason not to also sequence value computations in operands before any side effects associated with the operator (as they clearly must); this would make mentioning this explicitly for (compound) assignments in 5.17:1 superfluous. On the other hand do mention there that the value fetching in a compound assignment is sequenced before its side effect.

Compound assignments with three operands, and two evaluations

In order to obtain that the fetch in a compount assignment will be unsequenced with respect to the value computation of the right operand, making our example UB, the clearest way seems to be to give compound operators an implicit third (middle) operand, a prvalue, not represented by a separate expression, but obtained by lvalue-to-rvalue conversion from the left operand (this three-operand nature corresponds to the expanded form of compound assignments, but by obtaining the middle operand from the left operand, it is ensured that the value is fetched from the same object to which the result will be stored, a crucial guarantee that is only vaguely and implicitly given in the current formulation through the "except that E1 is evaluated only once" clause). The difference with the previous solution is that the fetch is now a genuine lvalue-to-rvalue conversion (since the middle operand is a prvalue) and is performed as part of the value computation of the operands to the compound assignment, which makes it naturally unsequenced with the value computation of the right operand. It should be stated somewhere (in a new clause that describes this implicit operand) that the value computation of the left operand is sequenced before this lvalue-to-rvalue conversion (it clearly must). Now 1.9:12 can be left as it is, and in place of 5.17:7 I propose

In a compound assignment op= with left operand a (an lvalue), and midlle and right operands brespectively c (both prvalues), the operator op is applied with b as left operand and c as right operand, and the resulting value replaces that of the object referred to by a.

(That gives one evaluation, the side effect, with as second evaluation the trivial value computation of the compound operator, sequenced after it.)

The still applicable changes to 1.9:15 and 5.17:1 suggested in the previous solution could still apply, but would not give our original example defined behavior. However the modified example at the top of this answer would still have defined behavior, unless the part 5.17:1 "compound assignment is a single operation" is scrapped or modified (there is a similar passage in 5.2.6 for postfix increment/decrement). The existence of those passages would suggest that detaching the fecth and store operations within a single compound assignement or postfix increment/decrement was not the intention of those who wrote the current standard (and by extension making our example UB), but this of course is mere guesswork.

Undefined, unspecified and implementation-defined behavior

Undefined behavior is one of those aspects of the C and C++ language that can be surprising to programmers coming from other languages (other languages try to hide it better). Basically, it is possible to write C++ programs that do not behave in a predictable way, even though many C++ compilers will not report any errors in the program!

Let's look at a classic example:

#include <iostream>

int main()
{
char* p = "hello!\n"; // yes I know, deprecated conversion
p[0] = 'y';
p[5] = 'w';
std::cout << p;
}

The variable p points to the string literal "hello!\n", and the two assignments below try to modify that string literal. What does this program do? According to section 2.14.5 paragraph 11 of the C++ standard, it invokes undefined behavior:

The effect of attempting to modify a string literal is undefined.

I can hear people screaming "But wait, I can compile this no problem and get the output yellow" or "What do you mean undefined, string literals are stored in read-only memory, so the first assignment attempt results in a core dump". This is exactly the problem with undefined behavior. Basically, the standard allows anything to happen once you invoke undefined behavior (even nasal demons). If there is a "correct" behavior according to your mental model of the language, that model is simply wrong; The C++ standard has the only vote, period.

Other examples of undefined behavior include accessing an array beyond its bounds, dereferencing the null pointer, accessing objects after their lifetime ended or writing allegedly clever expressions like i++ + ++i.

Section 1.9 of the C++ standard also mentions undefined behavior's two less dangerous brothers, unspecified behavior and implementation-defined behavior:

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine.

Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects.

Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine.

Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [ Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior.end note ]

Specifically, section 1.3.24 states:

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

What can you do to avoid running into undefined behavior? Basically, you have to read good C++ books by authors who know what they're talking about. Avoid internet tutorials. Avoid bullschildt.

Why is a = i + i++ undefined and not unspecified behaviour

From the viewpoint of C++, I think the answer is incredibly simple: it was made undefined behavior because C had made it undefined behavior long before, and there was essentially no potential gain from changing that.

That points to what I'd guess was really more the intended question: why did C make this undefined behavior?

I don't think that has quite as simple of an answer. One possibility is simple caution -- knowledge that by the time the C standard was being written, C had already been implemented, deployed and used on lots of machines. A fair number of machines back then seemed like a lot of code I still see: something originally designed only as a personal experiment, that worked well enough that it ended up designated as "production", without even a token attempt at fixing anything by the most egregious problems. As such, even if nobody knew of hardware this would break, nobody could be really sure such hardware didn't exist either, so it was safest to just call it UB, and be done with it.

Another possibility is that it went a bit beyond simple caution. Even though we can feel fairly safe with modern hardware, there may have been hardware at the time that people really knew would have major problems with this, and (especially if vendors associated with that hardware were represented on the committee) allowing C to run on that hardware was considered important.

Yet another possibility would be that even though nobody knew of (or even feared the possibility of) some existing implementation that this could break, they foresaw the future possibility of something it would break, so undefined behavior was seen as a way of future proofing the language to at least some limited degree.

A final possibility is that whoever was writing that part of the standard moved on to other things as soon as they came up with a set of rules that seemed acceptable, even though they could have come up with other rules that at least some might have liked better.

If I had to guess, I'd say it was probably a combination of the third and fourth possibilities I've given -- the committee was aware of developments in parallel computing without knowing how it would work out in the end, so for whomever wrote this, maximizing latitude on the part of the implementation seemed like the easiest/simplest route to gaining consensus so they could finish it and move on to bigger and better things.

Undefined behavior in c/c++: i++ + ++i vs ++i + i++

int j = ++i + i++;

is still undefined behavior since ++i and i++ can be processed simultaneously in multiple pipelines in some CPUs, which will lead to unpredictable results.

What are the common undefined/unspecified behavior for C that you run into?

A language lawyer question. Hmkay.

My personal top3:

  1. violating the strict aliasing rule

  2. violating the strict aliasing rule

  3. violating the strict aliasing rule

    :-)

Edit Here is a little example that does it wrong twice:

(assume 32 bit ints and little endian)

float funky_float_abs (float a)
{
unsigned int temp = *(unsigned int *)&a;
temp &= 0x7fffffff;
return *(float *)&temp;
}

That code tries to get the absolute value of a float by bit-twiddling with the sign bit directly in the representation of a float.

However, the result of creating a pointer to an object by casting from one type to another is not valid C. The compiler may assume that pointers to different types don't point to the same chunk of memory. This is true for all kind of pointers except void* and char* (sign-ness does not matter).

In the case above I do that twice. Once to get an int-alias for the float a, and once to convert the value back to float.

There are three valid ways to do the same.

Use a char or void pointer during the cast. These always alias to anything, so they are safe.

float funky_float_abs (float a)
{
float temp_float = a;
// valid, because it's a char pointer. These are special.
unsigned char * temp = (unsigned char *)&temp_float;
temp[3] &= 0x7f;
return temp_float;
}

Use memcopy. Memcpy takes void pointers, so it will force aliasing as well.

float funky_float_abs (float a)
{
int i;
float result;
memcpy (&i, &a, sizeof (int));
i &= 0x7fffffff;
memcpy (&result, &i, sizeof (int));
return result;
}

The third valid way: use unions. This is explicitly not undefined since C99:



Related Topics



Leave a reply



Submit