Any Good Reason Why Assignment Operator Isn't a Sequence Point

Any good reason why assignment operator isn't a sequence point?

By request:

In general, things need a reason to be a sequence point. They don't need a reason not to be a sequence point; that's the default.

For example, && must be a sequence point because of short-circuiting behaviour: if the left-hand side is false, the right-hand side must not be evaluated. (This is not just about optimization; the right-hand side could have side effects, and/or depend on the left-hand side being true, as in ptr && ptr->data.) Therefore the left-hand side must be evaluated first, before the right-hand side, in order to see if the right-hand side should be evaluated at all.

This reason does not exist for = because, although there is "evaluation" to do for both sides (although there are different restrictions on what can appear on both sides: the left-hand side must be an lvalue - the l doesn't stand for "left", btw; it stands for "location", as in location in memory - we can't assign to a temporary or a literal), it doesn't matter which side is evaluated first - as long as both sides are evaluated before the actual assignment.

Sequence points and order of evaluation

Does standard says anything about the order of evaluation of sub-expressions in case of the sequence points?

The order of evaluation is well defined in case of conditional operators && as well as || and that is the very reason short circuiting works.

It is explicitly specified by the c99 standard.

Reference: c99 Standard

Annex J: J.1 Unspecified behavior

1 The following are unspecified:

.....

The order in which subexpressions are evaluated and the order in which side effects
take place, except as specified for the function-call (), &&, ||, ?:, and comma
operators (6.5).

.....

Further in,

6.5.14 Logical OR operator

4) Unlike the bitwise | operator, the || operator guarantees left-to-right evaluation; there is a sequence point after the evaluation of the first operand. If the first operand compares unequal to 0, the second operand is not evaluated.

As well as for logical AND:

6.5.13 Logical AND operator

Unlike the bitwise binary & operator, the && operator guarantees left-to-right evaluation;
if the second operand is evaluated, there is a sequence point between the evaluations of
the first and second operands. If the first operand compares equal to 0, the second
operand is not evaluated.

Assignment and sequence points: how is this ambiguous?

The rules of undefined behavior for sequence point violations do not make an exception for situations when "the value cannot change". Nobody cares whether the value changes or not. What matters is that when you are making any sort of write access to the variable, you are modifying that variable. Even if you are assigning the variable a value that it already holds, you are still performing a modification of that variable. And if multiple modifications are not separated by sequence points, the behavior is undefined.

One can probably argue that such "non-modifying modifications" should not cause any problems. But the language specification does not concern itself with such details. In language terminology, again, every time you are writing something into a variable, you are modifying it.

Moreover, the fact that you use the word "ambiguous" in your question seems to imply that you believe the behavior is unspecified. I.e. as in "the resultant value of the variable is (or isn't) ambiguous". However, in sequence point violations the language specification does not restrict itself to stating that the result is unspecified. It goes much further and declares the behavior undefined. This means that the rationale behind these rules takes into consideration more than just an unpredictable final value of some variable. For example, on some imaginary hardware platform non-sequenced modification might result in invalid code being generated by the compiler, or something like that.

Assignment and pointers, undefined behavior?

The language does not guarantee you that the right-hand side subexpression func(&ptr) in

*ptr = func(&ptr);

is evaluated first, and the left-hand side subexpression *ptr is evaluated later (which is apparently what you expected to happen). The left-hand side can legally be evaluated first, before call to func. And this is exactly what happened in your case: *ptr got evaluated before the call, when ptr was still pointing to x. After that the assignment destination became finalized (i.e. it became known that the code will assign to x). Once it happens, changing ptr no longer changes the assignment destination.

So, the immediate behavior of your code is unspecified due to unspecified order of evaluation. However, one possible evaluation schedule leads to undefined behavior by causing a null pointer dereference. This means that in general case the behavior is undefined.

If I had to model the behavior of this code in terms of C++ language, I'd say that the process of evaluation in this case can be split into these essential steps

1a. int &lhs = *ptr;      // evaluate the left-hand side
1b. int rhs = func(&ptr); // evaluate the right-hand side
2. lhs = rhs; // perform the actual assignment

(Even though C language does not have references, internally it uses the same concept of "run-time bound lvalue" to store the result of evaluation of left-hand side of assignment.) The language specification allows enough freedom to make steps 1a and 1b to occur in any order. You expected 1b to occur first, while your compiler decided to start with 1a.

is i=f(); defined when f modifies i?

The transfer of control from f back to
the calling expression is not listed
as a sequence point.

Yes it is.

at the end of the evaluation of a full expression

 

The complete expression that forms an
expression statement, or one of the
controlling expressions of an if,
switch, while, for, or do/while
statement, or the expression in an
initializer or a return statement.

You have a return statement, therefore, you have a sequence point.

It doesn't even appear that

int f(void) { return i++; } // sequence point here, so I guess we're good
i = f();

is undefined. (Which to me is kind of weird.)

Is there a sequence point between these assignments?

The relevant quote from the (draft) standard [6.5.2.2, 10] is:

The order of evaluation of the function designator,
the actual arguments, and subexpressions within the
actual arguments is unspecified, but there is a sequence
point before the actual call.

So for your expression, the first argument (in particular the call to f) could be
evaluated before the second argument; e.g.:

(x = 1, 1), f <sp> call, (x = 2), f <sp> call

Or, it could be evaluated after the second argument; e.g.:

(x = 2), (x = 1, 1), f <sp> call, f <sp> call

[The function call itself can (and most probably will) contain more sequence points (in particular if it contains a return statement).]

Depending on that, there is a sequence point between the assignments or not.
It is up to the platform ("unspecified").

Since in the 2nd case, you are assigning to x twice between two sequence points, you have undefined behaviour on such a platform.

Sequence Points vs Operator Precedence

The first answer in the question you linked to explains exactly what's going on. I'll try to rephrase it to make it more clear.

Operator precedence defines the order of the computation of values via expressions. The result of the expression (a++) is well understood.

However, the modification of the variable a is not part of the expression. Yes, really. This is the part you're having trouble understanding, but that's simply how C and C++ define it.

Expressions result in values, but some expressions can have side effects. The expression a = 1 has a value of 1, but it also has the side effect of setting the variable a to 1. As far as how C and C++ define things, these are two different steps. Similarly, a++ has a value and a side-effect.

Sequence points define when side effects are visible to expressions that are evaluated after those sequence points. Operator precedence has nothing to do with sequence points. That's just how C/C++ defines things.

About sequence point and UB

Which option is wrong and why?

All of them are wrong, and that's because the first two invoke undefined behavior, and the last two don't compile. (And if they did, they would invoke UB as well.)

Undefined behavior and sequence points

C++98 and C++03

This answer is for the older versions of the C++ standard. The C++11 and C++14 versions of the standard do not formally contain 'sequence points'; operations are 'sequenced before' or 'unsequenced' or 'indeterminately sequenced' instead. The net effect is essentially the same, but the terminology is different.


Disclaimer : Okay. This answer is a bit long. So have patience while reading it. If you already know these things, reading them again won't make you crazy.

Pre-requisites : An elementary knowledge of C++ Standard



What are Sequence Points?

The Standard says

At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations
shall be complete and no side effects of subsequent evaluations shall have taken place. (§1.9/7)

Side effects? What are side effects?

Evaluation of an expression produces something and if in addition there is a change in the state of the execution environment it is said that the expression (its evaluation) has some side effect(s).

For example:

int x = y++; //where y is also an int

In addition to the initialization operation the value of y gets changed due to the side effect of ++ operator.

So far so good. Moving on to sequence points. An alternation definition of seq-points given by the comp.lang.c author Steve Summit:

Sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete.



What are the common sequence points listed in the C++ Standard?

Those are:

  • at the end of the evaluation of full expression (§1.9/16) (A full-expression is an expression that is not a subexpression of another expression.)1

    Example :

    int a = 5; // ; is a sequence point here
  • in the evaluation of each of the following expressions after the evaluation of the first expression (§1.9/18) 2

    • a && b (§5.14)
    • a || b (§5.15)
    • a ? b : c (§5.16)
    • a , b (§5.18) (here a , b is a comma operator; in func(a,a++) , is not a comma operator, it's merely a separator between the arguments a and a++. Thus the behaviour is undefined in that case (if a is considered to be a primitive type))
  • at a function call (whether or not the function is inline), after the evaluation of all function arguments (if any) which
    takes place before execution of any expressions or statements in the function body (§1.9/17).

1 : Note : the evaluation of a full-expression can include the evaluation of subexpressions that are not lexically
part of the full-expression. For example, subexpressions involved in evaluating default argument expressions (8.3.6) are considered to be created in the expression that calls the function, not the expression that defines the default argument

2 : The operators indicated are the built-in operators, as described in clause 5. When one of these operators is overloaded (clause 13) in a valid context, thus designating a user-defined operator function, the expression designates a function invocation and the operands form an argument list, without an implied sequence point between them.



What is Undefined Behaviour?

The Standard defines Undefined Behaviour in Section §1.3.12 as

behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements 3.

Undefined behavior may also be expected when this
International Standard omits the description of any explicit definition of behavior.

3 : permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or with-
out the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

In short, undefined behaviour means anything can happen from daemons flying out of your nose to your girlfriend getting pregnant.



What is the relation between Undefined Behaviour and Sequence Points?

Before I get into that you must know the difference(s) between Undefined Behaviour, Unspecified Behaviour and Implementation Defined Behaviour.

You must also know that the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.

For example:

int x = 5, y = 6;

int z = x++ + y++; //it is unspecified whether x++ or y++ will be evaluated first.

Another example here.


Now the Standard in §5/4 says


    1. Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.

What does it mean?

Informally it means that between two sequence points a variable must not be modified more than once.
In an expression statement, the next sequence point is usually at the terminating semicolon, and the previous sequence point is at the end of the previous statement. An expression may also contain intermediate sequence points.

From the above sentence the following expressions invoke Undefined Behaviour:

i++ * ++i;   // UB, i is modified more than once btw two SPs
i = ++i; // UB, same as above
++i = 2; // UB, same as above
i = ++i + 1; // UB, same as above
++++++i; // UB, parsed as (++(++(++i)))

i = (i, ++i, ++i); // UB, there's no SP between `++i` (right most) and assignment to `i` (`i` is modified more than once btw two SPs)

But the following expressions are fine:

i = (i, ++i, 1) + 1; // well defined (AFAIK)
i = (++i, i++, i); // well defined
int j = i;
j = (++i, i++, j*i); // well defined




    1. Furthermore, the prior value shall be accessed only to determine the value to be stored.

What does it mean? It means if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.

For example in i = i + 1 all the access of i (in L.H.S and in R.H.S) are directly involved in computation of the value to be written. So it is fine.

This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification.

Example 1:

std::printf("%d %d", i,++i); // invokes Undefined Behaviour because of Rule no 2

Example 2:

a[i] = i++ // or a[++i] = i or a[i++] = ++i etc

is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored. So the behaviour is undefined.

Example 3 :

int x = i + i++ ;// Similar to above

Follow up answer for C++11 here.

Why is `i = ++i + 1` unspecified behavior?

You make the mistake of thinking of operator= as a two-argument function, where the side effects of the arguments must be completely evaluated before the function begins. If that were the case, then the expression i = ++i + 1 would have multiple sequence points, and ++i would be fully evaluated before the assignment began. That's not the case, though. What's being evaluated in the intrinsic assignment operator, not a user-defined operator. There's only one sequence point in that expression.

The result of ++i is evaluated before the assignment (and before the addition operator), but the side effect is not necessarily applied right away. The result of ++i + 1 is always the same as i + 2, so that's the value that gets assigned to i as part of the assignment operator. The result of ++i is always i + 1, so that's what gets assigned to i as part of the increment operator. There is no sequence point to control which value should get assigned first.

Since the code is violating the rule that "between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression," the behavior is undefined. Practically, though, it's likely that either i + 1 or i + 2 will be assigned first, then the other value will be assigned, and finally the program will continue running as usual — no nasal demons or exploding toilets, and no i + 3, either.



Related Topics



Leave a reply



Submit