Does this code from The C++ Programming Language 4th edition section 36.3.6 have well-defined behavior?
The code exhibits unspecified behavior due to unspecified order of evaluation of sub-expressions although it does not invoke undefined behavior since all side effects are done within functions which introduces a sequencing relationship between the side effects in this case.
This example is mentioned in the proposal N4228: Refining Expression Evaluation Order for Idiomatic C++ which says the following about the code in the question:
[...]This code has been reviewed by C++ experts world-wide, and published
(The C++ Programming Language, 4th edition.) Yet, its vulnerability
to unspecified order of evaluation has been discovered only recently
by a tool[...]
Details
It may be obvious to many that arguments to functions have an unspecified order of evaluation but it is probably not as obvious how this behavior interacts with chained functions calls. It was not obvious to me when I first analyzed this case and apparently not to all the expert reviewers either.
At first glance it may appear that since each replace
has to be evaluated from left to right that the corresponding function argument groups must be evaluated as groups from left to right as well.
This is incorrect, function arguments have an unspecified order of evaluation, although chaining function calls does introduce a left to right evaluation order for each function call, the arguments of each function call are only sequenced before with respect to the member function call they are part of. In particular this impacts the following calls:
s.find( "even" )
and:
s.find( " don't" )
which are indeterminately sequenced with respect to:
s.replace(0, 4, "" )
the two find
calls could be evaluated before or after the replace
, which matters since it has a side effect on s
in a way that would alter the result of find
, it changes the length of s
. So depending on when that replace
is evaluated relative to the two find
calls the result will differ.
If we look at the chaining expression and examine the evaluation order of some of the sub-expressions:
s.replace(0, 4, "" ).replace( s.find( "even" ), 4, "only" )
^ ^ ^ ^ ^ ^ ^ ^ ^
A B | | | C | | |
1 2 3 4 5 6
and:
.replace( s.find( " don't" ), 6, "" );
^ ^ ^ ^
D | | |
7 8 9
Note, we are ignoring the fact that 4
and 7
can be further broken down into more sub-expressions. So:
A
is sequenced beforeB
which is sequenced beforeC
which is sequenced beforeD
1
to9
are indeterminately sequenced with respect to other sub-expressions with some of the exceptions listed below1
to3
are sequenced beforeB
4
to6
are sequenced beforeC
7
to9
are sequenced beforeD
The key to this issue is that:
4
to9
are indeterminately sequenced with respect toB
The potential order of evaluation choice for 4
and 7
with respect to B
explains the difference in results between clang
and gcc
when evaluating f2()
. In my tests clang
evaluates B
before evaluating 4
and 7
while gcc
evaluates it after. We can use the following test program to demonstrate what is happening in each case:
#include <iostream>
#include <string>
std::string::size_type my_find( std::string s, const char *cs )
{
std::string::size_type pos = s.find( cs ) ;
std::cout << "position " << cs << " found in complete expression: "
<< pos << std::endl ;
return pos ;
}
int main()
{
std::string s = "but I have heard it works even if you don't believe in it" ;
std::string copy_s = s ;
std::cout << "position of even before s.replace(0, 4, \"\" ): "
<< s.find( "even" ) << std::endl ;
std::cout << "position of don't before s.replace(0, 4, \"\" ): "
<< s.find( " don't" ) << std::endl << std::endl;
copy_s.replace(0, 4, "" ) ;
std::cout << "position of even after s.replace(0, 4, \"\" ): "
<< copy_s.find( "even" ) << std::endl ;
std::cout << "position of don't after s.replace(0, 4, \"\" ): "
<< copy_s.find( " don't" ) << std::endl << std::endl;
s.replace(0, 4, "" ).replace( my_find( s, "even" ) , 4, "only" )
.replace( my_find( s, " don't" ), 6, "" );
std::cout << "Result: " << s << std::endl ;
}
Result for gcc
(see it live)
position of even before s.replace(0, 4, "" ): 26
position of don't before s.replace(0, 4, "" ): 37
position of even after s.replace(0, 4, "" ): 22
position of don't after s.replace(0, 4, "" ): 33
position don't found in complete expression: 37
position even found in complete expression: 26
Result: I have heard it works evenonlyyou donieve in it
Result for clang
(see it live):
position of even before s.replace(0, 4, "" ): 26
position of don't before s.replace(0, 4, "" ): 37
position of even after s.replace(0, 4, "" ): 22
position of don't after s.replace(0, 4, "" ): 33
position even found in complete expression: 22
position don't found in complete expression: 33
Result: I have heard it works only if you believe in it
Result for Visual Studio
(see it live):
position of even before s.replace(0, 4, "" ): 26
position of don't before s.replace(0, 4, "" ): 37
position of even after s.replace(0, 4, "" ): 22
position of don't after s.replace(0, 4, "" ): 33
position don't found in complete expression: 37
position even found in complete expression: 26
Result: I have heard it works evenonlyyou donieve in it
Details from the standard
We know that unless specified the evaluations of sub-expressions are unsequenced, this is from the draft C++11 standard section 1.9
Program execution which says:
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced.[...]
and we know that a function call introduces a sequenced before relationship of the function calls postfix expression and arguments with respect to the function body, from section 1.9
:
[...]When calling a function (whether or not the function is inline), every
value computation and side effect associated with any argument
expression, or with the postfix expression designating the called
function, is sequenced before execution of every expression or
statement in the body of the called function.[...]
We also know that class member access and therefore chaining will evaluate from left to right, from section 5.2.5
Class member access which says:
[...]The postfix expression before the dot or arrow is evaluated;64
the result of that evaluation, together with the id-expression,
determines the result of the entire postfix expression.
Note, in the case where the id-expression ends up being a non-static member function it does not specify the order of evaluation of the expression-list within the ()
since that is a separate sub-expression. The relevant grammar from 5.2
Postfix expressions:
postfix-expression:
postfix-expression ( expression-listopt) // function call
postfix-expression . templateopt id-expression // Class member access, ends
// up as a postfix-expression
C++17 changes
The proposal p0145r3: Refining Expression Evaluation Order for Idiomatic C++ made several changes. Including changes that give the code well specified behavior by strengthening the order of evaluation rules for postfix-expressions and their expression-list.
[expr.call]p5 says:
The postfix-expression is sequenced before each expression in the expression-list and any default argument. The
initialization of a parameter, including every associated value computation and side effect, is indeterminately
sequenced with respect to that of any other parameter. [ Note: All side effects of argument evaluations are
sequenced before the function is entered (see 4.6). —end note ] [ Example:void f() {
std::string s = "but I have heard it works even if you don’t believe in it";
s.replace(0, 4, "").replace(s.find("even"), 4, "only").replace(s.find(" don’t"), 6, "");
assert(s == "I have heard it works only if you believe in it"); // OK
}
—end example ]
Is this code from The C++ Programming Language 4th Edition Section 19.3.3.1 valid?
How can
String::copy_from()
usememcpy()
to copy the class?
int
, char
, and the anonymous union are all trivially copyable. So while you cannot perform a memcpy
of a String
, you can perform a memcpy
of its members. All of them, all at once. The technically correct code for this would be:
memcpy(&this->sz, &x.sz, sizeof(x));
That copies the range of memory of the storage for the members of this object.That is guaranteed by the rules of standard layout types. For standard layout types, the members are stored in definition order. So if you start with the first, and cover the range of all objects, that should copy the members.
However, the standard also says that the first member subobject of a standard layout type is required to have the same address as the object itself:
If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member.
That means &this->sz
must be the same address as this
, and &x.sz
must be the same address as &x
.
So just cut out the middle-man:
memcpy(this, &x, sizeof(x));
This is only allowed because of the rules of standard layout types.
A bigger issue is that copy_from never checks for self assignment. memcpy
doesn't work with overlapping memory ranges. Maybe operator=
and similar functions check for this already.
What should the result be when assigning a variable to a reference to itself, in-between modified and then returned by a function call?
Since C++17 the order of evaluation is specified such that the operands of =
are evaluated right-to-left and those of <<
are evaluated left-to-right, matching the associativity of these operators. (But this doesn't apply to all operators, e.g. +
and other arithmetic operators.)
So in
addOne(x) = x;
first the value of the right-hand side is evaluated, yielding 5
. Then the function addOne
is called and it doesn't matter what it does with x
since it returns a reference to it, to which the right-hand value 5
is assigned.
Formally, evaluating the right-hand side first means that we replace the lvalue x
by the (pr)value it holds (lvalue-to-rvalue conversion). Then we call addOne(x)
to modify the object that the lvalue x
refers to.
So, imagining temporary variables to hold the results of the individual evaluations, the line is equivalent to (except for extra copies introduced by the new variables, which don't matter in the case of int
):
int t = x;
int& y = addOne(x);
y = t; // same as x = t, because y will refer to x
Then in the line
std::cout << x << ' ' << addOne(x);
we first evaluate and output x
, resulting in 5
, and then call addOne
, resulting in 6
.
So the line is equivalent to (simplified, knowing that operator<<
will return std::cout
again):
int t1 = x;
std::cout << t1 << ' ';
int t2 = addOne(x);
std::cout << t2;
The output 5 6
is the only correct one since C++17.
Before C++17, the evaluation order of the two sides of the assignment operator was unsequenced.
Having a scalar modification unsequenced with a value computation on the same scalar (on the right-hand side of your assignment) causes undefined behavior normally.
But since you put the increment of x
into a function, an additional rule saying that the execution of a function body is merely indeterminately sequenced with other evaluations in the calling context saves this. It means that the line wont have undefined behavior anymore, but the order in which the evaluations of the two sides of the assignment happen could be either left-first or right-first.
This means we won't know whether x
is evaluated first and then addOne(x)
or the other way around.
Therefore after the line, x
may be 5
or 6
.
6
would be obtained if the evaluation happened equivalently to
int& y = addOne(x);
int t = x;
y = t;
Then in the line
std::cout << x << ' ' << addOne(x);
pre-C++17 the same issue applied. The evaluations of the arguments to <<
were indeterminately sequenced, rather than left-to-right and so addOne(x)
could be evaluated before the left-hand x
, i.e. in addition to the previous order, the evaluation could also be equivalent to
int t2 = addOne(x);
int t1 = x;
std::cout << t1 << ' ' << t2;
In this case x
is first incremented and then its new value is printed twice.
Therefore possible program output could be either of the following:
5 6
6 6
6 7
7 7
(Technically the int t2 = addOne(x)
are two evaluations: One call to addOne
returning a reference and then the lvalue-to-rvalue conversion. These could happen interleaved with the other evaluations, but this doesn't give any new program outputs.)
You can specify to use C++17 (or newer versions like C++20) with the -std=c++17
flag to the compiler if you are using GCC or Clang and /std:c++17
if you are using MSVC. Which standard version is chosen by-default depends on the compiler and compiler version.
Why does x = x * y / z give a different result from x *= y / z for integers?
Because *
and /
have the same precedence with left associativity, the expression is not
x * ((*n + i) / i)
(which is the same as x *= (*n + i) / i
) but
(x * (*n + i)) / i
Assignment in C++ occurs despite exception on the right side
Before C++17 there was no sequencing between the left- and right-hand side of assignment operators.
It's first in C++17 that explicit sequencing was introduced (right-hand side is evaluated first).
That means the evaluation order is unspecified, which means it's up to the implementation to perform the evaluation in the order in which it wants, and in this case it evaluates the left-hand side first.
See this evaluation order reference for more details (especially point 20).
Order of evaluation in chain invocation in C++
The gist of it is that in a function call, X(Y, Z)
; evaluation of all of X
, Y
, Z
are indeterminately sequenced with respect to each other. The only sequencing is that Y
and Z
are sequenced-before the call to the function which X
evaluated to.
Suppose we have:
typedef void (*fptr)(int, double);
fptr a();
int b();
double c();
a()(b(), c());
The three functions a
, b
, c
may be called in any order. Of course this all applies recursively to any sub-expressions.
Calling an argument-permutation-invariant function f(i++, i++)
You are correct that in C++17
foo(*it++, *it++, *it++);
is not undefined behavior. As your quote states, and as stated in [expr.call]/8
The postfix-expression is sequenced before each expression in the expression-list and any default argument. The initialization of a parameter, including every associated value computation and side effect, is indeterminately sequenced with respect to that of any other parameter.
each increment is sequenced so you do not have multiple unsequenced writes. The order of evaluation of function arguments is still unspecified in C++17 so that means you just have unspecified behavior (you can't know which element is passed to each argument).
As long as your function doesn't care about this then you are fine. If the order of the arguments will matter then you'll have to use your second version.
That all said I would prefer the use of
foo(*it, *(it + 1), *(it + 2));
even if the order doesn't matter. It makes the code backwards compatible and IMHO it is easier to reason about. I'd rather see a it += 3
in the increment part of a for loop then the multiple increments in the function call and no increment in the increment part of the loop.
Related Topics
How to Enable _Int128 on Visual Studio
Difference Between Try-Catch Syntax for Function
Floating Point Equality and Tolerances
Reading from Ifstream Won't Read Whitespace
How to Have an Unordered_Map Where the Value Type Is the Class It's In
"Undefined Reference To" Using 'G++' to Compile a C++ Program
How to Read a Binary File into a Vector of Unsigned Chars
Making Std::Vector Allocate Aligned Memory
Effective C++ Item 23 Prefer Non-Member Non-Friend Functions to Member Functions
Borderless Window Using Areo Snap, Shadow, Minimize Animation, and Shake
Why Can't We Pass Arrays to Function by Value
How to Input Variables Using Cin Without Creating a New Line
Selecting a Member Function Using Different Enable_If Conditions
Iterate Through Struct and Class Members
Get 3D Coordinates from 2D Image Pixel If Extrinsic and Intrinsic Parameters Are Known