Is the Behaviour of i = i++ Really Undefined?

Is the behaviour of i = i++ really undefined?

The phrase "…the final value of i will be 4 no matter what the order of evaluation…" is incorrect. The compiler could emit the equivalent of this:

i = 3;
int tmp = i;   // tmp == 3
++i;           // i == 4
i = tmp;       // final value: i == 3

or this:

i = 3;
++i;           // i == 4
i = i - 1;     // final value: i == 3

or this:

i = 3;
i = i;         // i == 3
++i;           // final value: i == 4

As to the definitions of terms, if the answer were guaranteed to be 4, that wouldn't be unspecified or undefined behavior; it would be defined behavior.

As it stands, it is undefined behaviour according to the standard (Wikipedia), so it's even free to do this:

i = 3;
system("sudo rm -rf /"); // DO NOT TRY THIS AT HOME … OR AT WORK … OR ANYWHERE.

Does an expression with undefined behaviour that is never actually executed make a program erroneous?


If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

Side effects are changes in the state of the execution environment (1.9/12). A change is a change, not an expression that, if evaluated, would potentially produce a change. If there is no change, there is no side effect. If there is no side effect, then no side effect is unsequenced relative to anything else.

This does not mean that any code which is never executed is UB-free (though I'm pretty sure most of it is). Strictly speaking, each occurrence of UB in the standard would need to be examined separately, but that caution is probably excessive; see below.

The standard also says that

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

(emphasis mine)

This, as far as I can tell, is the only normative reference that says what the phrase "undefined behavior" means: an undefined operation in a program execution. No execution, no UB.
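
A minimal sketch of that reading (the constant-false guard is just a stand-in for any branch that is never taken):

int main()
{
    int i = 3;
    if (false)
        i = i++;   // an undefined operation only if it is actually evaluated
    return i;      // well-defined: no execution contains the undefined operation
}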

Why is the phrase "undefined behavior means the compiler can do anything it wants" true?

Nothing "causes" this to occur. Undefined behaviour cannot "occur". There is no mystical force that descends upon your computer and suddenly makes it create black holes inside of cats.

That anything can happen when you run a program whose behaviour is undefined is stated as fact by the C++ standard. It's a statement of leeway, a handy excuse used by compilers to make assumptions about your code so as to provide useful optimisations.

For example, if we say that dereferencing nullptr is undefined (which it is), then no compiler ever needs to check that a pointer is not nullptr: it can just assume that a dereferenced pointer will never be nullptr, and if it is, any consequences are the programmer's problem.
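
A minimal sketch of the kind of optimisation this licenses (the function name is made up for illustration; the exact transformation varies by compiler):

int deref_then_check(int* p)
{
    int value = *p;      // the compiler may now assume p != nullptr
    if (p == nullptr)    // ...so it is free to delete this entire check
        return -1;
    return value;
}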

Due to the astounding complexity of compilers, some of those consequences can be rather unexpected.

Of course it is not actually true that "anything can happen". Your computer has neither the necessary physical power nor the necessary legal authority to instantiate a black hole inside of a cat. But since C++ is an abstraction, it seems only fitting that we use abstractions to teach people not to write programs with undefined behaviour. If you program rigorously, assuming that "anything can happen" if your program has undefined behaviour, then you will not be surprised by said rather unexpected consequences, and you will not be tempted to try to "control" the outcome in any way.

Does Undefined Behavior really permit *anything* to happen?

Yes, it permits anything to happen. The note is just giving examples. The definition is pretty clear:

Undefined behavior: behavior for which this International Standard imposes no requirements.


Frequent point of confusion:

You should understand that "no requirement" also means the implementation is NOT required to leave the behavior undefined or do something bizarre/nondeterministic!

The implementation is perfectly allowed by the C++ standard to document some sane behavior and behave accordingly.[1] So, if your compiler claims to wrap around on signed overflow, logic (sanity?) would dictate that you're welcome to rely on that behavior on that compiler. Just don't expect another compiler to behave the same way if it doesn't claim to.
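
For instance, a sketch that relies on such documented behavior, assuming GCC's -fwrapv flag (which documents two's-complement wrapping for signed overflow):

#include <climits>
#include <iostream>

int main()
{
    int i = INT_MAX;
    ++i;  // UB per the standard; wraps to INT_MIN only because
          // the compiler documents it (e.g. g++ -fwrapv)
    std::cout << i << '\n';  // prints -2147483648 under -fwrapv
}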

[1] Heck, it's even allowed to document one thing and do another. That'd be stupid, and it'd probably make you toss it into the trash—why would you trust a compiler whose documentation lies to you?—but it's not against the C++ standard.

Is the behaviour of a program that has undefined behaviour on an unreachable path defined?

It is not necessary to base a position on this question on the usefulness of any given code construct or practice, nor on anything written about C++, whether in its standard or in another SO answer, no matter how similar C++'s definitions may be. The key thing to consider is C's definition of undefined behavior:

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

(C2011, 3.4.3/1; emphasis added)

Thus, undefined behavior is triggered temporally ("upon use" of a construct or data), not by mere presence.* It is convenient that this is consistent for undefined behavior arising from data and that arising from program constructs; the standard need not have been consistent there. And as another answer describes, this "upon use" definition is a good design choice, as it allows programs to avoid executing undefined behaviors associated with erroneous data.
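
A minimal sketch of that design benefit: guard the use, and the erroneous datum is never "used", so the execution stays defined.

int safe_div(int a, int b)
{
    if (b == 0)       /* the erroneous datum is never used... */
        return 0;
    return a / b;     /* ...so this division is always defined */
}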

On the other hand, if a program does execute undefined behavior then it follows from the standard's definition that the whole behavior of the program is undefined. This consequent undefinedness is a more general kind arising from the fact that the UB associated directly with the erroneous data or construct could, in principle, include altering the behavior of other parts of the program, even retroactively (or apparently so). There are of course extra-lingual limitations on what could happen -- so no, nasal demons will not actually be making any appearances -- but those are not necessarily as strong as one might suppose.
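
To make the retroactive part concrete, a small sketch: once an execution is known to reach an undefined operation, even output that "precedes" it carries no guarantees.

#include <stdio.h>

int main(void)
{
    printf("about to do something undefined\n");  /* not guaranteed to appear */
    int *p = NULL;
    return *p;  /* UB: the whole execution, including the printf above,
                   is outside the standard's requirements */
}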


* Caveat: some program constructs are used at translation time. These produce UB in program translation, with the result that every execution of the program has wholly-undefined behavior. For a somewhat stupid example, if your program source does not end with an unescaped newline then the program's behavior is completely undefined (see C2011, 5.1.1.2/1, point 2).

x = ++x: is it really undefined?

The conditional operator ?: has a sequence point between the evaluation of the condition (first operand) and the evaluation of the second or third operand, but it has no dedicated sequence point after the evaluation of the second or third operand. This means that the two modifications of x in this example are potentially conflicting (not separated by a sequence point). So, Coverity Prevent is right.

Your statement in that regard is virtually equivalent to

a >= b ? x = ++x : x = 0;

with the same problem as in x = ++x.

Now, the title of your question seems to suggest that you don't know whether x = ++x is undefined. It is indeed undefined. It is undefined for the very same reason x = x++ is undefined. In short, if the same object is modified more than once between a pair of adjacent sequence points, the behavior is undefined. In this case x is modified by assignment and by ++ and there's no sequence point to "isolate" these modifications from each other. So, the behavior is undefined. There's absolutely no difference between ++x and x++ in this regard.
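
For what it's worth, a sketch of an equivalent statement that modifies x exactly once and is therefore well-defined:

x = (a >= b) ? x + 1 : 0;   // one assignment, no conflicting side effects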

What are the common undefined/unspecified behavior for C that you run into?

A language lawyer question. Hmkay.

My personal top3:

  1. violating the strict aliasing rule

  2. violating the strict aliasing rule

  3. violating the strict aliasing rule

    :-)

Edit: Here is a little example that does it wrong twice:

(assume 32-bit ints and little endian)

float funky_float_abs (float a)
{
    unsigned int temp = *(unsigned int *)&a;  /* UB: reads a float through an unsigned int lvalue */
    temp &= 0x7fffffff;                       /* clear the sign bit */
    return *(float *)&temp;                   /* UB again: reads an unsigned int through a float lvalue */
}

That code tries to get the absolute value of a float by bit-twiddling with the sign bit directly in the representation of a float.

However, accessing an object through a pointer cast to an incompatible type is not valid C: it violates the strict aliasing rule. The compiler may assume that pointers to different types don't point to the same chunk of memory. This is true for all kinds of pointers except void* and char* (signedness does not matter).

In the case above I do that twice: once to get an int alias for the float a, and once to convert the value back to float.

There are three valid ways to do the same.

Use a char or void pointer during the cast. These always alias to anything, so they are safe.

float funky_float_abs (float a)
{
    float temp_float = a;
    /* valid, because it's a char pointer; these are special */
    unsigned char * temp = (unsigned char *)&temp_float;
    temp[3] &= 0x7f;  /* clear the sign bit (the top byte on little endian) */
    return temp_float;
}

Use memcpy. memcpy takes void pointers, so it will force aliasing as well.

#include <string.h>

float funky_float_abs (float a)
{
    int i;
    float result;
    memcpy (&i, &a, sizeof (int));     /* copy the representation into an int */
    i &= 0x7fffffff;                   /* clear the sign bit */
    memcpy (&result, &i, sizeof (int));
    return result;
}

The third valid way: use unions. Type-punning through a union is explicitly not undefined since C99:

float funky_float_abs (float a)
{
    union
    {
        unsigned int i;
        float f;
    } cast_helper;

    cast_helper.f = a;
    cast_helper.i &= 0x7fffffff;  /* clear the sign bit */
    return cast_helper.f;
}

Why is a = i + i++ undefined and not unspecified behaviour?

From the viewpoint of C++, I think the answer is incredibly simple: it was made undefined behavior because C had made it undefined behavior long before, and there was essentially no potential gain from changing that.

That points to what I'd guess was really more the intended question: why did C make this undefined behavior?

I don't think that has quite as simple of an answer. One possibility is simple caution -- knowledge that by the time the C standard was being written, C had already been implemented, deployed and used on lots of machines. A fair number of implementations back then were like a lot of code I still see: something originally designed only as a personal experiment that worked well enough that it ended up designated as "production", without even a token attempt at fixing anything but the most egregious problems. As such, even if nobody knew of hardware this would break, nobody could be really sure such hardware didn't exist either, so it was safest to just call it UB and be done with it.

Another possibility is that it went a bit beyond simple caution. Even though we can feel fairly safe with modern hardware, there may have been hardware at the time that people really knew would have major problems with this, and (especially if vendors associated with that hardware were represented on the committee) allowing C to run on that hardware was considered important.

Yet another possibility would be that even though nobody knew of (or even feared the possibility of) some existing implementation that this could break, they foresaw the future possibility of something it would break, so undefined behavior was seen as a way of future proofing the language to at least some limited degree.

A final possibility is that whoever was writing that part of the standard moved on to other things as soon as they came up with a set of rules that seemed acceptable, even though they could have come up with other rules that at least some might have liked better.

If I had to guess, I'd say it was probably a combination of the third and fourth possibilities I've given -- the committee was aware of developments in parallel computing without knowing how they would work out in the end, so for whoever wrote this, maximizing latitude on the part of the implementation seemed like the easiest/simplest route to gaining consensus so they could finish it and move on to bigger and better things.

Is it undefined behaviour to access two objects of type T declared next to each other using T[]?

I want to start with a quote from the presentation:

You are not programming against the CPU, you are programming against the abstract machine

You say:

He states that accessing doubles next to each other was UB

But your quote is incomplete. He specifies this very crucial fact:

... unless the objects are part of an array

malloc is a red herring (and a can of worms in its own right). His code uses new[], so malloc is just poisoning the well here.

The specific problem he mentions on his slides is that the double objects created in the buffer are created by std::uninitialized_default_construct_n, and this function doesn't create an array of doubles; instead it creates multiple objects that are merely consecutive in memory. He asserts that in the C++ standard (the abstract machine you are programming against) you can't treat objects as part of an array unless you actually created the array of objects.

The point the author tries to make is that the C++ standard is flawed and there is no strictly conforming way to create a flexible array (pre C++20).


For reference, here is the code (transcribed from the slide):

struct header
{
    int size;
    byte* buffer;
    thing some;
};
constexpr size_t x = ...;

byte* buffer = new byte[x + n * sizeof(double)];
header* p = new (buffer) header{n, buffer};
uninitialized_default_construct_n(
    reinterpret_cast<double*>(buffer + x), n);
double* data = reinterpret_cast<double*>(p->buffer + x);

data[0] = data[1] + data[2]; // <-- problem here
// because we never created an array of doubles

