Undefined, Unspecified and Implementation-Defined Behavior

Undefined behavior is one of those aspects of the C and C++ languages that can be surprising to programmers coming from other languages (other languages try to hide it better). Basically, it is possible to write C++ programs that do not behave in a predictable way, even though many C++ compilers will not report any errors in the program!

Let's look at a classic example:

#include <iostream>

int main()
{
    char* p = "hello!\n"; // yes I know, deprecated conversion
    p[0] = 'y';
    p[5] = 'w';
    std::cout << p;
}

The variable p points to the string literal "hello!\n", and the two assignments below try to modify that string literal. What does this program do? According to section 2.14.5 paragraph 11 of the C++ standard, it invokes undefined behavior:

The effect of attempting to modify a string literal is undefined.

I can hear people screaming "But wait, I can compile this no problem and get the output yellow" or "What do you mean undefined, string literals are stored in read-only memory, so the first assignment attempt results in a core dump". This is exactly the problem with undefined behavior. Basically, the standard allows anything to happen once you invoke undefined behavior (even nasal demons). If there is a "correct" behavior according to your mental model of the language, that model is simply wrong; the C++ standard has the only vote, period.

Other examples of undefined behavior include accessing an array beyond its bounds, dereferencing the null pointer, accessing objects after their lifetime ended or writing allegedly clever expressions like i++ + ++i.

Section 1.9 of the C++ standard also mentions undefined behavior's two less dangerous brothers, unspecified behavior and implementation-defined behavior:

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine.

Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects.

Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine.

Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [ Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. -- end note ]

Specifically, section 1.3.24 states:

Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

What can you do to avoid running into undefined behavior? Basically, you have to read good C++ books by authors who know what they're talking about. Avoid internet tutorials. Avoid bullschildt.

Undefined vs. Unspecified vs. Implementation-defined behavior

In short:

  • Undefined behaviour: this is not okay to do
  • Unspecified behaviour: this is okay to do, but the result could be anything*
  • Implementation-defined behaviour: this is okay to do, the result could be anything* but the compiler manual should tell you

Or, in quotes from the C++ standard (N4659 section 3, Terms and Definitions):

3.28 Undefined behavior: behavior for which this International Standard imposes no requirements

3.29 Unspecified behavior: behavior, for a well-formed program construct and correct data, that depends on the implementation

3.12 Implementation-defined behavior: behavior, for a well-formed program construct and correct data, that depends on the implementation and that each implementation documents


EDIT: *As pointed out by M.M in the comments, saying that the result of unspecified behaviour could be anything is not quite right. In fact as the standard itself points out, in a note for paragraph 3.29

The range of possible behaviors is usually delineated by this International Standard.

So in practice you have some idea of what the possible results are, but what exactly will happen depends on your compiler/compiler flags/platform/etc.

Is Implementation defined behaviour an undefined behaviour

No, implementation defined behaviour is not undefined behaviour.

Paragraph 5 refers to strictly conforming programs:


  5. A strictly conforming program shall use only those features of the language and library
    specified in this International Standard. 2) It shall not produce output dependent on any
    unspecified, undefined, or implementation-defined behavior, and shall not exceed any
    minimum implementation limit.

Paragraph 7 and its footnote explain that there are 2 levels of conformance:


  7. A conforming program is one that is acceptable to a conforming implementation. 4)

4) Strictly conforming programs are intended to be maximally portable among conforming
implementations. Conforming programs may depend upon nonportable features of a conforming
implementation.

A program whose output depends on implementation-defined behaviour is simply a conforming program, but not a strictly conforming program.

Undefined/Unspecified/Implementation-defined behaviour warnings?

It all boils down to

  • Quality of Implementation: the more accurate and useful the warnings are, the better it is. A compiler that always printed: "This program may or may not invoke undefined behavior" for every program, and then compiled it, is pretty useless, but is standards-compliant. Thankfully, no one writes compilers such as these :-).

  • Ease of determination: a compiler may not be easily able to determine undefined behavior, unspecified behavior, or implementation-defined behavior. Let's say you have a call stack that's 5 levels deep, with a const char * argument being passed from the top-level, to the last function in the chain, and the last function calls printf() with that const char * as the first argument. Do you want the compiler to check that const char * to make sure it is correct? (Assuming that the first function uses a literal string for that value.) How about when the const char * is read from a file, but you know that the file will always contain valid format specifier for the values being printed?

  • Success rate: A compiler may be able to detect many constructs that may or may not be undefined, unspecified, etc.; but with a very low "success rate". In that case, the user doesn't want to see a lot of "may be undefined" messages—too many spurious warning messages may hide real warning messages, or prompt a user to compile at "low-warning" setting. That is bad.

For your particular example, gcc gives a warning about "may be undefined". It even warns for printf() format mismatch.

But if your hope is for a compiler that issues a diagnostic for all undefined/unspecified cases, it is not clear if that should/can work.

Let's say you have the following:

#include <stdio.h>

void add_to(int *a, int *b)
{
    *a = ++*b;
}

int main(void)
{
    int i = 42;
    add_to(&i, &i); /* bad */
    printf("%d\n", i);
    return 0;
}

Should the compiler warn you about *a = ++*b; line?

As gf says in the comments, a compiler cannot check across translation units for undefined behavior. Classic example is declaring a variable as a pointer in one file, and defining it as an array in another, see comp.lang.c FAQ 6.1.

Unspecified, undefined and implementation defined behavior WIKI for C

C standards define UsB, UB and IDB in a way that can be summarized as follows:

Unspecified Behavior (UsB)

This is a behavior for which the standard gives some alternatives among which the implementation must choose, but it doesn't mandate how and when the choice is to be made. In other words, the implementation must accept user code triggering that behavior without erroring out and must comply with one of the alternatives given by the standard.

Be aware that the implementation is not required to document anything about the choices made. These choices may also be non-deterministic or dependent (in an undocumented way) on compiler options.

To summarize: the standard gives some possibilities among which to choose, the implementation chooses when and how the specific alternative is selected and applied.

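Note that the standard may provide a really large number of alternatives. The typical example is the initial value of local variables that are not explicitly initialized. The standard calls this value indeterminate: it is either an unspecified value, which must be a valid value for the variable's data type, or a trap representation. (Reading such a variable can be outright UB, for example when the type has trap representations or, since C11, when the variable's address is never taken.)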

To be more specific consider an int variable: an implementation is free to choose any int value, and this choice can be completely random, non-deterministic or be at the mercy of the whims of the implementation, which is not required to document anything about it. As long as the implementation stays within the limits stated by the standard this is ok and the user cannot complain.

Undefined Behavior (UB)

As the naming indicates this is a situation in which the C standard doesn't impose or guarantee what the program would or should do. All bets are off. Such a situation:

  • renders a program either erroneous or nonportable

  • doesn't require absolutely anything from the implementation

This is a really nasty situation: as long as there is a piece of code that has undefined behavior, the entire program is considered erroneous and the implementation is allowed by the standard to do anything.

In other words, the presence of a cause of UB allows the implementation to completely ignore the standard, as far as the program triggering the UB is concerned.

Note that the actual behavior in this case may cover an unlimited range of possibilities, the following is by no means an exhaustive list:

  • A compile-time error may be issued.
  • A run-time error may be issued.
  • The problem is completely ignored (and this may lead to program bugs).
  • The compiler silently throws the UB-code away as an optimization.
  • Your hard disk may be formatted.
  • Your computer may erase your bank account and ask your girlfriend for a date.

I hope the last two (half-serious) items can give you the right gut-feeling about the nastiness of UB. And even though most implementations will not insert the necessary code to format your hard drive, real compilers do optimize!

Terminology Note: Sometimes people argue that some piece of code which the standard deems a source of UB works in a documented way in their implementation/system/environment, and therefore cannot really be UB. This reasoning is wrong, but it is a common (and somewhat understandable) misunderstanding: when the term UB (and also UsB and IDB) is used in a C context, it is meant as a technical term whose precise meaning is defined by the standard(s). In particular, the word "undefined" loses its everyday meaning. Therefore it doesn't make sense to show examples where erroneous or nonportable programs produce "well-defined" behavior as counterexamples. If you try, you really miss the point. UB means that you lose all the guarantees of the standard. If your implementation provides an extension, then your guarantees are only those of your implementation. If you use that extension, your program is no longer a strictly conforming C program (in a sense, it is no longer a C program, since it doesn't follow the standard any longer!).

Usefulness of undefined behavior

A common question about UB is something along these lines: "If UB is so nasty, why doesn't the standard mandate that an implementation issue an error when faced with UB?"

First, optimizations. Allowing implementations not to check for possible causes of UB allows lots of optimizations that make a C program extremely efficient. This is one of the features of C, although it makes C a source of many pitfalls for beginners.

Second, the existence of UB in the standards allows a conforming implementation to provide extensions to C without being deemed non-conforming as a whole.

As long as an implementation behaves as mandated for a conforming program, it is itself conforming, although it may provide non-standard facilities that may be useful on specific platforms. Of course the programs using those facilities will be nonportable and will rely on documented UB, i.e. behavior that is UB according to the standard, but that an implementation documents as an extension.

Implementation-defined Behavior (IDB)

This is a behavior that can be described in a way similar to UsB: the standard provides some alternatives and the implementation chooses one, but the implementation is required to document exactly how the choice is made.

This means that a user reading her compiler's documentation must be given enough information to predict exactly what will happen in the specific case.

Note that an implementation that doesn't fully document an IDB cannot be deemed conforming. A conforming implementation must document exactly what happens in any case that the standard declares IDB.



Examples of unspecified behavior

Order of evaluation

Function arguments

The order of evaluation for function arguments is unspecified (see CERT C rule EXP30-C).

For instance, in c(a(), b()); it is unspecified whether the function a is called before or after b. The only guarantee is that both are called before the c function.



Examples of undefined behavior

Pointers

Dereferencing of null pointer

Null pointers are used to signal that a pointer does not point to valid memory. As such, it does not make much sense to try to read or write to memory via a null pointer.

Technically, this is undefined behaviour. However, since this is a very common source of bugs, most C-environments ensure that most attempts to dereference a null pointer will immediately crash the program (usually killing it with a segmentation fault). This guard is not perfect due to the pointer arithmetic involved in references to arrays and/or structures, so even with modern tools, dereferencing a null pointer may format your hard drive.

Dereferencing of uninitialized pointer

Just like with null pointers, dereferencing a pointer before explicitly setting its value is UB. Unlike for null pointers, most environments do not provide any safety net against this sort of error, except that the compiler can warn about it. If you compile your code anyway, you are likely to experience the whole nastiness of UB.

Dereferencing of invalid pointers

An invalid pointer is a pointer that contains an address that is not within any allocated memory area. Common ways to create invalid pointers are to call free() (after the call, the pointer will be invalid, which is pretty much the point of calling free()), or to use pointer arithmetic to get an address that is beyond the limits of an allocated memory block.

This is the most evil variant of pointer dereferencing UB: There is no safety net, there is no compiler warning, there is just the fact that the code may do anything. And commonly, it does: Most malware attacks use this kind of UB in programs to make the programs behave as they want them to behave (like installing a trojan or keylogger, encrypting your hard drive etc.). The possibility of a formatted hard drive becomes very real with this kind of UB!

Casting away constness

If we declare an object as const, we give a promise to the compiler that we will never change the value of that object. In many contexts compilers will spot such an invalid modification and shout at us. But if we cast the constness away as in this snippet:

int const a = 42;
...
int* ap0 = &a; //< error, compiler will tell us
int* ap1 = (int*)&a; //< silences the compiler
...
*ap1 = 43; //< UB ==> program crash?

the compiler might not be able to track this invalid access. It may compile the code to an executable, and the invalid access will then only be detected at run time, where it may lead to a program crash.



Examples of implementation-defined behavior

Implementation-defined behavior in C

The definition of implementation-defined behavior in C is when something is left for the compiler to decide, and the compiler documents which choice it made.

There are hundreds of such cases in the language. The standard contains a summary of most of them in Annex J.3, which is ~15 pages long.

The specific example int i; i >> 3 is undefined behavior since the variable isn't initialized.

The specific example int i=0; i >> 3 falls under implementation-defined behavior only when i holds a negative value; with i equal to 0 the result is simply 0. The relevant rule is C17 6.5.7/5:

The result of E1 >> E2 is E1 right-shifted E2 bit positions. /--/ If E1 has a signed type and a negative value, the resulting value is implementation-defined.

In this particular case, it depends on whether the compiler picks an arithmetic shift or a logical shift instruction from the CPU instruction set. This means that the standard doesn't disfavour architectures that lack an arithmetic shift, though in practice the vast majority of CPUs are capable of doing arithmetic shift, even RISC ones.

Why is calling the main function supposedly undefined behavior (UB)

I think your analysis is correct: calls to main are ill-formed.

You have to pass the -pedantic flag to make GCC and Clang conform. In that case, Clang says

warning: ISO C++ does not allow 'main' to be used by a program [-Wmain]

and GCC says

warning: ISO C++ forbids taking address of function '::main' [-Wpedantic]

But they allow calls to main as an extension. The standard permits such an extension, since it doesn't change the meaning of any conforming programs.

Is -15; unspecified behavior in C?

Both are correct. Implementation defined behavior is a particular type of unspecified behavior.

Citing section 3.4.1 of the C standard which defines "implementation-defined behavior":

1 implementation-defined behavior

unspecified behavior where each implementation documents how the choice is made

2 EXAMPLE An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.

From section 3.4.4 defining "unspecified behavior":

1 unspecified behavior

use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance

2 EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.

As for GCC, you'll always get the same answer because the operation is implementation-defined. It implements right shift of negative numbers via sign extension.

From the GCC documentation:

The results of some bitwise operations on signed integers (C90 6.3, C99 and C11 6.5).

Bitwise operators act on the representation of the value including
both the sign and value bits, where the sign bit is considered
immediately above the highest-value value bit. Signed >> acts on
negative numbers by sign extension.

As an extension to the C language, GCC does not use the latitude given
in C99 and C11 only to treat certain aspects of signed << as
undefined. However, -fsanitize=shift (and -fsanitize=undefined) will
diagnose such cases. They are also diagnosed where constant
expressions are required.


