Uninitialized Variable Behaviour in C++

What happens to uninitialized variables in C/C++?

Q.1) What happens if an uninitialized variable is used in, say, an operation? Will it crash, or will the code fail to compile?

Many compilers will warn you about code that improperly uses the value of an uninitialized variable. Many compilers have an option that says "treat warnings as errors". So depending on the compiler you're using and the option flags you invoke it with, the code might fail to compile, although we can't say that it will fail to compile.

If the code does compile, and you try to run it, it's obviously impossible to predict what will happen. In most cases the variable will start out containing an "indeterminate" value. Whether that indeterminate value will cause your program to work correctly, or work incorrectly, or crash, is anyone's guess. If the variable is an integer and you try to do some math on it, you'll probably just get a weird answer. But if the variable is a pointer and you try to indirect on it, you're quite likely to get a crash.

It's often said that uninitialized local variables start out containing "random garbage", but that can be misleading, as evidenced by the number of people who post questions here pointing out that, in their program where they tried it, the value wasn't random, but was always 0 or was always the same. So I like to say that uninitialized local variables never start out holding what you expect. If you expected them to be random, you'll find that (at least on any given day) they're repeatable and predictable. But if you expect them to be predictable (and, god help you, if you write code that depends on it), then by jingo, you'll find that they're quite random.

Whether use of an uninitialized variable makes your program formally undefined turns out to be a complicated question. But you might as well assume that it does, because it's a case you want to avoid just as assiduously as you avoid any other dangerous, undefined behavior.

See this old question and this other old question for more (much more!) information on the fine distinctions between undefined and indeterminate behavior in this case.

Q.2) Will C and C++ standards differ in how they treat an uninitialized variable?

They might differ. As I alluded to above, and at least in C, it turns out that not all uses of uninitialized local variables are formally undefined. (Some are merely "indeterminate".) But the passages quoted from the C++ standards by other answers here make it sound like it's undefined there all the time. Again, for practical purposes, the question probably doesn't matter, because as I said, you'll want to avoid it no matter what.

Q.3) Regarding similar queries, how and where can I find an 'official' answer? Is it practical for an amateur to look up the C and C++ standards?

It is not always easy to obtain copies of the standards (let alone official ones, which often cost money), and the standards can be difficult to read and to properly interpret, but yes, given effort, anyone can obtain, read, and attempt to answer questions using the standards. You might not always make the correct interpretation the first time (and you may therefore need to ask for help), but I wouldn't say that's a reason not to try. (For one thing, anyone can read any document and end up not making the correct interpretation the first time; this phenomenon is not limited to amateur programmers reading complex language standards documents!)

Why can we use uninitialized variables in C++?

C++ gives you the ability to shoot yourself in the foot.

Initialising an integral type variable to 0 is a machine instruction typically of the form

REG XOR REG

Its presence is less than satisfactory if you want to initialise it to something else. That's abhorrent to a language that prides itself on being the fastest. Your assertion that integers are initialised to zero is not correct.

The behaviour of using an uninitialised variable in C++ is undefined.

(Why) is using an uninitialized variable undefined behavior?

Yes, this behavior is undefined, but for different reasons than most people are aware of.

First, using an uninitialized value is by itself not undefined behavior; the value is simply indeterminate. Accessing it is then UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.

What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register", that is, its address is never taken. Such variables are treated specially because there are architectures that have real CPU registers with a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.

Edit: The relevant phrase of the standard is 6.3.2.1p2:

If the lvalue designates an object of automatic storage duration that
could have been declared with the register storage class (never had
its address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been performed prior
to use), the behavior is undefined.

And to make it clearer, the following code is legal under all circumstances:

unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
  • Here the addresses of a and b are taken, so their values are just
    indeterminate.
  • Since unsigned char never has trap representations, that
    indeterminate value is just unspecified: any value of unsigned char
    could occur.
  • At the end, a must hold the value 0.

Edit2: a and b have unspecified values:

3.19.3 unspecified value

valid value of the relevant type where this International Standard imposes no requirements on which value
is chosen in any instance

Is reading an uninitialized value always an undefined behaviour? Or are there exceptions to it?

Each of the three lines triggers undefined behavior. The key part of the C standard, that explains this, is section 6.3.2.1p2 regarding Conversions:

Except when it is the operand of the sizeof operator, the
_Alignof operator, the unary & operator, the ++ operator, the
-- operator, or the left operand of the . operator or an
assignment operator, an lvalue that does not have array type
is converted to the value stored in the designated object
(and is no longer an lvalue); this is called lvalue
conversion. If the lvalue has qualified type, the value has
the unqualified version of the type of the lvalue; additionally,
if the lvalue has atomic type, the value has the non-atomic version
of the type of the lvalue; otherwise, the value has the
type of the lvalue. If the lvalue has an incomplete type and does
not have array type, the behavior is undefined. If the lvalue
designates an object of automatic storage duration that could
have been declared with the register storage class (never had its
address taken), and that object is uninitialized (not declared
with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined.

In each of the three cases, an uninitialized variable is used as the right-hand side of an assignment or initialization (which for this purpose is equivalent to an assignment) and undergoes lvalue conversion. The final sentence of the quoted paragraph applies here, as the objects in question have not been initialized.

This also applies to the int i = i; case as the lvalue on the right side has not (yet) been initialized.

There was debate in a related question that the right side of int i = i; is UB because the lifetime of i has not yet begun. However, that is not the case. From section 6.2.4 p5 and p6:

5 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic
storage duration, as do some compound literals. The result of
attempting to indirectly access an object with automatic storage
duration from a thread other than the one with which the object is
associated is implementation-defined.

6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block
with which it is associated until execution of that block ends in any
way. (Entering an enclosed block or calling a function
suspends, but does not end, execution of the current block.) If
the block is entered recursively, a new instance of the object is
created each time. The initial value of the object is
indeterminate. If an initialization is specified for the
object, it is performed each time the declaration or compound
literal is reached in the execution of the block; otherwise,
the value becomes indeterminate each time the declaration is reached.
So in this case the lifetime of i begins before the declaration is encountered. So int i = i; is still undefined behavior, but not for this reason.

The final sentence of 6.3.2.1p2 does, however, open the door for use of an uninitialized variable not being undefined behavior, and that is if the variable in question had its address taken. For example:

int a;
printf("%p\n", (void *)&a);
printf("%d\n", a);

In this case it is not undefined behavior if:

  • The implementation does not have trap representations for the given type, OR
  • The value chosen for a happens to not be a trap representation.

In which case the value of a is unspecified. In particular, this will be the case with GCC and Microsoft Visual C++ (MSVC) in this example as these implementations do not have trap representations for integer types.

Why is it safe to use the address of an uninitialized variable in C but not an uninitialized pointer?

When you wrote

char letter;
printf("%p\n", &letter);

you declared a variable called letter. It has a well-defined location (or address). The only thing we don't know is which char value is in it -- that's either indeterminate or undefined, depending on who you ask. So if you had tried to do printf("%c\n", letter), that might have gotten you into trouble, because that would try to print the undefined/indeterminate value.

But when you wrote

char *letter1;
printf("%p\n", letter1); //program crashes

that's completely different. letter1 is a variable of type pointer-to-char. As before, it has a well-defined location, and an indeterminate initial value. But what's confusing here is that the value it doesn't have is (or would be) also an address.

If you wrote

printf("%p\n", &letter1);

you'd print the address of letter1, and as I said, that's well-defined. But you tried to print

printf("%p\n", letter1);

and there you try to print the address in letter1, which is a much bigger problem.

(I wouldn't expect an actual crash, though -- in practice I'd merely expect a "random value". I wouldn't expect a crash unless you tried to do printf("%c\n", *letter1).)

One more thing: Taking the address of an uninitialized variable can't be undefined, because plenty of well-defined programs do just that!
Taking an address of an uninitialized variable and passing it to a function can be a good way of assigning a value to a variable. If you have a function that returns a value "by reference", you're probably going to pass it the address of a variable, and it will often be uninitialized, like this:

char *p;
int n = strtol("23skidoo", &p, 10);
printf("%d %s\n", n, p);

Footnote: I wrote that the initial value was "either indeterminate or undefined, depending on who you ask", and that alludes to a tremendous subtlety which I only learned about a couple of days ago, which is that the indeterminacy/undefinedness of the initial values of local variables like these can evidently depend on whether they do or might have their addresses taken. There's sort of a Heisenberg -- or maybe Schrödinger -- uncertainty principle here, where the behavior depends on how closely you attempt to observe it. If your program actually did crash when you tried to print the value of letter1, it might not crash if you changed it to printf("%p %p\n", &letter1, letter1);.

Using uninitialized variable without invoking undefined behavior

In this case, because x has had its address taken, the behavior is not strictly undefined. The value of x at this point is indeterminate. This means the value is either a trap representation or unspecified.

If x happens to contain a trap representation then the behavior is undefined, otherwise the value is unspecified which means that any valid value could be printed.

Also, most systems you're likely to come across don't have any padding bits in integer types, meaning there are no trap representations on that implementation and the value will always be unspecified.

The relevant passages from the C standard:

Section 3.19:

3.19.2

1 indeterminate value: either an unspecified value or a trap representation

3.19.3

1 unspecified value: valid value of the relevant type where this International Standard imposes no requirements on
which value is chosen in any instance

2 NOTE An unspecified value cannot be a trap representation.

3.19.4

1 trap representation: an object representation that need not represent a value of the object type

Section 6.7.9p10:

If an object that has automatic storage duration is not
initialized explicitly, its value is indeterminate.

Are uninitialized local variables in C static by default?

No, they are not static by default. In principle, the initial value can be anything at all. Using the value could even be undefined behaviour. In practice, the compiler picks a memory location for the variable on the stack, and the variable's initial value is whatever happens to already be in that memory location.

Since you don't run any other code between the first up() and the second up(), in practice your program is likely to pick the same location twice and therefore it still has the previous value. If you called another function in between, that function's local variables would go in the same space previously used by up()'s local variables, which would overwrite the value from the first up().

You certainly can't rely on it. Even if you don't call any other functions in-between, the compiler might add one "secretly" (for various reasons). Or the compiler may decide to adjust the stack between the two calls to up so each call might get a different stack location for its local variables.

You also aren't guaranteed that the first value is 0. Because it's whatever happens to be at that memory location already, it could be something left over from a previous function. main isn't the first function that gets called; there is some function in the standard library which does set-up work before it calls main.

Uninitialized variable behaviour in C++

How's this possible when the program always assigns a free memory
location to a variable? How could it be something other than zero?

Let's take a look at an example practical implementation.

Let's say it uses the stack to keep local variables.

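#include <stdio.h>

void
foo(void)
{
    int foo_var = 42;           /* typically leaves 42 behind in foo's stack slot */
}

void
bar(void)
{
    int bar_var;                /* deliberately left uninitialized */
    printf("%d\n", bar_var);    /* broken on purpose: reads an indeterminate value */
}

int
main(void)
{
    bar();
    foo();
    bar();
}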

The totally broken code above illustrates the point. After we call foo, a certain location on the stack where foo_var was placed is set to 42. When we call bar, bar_var occupies that exact location. And indeed, executing the code results in printing 0 and 42, showing that the value of bar_var cannot be relied upon unless it is initialized.

Now it should be clear that local variable initialisation is required. But could main be an exception? Is there anything which could play with the stack and in result give us a non-zero value?

Yes. main is not the first function executed in your program. In fact, there is a lot of work required to set everything up. Any of this work could have used the stack and left some non-zero values on it. Not only can you not expect the same value on different operating systems, it may very well suddenly change on the very system you are using right now. Interested parties can google for "dynamic linker".

Finally, the C language standard does not even use the term "stack". Having a "place" for local variables is left to the compiler. It could even get random crap from whatever happened to be in a given register. It really can be totally anything. In fact, if undefined behaviour is triggered, the compiler has the freedom to do whatever it feels like.


