Why Do Local Variables Require Initialization, But Fields Do Not

Why do local variables require initialization, but fields do not?

Yuval and David's answers are basically correct; summing up:

  • Use of an unassigned local variable is a likely bug, and this can be detected by the compiler at low cost.
  • Use of an unassigned field or array element is less likely to be a bug, and the condition is harder for the compiler to detect. Therefore the compiler makes no attempt to detect the use of uninitialized fields, and instead relies upon initialization to the default value to make the program's behavior deterministic.

A commenter to David's answer asks why it is impossible to detect the use of an unassigned field via static analysis; this is the point I want to expand upon in this answer.

First off, for any variable, local or otherwise, it is in practice impossible to determine exactly whether a variable is assigned or unassigned. Consider:

bool x;
if (M()) x = true;
Console.WriteLine(x);

The question "is x assigned?" is equivalent to "does M() return true?" Now, suppose M() returns true if Fermat's Last Theorem is true for all integers less than eleventy gajillion, and false otherwise. In order to determine whether x is definitely assigned, the compiler must essentially produce a proof of Fermat's Last Theorem. The compiler is not that smart.

So what the compiler does instead for locals is implement an algorithm that is fast and that overestimates when a local is not definitely assigned. That is, it has some false positives, where it says "I can't prove that this local is assigned" even though you and I know it is. For example:

bool x;
if (N() * 0 == 0) x = true;
Console.WriteLine(x);

Suppose N() returns an integer. You and I know that N() * 0 will be 0, but the compiler does not know that. (Note: the C# 2.0 compiler did know that, but I removed that optimization, as the specification does not say that the compiler knows that.)
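Java's definite-assignment rules are conservative in the same way. In the sketch below, the commented-out version is rejected by javac because `n() * 0 == 0` is not a constant expression, so the compiler cannot assume the condition is always true. Assigning on every path, here with an else branch, satisfies the check. The method `n()` is a hypothetical stand-in for `N()` in the C# example above.

```java
public class ConservativeCheck {
    // Hypothetical stand-in for N() from the C# example.
    static int n() {
        return 7;
    }

    // Rejected by javac (x is not definitely assigned before the read):
    //
    //     boolean x;
    //     if (n() * 0 == 0) x = true;
    //     System.out.println(x);

    // Accepted: x is assigned on both paths, so it is definitely assigned
    // before the read, even though the else branch never actually runs.
    static boolean check() {
        boolean x;
        if (n() * 0 == 0) {
            x = true;
        } else {
            x = false;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(check()); // prints true
    }
}
```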

All right, so what do we know so far? It is impractical to get an exact answer for locals, but we can overestimate not-assigned-ness cheaply and get a pretty good result that errs on the side of "make you fix your unclear program". That's good. Why not do the same thing for fields? That is, make a definite assignment checker that overestimates cheaply?

Well, how many ways are there for a local to be initialized? It can be assigned within the text of the method. It can be assigned within a lambda in the text of the method; that lambda might never be invoked, so those assignments are not relevant. Or it can be passed as "out" to another method, at which point we can assume it is assigned when the method returns normally. Those are very clear points at which the local is assigned, and they are right there in the same method in which the local is declared. Determining definite assignment for locals requires only local analysis. Methods tend to be short (far less than a million lines of code in a method), and so analyzing the entire method is quite quick.

Now what about fields? Fields can be initialized in a constructor of course. Or a field initializer. Or the constructor can call an instance method that initializes the fields. Or the constructor can call a virtual method that initializes the fields. Or the constructor can call a method in another class, which might be in a library, that initializes the fields. Static fields can be initialized in static constructors. Static fields can be initialized by other static constructors.

Essentially the initializer for a field could be anywhere in the entire program, including inside virtual methods that will be declared in libraries that haven't been written yet:

// Library written by BarCorp
using System;

public abstract class Bar
{
    // Derived class is responsible for initializing x.
    protected int x;
    protected abstract void InitializeX();
    public void M()
    {
        InitializeX();
        Console.WriteLine(x);
    }
}

Is it an error to compile this library? If yes, how is BarCorp supposed to fix the bug? By assigning a default value to x? But that's what the compiler does already.

Suppose this library is legal. If FooCorp writes

public class Foo : Bar
{
    protected override void InitializeX() { }
}

is that an error? How is the compiler supposed to figure that out? The only way is to do a whole-program analysis that tracks the initialization state of every field on every possible path through the program, including paths that involve the choice of virtual methods at runtime. This problem can be arbitrarily hard; it can involve simulated execution of millions of control paths. Analyzing local control flow takes microseconds and depends on the size of the method. Analyzing global control flow can take hours because it depends on the complexity of every method in the program and all the libraries.
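For comparison, here is a rough Java translation of the BarCorp/FooCorp scenario. Method names follow Java conventions, `m()` also returns `x` so the result can be checked, and `BarFooDemo` is an illustrative name. Java makes the same trade-off: both classes compile, and because `Foo` never assigns `x`, `m()` reads the field's default value.

```java
// Hypothetical Java translation of the BarCorp / FooCorp example.
abstract class Bar {
    // Derived class is responsible for initializing x.
    protected int x;

    protected abstract void initializeX();

    public int m() {
        initializeX();
        System.out.println(x);
        return x; // also returned so the observed value can be checked
    }
}

class Foo extends Bar {
    @Override
    protected void initializeX() {
        // Forgets to assign x; javac accepts this without complaint.
    }
}

public class BarFooDemo {
    public static void main(String[] args) {
        new Foo().m(); // prints 0, the default value of an int field
    }
}
```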

So why not do a cheaper analysis that doesn't have to analyze the whole program, and just overestimates even more severely? Well, propose an algorithm that works and that doesn't make it too hard to write a correct program that actually compiles, and the design team can consider it. I don't know of any such algorithm.

Now, the commenter suggests "require that a constructor initialize all fields". That's not a bad idea. In fact, it is such a not-bad idea that C# already has that feature for structs. A struct constructor is required to definitely-assign all fields by the time the ctor returns normally; the default constructor initializes all the fields to their default values.

What about classes? Well, how do you know that a constructor has initialized a field? The ctor could call a virtual method to initialize the fields, and now we are back in the same position we were in before. Structs don't have derived classes; classes might. Is a library containing an abstract class required to contain a constructor that initializes all its fields? How does the abstract class know what values the fields should be initialized to?
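A small Java sketch of why a constructor cannot vouch for every field (class names are illustrative): the base-class constructor calls an overridable method, which runs before the derived class's own field initializers, so it observes the default value even of a field that has an initializer.

```java
class Base {
    Base() {
        // Dynamic dispatch: this calls Derived.init() before Derived's
        // field initializers have run.
        init();
    }

    protected void init() { }
}

class Derived extends Base {
    int value = 42;      // this initializer runs only after Base() returns
    int seenDuringInit;  // deliberately no initializer, so it is not reset

    @Override
    protected void init() {
        seenDuringInit = value; // value is still 0 at this point
    }
}

public class ConstructorOrderDemo {
    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println(d.seenDuringInit + " " + d.value); // prints 0 42
    }
}
```

Note that `value` is deliberately not `final`: a `final int` with a constant initializer is a compile-time constant, and reads of it would be inlined as 42.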

John suggests simply prohibiting calling methods in a ctor before the fields are initialized. So, summing up, our options are:

  • Make common, safe, frequently used programming idioms illegal.
  • Do an expensive whole-program analysis that makes the compilation take hours in order to look for bugs that probably aren't there.
  • Rely upon automatic initialization to default values.

The design team chose the third option.

Why must local variables, including primitives, always be initialized in Java?

Basically, requiring a variable to be assigned a value before you read it is a Good Thing. It means you won't accidentally read something you didn't intend to. Yes, variables could have default values - but isn't it better for the compiler to be able to catch your bug instead, if it can prove that you're trying to read something which might not have been assigned yet? If you want to give a local variable a default value, you can always assign that explicitly.
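For example, when a loop might not execute at all, the compiler cannot prove that the loop assigns the variable, so you supply the default yourself. A minimal sketch; `indexOf` is an illustrative helper, not a library method.

```java
public class ExplicitDefault {
    // Returns the index of target in values, or -1 if it is absent.
    static int indexOf(int[] values, int target) {
        // Without "= -1" javac rejects the return statement: the loop body
        // might never run, so index would not be definitely assigned.
        int index = -1;
        for (int i = 0; i < values.length; i++) {
            if (values[i] == target) {
                index = i;
                break;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(indexOf(new int[] {4, 8, 15}, 8));  // prints 1
        System.out.println(indexOf(new int[] {4, 8, 15}, 16)); // prints -1
    }
}
```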

Now that's fine for local variables - but for instance and static variables, the compiler has no way of knowing the order in which methods will be called. Will a property "setter" be called before the "getter"? It has no way of knowing, so it has no way of alerting you to the danger. That's why default values are used for instance/static variables - at least then you'll get a known value (0, false, null, etc.) instead of just "whatever happened to be in memory at the time." (It also removes the potential security issue of reading sensitive data which hadn't been explicitly wiped.)
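A short Java sketch of the getter-before-setter case; the `Account` class is illustrative. Nothing stops a caller from reading a property before setting it, and the read simply yields the field's default value.

```java
public class Account {
    private String owner;   // no initializer: defaults to null
    private int balance;    // defaults to 0
    private boolean frozen; // defaults to false

    public String getOwner() { return owner; }
    public void setOwner(String owner) { this.owner = owner; }
    public int getBalance() { return balance; }
    public boolean isFrozen() { return frozen; }

    public static void main(String[] args) {
        Account a = new Account();
        // The getters run before any setter; javac cannot rule this out,
        // so the fields just hold their default values.
        System.out.println(a.getOwner() + " " + a.getBalance() + " " + a.isFrozen()); // prints null 0 false
        a.setOwner("alice");
        System.out.println(a.getOwner()); // prints alice
    }
}
```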

There was a question about this very recently for C#... - read the answers there as well, as it's basically the same thing. You might also find Eric Lippert's recent blog post interesting; it's at least around the same area, even though it has a somewhat different thrust.

Why doesn't a final variable require initialization in the main method in Java?

At the instance-variable level

  • A final variable can be initialized only once.

  • A final variable at class level must be initialized before the end of the constructor.

At the local (method) level

  • A final variable at method level can be initialized only once.
  • It must be initialized before it is used.

So basically, if you don't use a local final variable, you can also skip its initialization.

If the variable is at instance level, you have to initialize it in its declaration or in the constructor body.
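A minimal sketch of the instance-level rule (the `Point` class is illustrative): a `final` field declared without an initializer, a so-called blank final, compiles only because every constructor assigns it.

```java
public class Point {
    // Blank finals: no initializer, so every constructor must assign them.
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
        // Deleting either assignment above would be a compile-time error,
        // just like the uninitialized "final int b" described in the answer.
    }

    public int x() { return x; }
    public int y() { return y; }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println(p.x() + "," + p.y()); // prints 3,4
    }
}
```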

In your code you have an instance variable final int b that is never initialized so you have an error.

You also have a local variable final int a that is never used, so there is no error for that variable.
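The local-level rules can be seen in a small sketch (names are illustrative): a blank `final` local that is never read needs no value at all, while one that is read must be assigned exactly once on every path before the read.

```java
public class FinalLocals {
    static String classify(int n) {
        final int unused;   // never read, so it never needs a value
        final String label; // blank final local, assigned exactly once below
        if (n < 0) {
            label = "negative";
        } else {
            label = "non-negative";
        }
        // label = "other"; // would not compile: label is already assigned
        return label;       // OK: label is definitely assigned on every path
    }

    public static void main(String[] args) {
        System.out.println(classify(-5)); // prints negative
        System.out.println(classify(3));  // prints non-negative
    }
}
```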

Why must local variables have initial values?

Fields are automatically initialized to the logical zero for their type; this is implicit. Local variables must obey "definite assignment", so they must be assigned before they can be read.

ECMA 334v4

§17.4.4 Field initialization

The initial value of a field, whether it be a static field or an instance field, is the default value (§12.2) of the field’s type. It is not possible to observe the value of a field before this default initialization has occurred, and a field is thus never "uninitialized".

and

§12. Variables

...
A variable shall be definitely assigned (§12.3) before its
value can be obtained.
...

Initialization of instance fields vs. local variables

For local variables, the compiler has a good idea of the flow - it can see a "read" of the variable and a "write" of the variable, and prove (in most cases) that the first write will happen before the first read.

This isn't the case with instance variables. Consider a simple property - how do you know if someone will set it before they get it? That makes it basically infeasible to enforce sensible rules - so either you'd have to ensure that all fields were set in the constructor, or allow them to have default values. The C# team chose the latter strategy.

Java: Why am I required to initialize a primitive local variable?

Because it's a local variable. This is why nothing is assigned to it:

Local variables are slightly different; the compiler never assigns a default value to an uninitialized local variable. If you cannot initialize your local variable where it is declared, make sure to assign it a value before you attempt to use it. Accessing an uninitialized local variable will result in a compile-time error.

Edit: Why does Java raise this compilation error?
If we look at the IdentifierExpression.java source file, we find this block:

...
if (field.isLocal()) {
    LocalMember local = (LocalMember)field;
    if (local.scopeNumber < ctx.frameNumber && !local.isFinal()) {
        env.error(where, "invalid.uplevel", id);
    }
    if (!vset.testVar(local.number)) {
        env.error(where, "var.not.initialized", id);
        vset.addVar(local.number);
    }
    local.readcount++;
}
...

As the condition if (!vset.testVar(local.number)) shows, the compiler uses testVar (defined in the Vset class) to check whether the variable has been assigned. If it has not, the compiler raises the error var.not.initialized, whose message text comes from a properties file:

...
javac.err.var.not.initialized=\
Variable {0} may not have been initialized.
...

Source
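The idea behind vset can be pictured as a set of bits, one per local variable, that is intersected wherever control-flow paths merge. The following is a toy model under that assumption. `AssignedSet` and `DefiniteAssignmentDemo` are hypothetical names, not the JDK's actual `Vset` implementation, and the model handles only a single if/else.

```java
import java.util.BitSet;

// Toy model only (hypothetical; not the JDK's Vset class):
// bit i is set when local variable number i is definitely assigned.
final class AssignedSet {
    private final BitSet bits = new BitSet();

    AssignedSet copy() {
        AssignedSet c = new AssignedSet();
        c.bits.or(bits);
        return c;
    }

    void assign(int var) {
        bits.set(var);
    }

    boolean isAssigned(int var) {
        return bits.get(var);
    }

    // Where two paths merge (e.g. after if/else), a variable counts as
    // assigned only if it was assigned on both paths: a bitwise AND.
    static AssignedSet join(AssignedSet a, AssignedSet b) {
        AssignedSet result = a.copy();
        result.bits.and(b.bits);
        return result;
    }
}

public class DefiniteAssignmentDemo {
    static final int X = 0; // variable number of local x
    static final int Y = 1; // variable number of local y

    public static void main(String[] args) {
        AssignedSet entry = new AssignedSet(); // nothing assigned yet

        // Models: if (cond) { x = 1; y = 2; } else { x = 3; }
        AssignedSet thenPath = entry.copy();
        thenPath.assign(X);
        thenPath.assign(Y);
        AssignedSet elsePath = entry.copy();
        elsePath.assign(X);

        AssignedSet after = AssignedSet.join(thenPath, elsePath);
        System.out.println("x assigned: " + after.isAssigned(X)); // true
        System.out.println("y assigned: " + after.isAssigned(Y)); // false: a read of y would be reported
    }
}
```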

Why does Java ask me to initialize variables when they are local?

Local variables are used mostly for intermediate calculations, whereas instance variables are meant to carry data for both future and intermediate calculations. Java doesn't force you to initialize instance variables and gives them default values, but for local variables it's the developer's job to assign a value. So to avoid mistakes, you need to initialize local variables.

Why C# local variables must be initialized?

The book is mostly correct when it comes to VB, but it fails to mention the difference between VB and C# in this case.

In VB all local variables are automatically initialised:

Sub Test()
    Dim x As Integer
    MessageBox.Show(x.ToString()) 'shows "0"
End Sub

While in C# local variables are not initialised, and the compiler won't let you use them until they are:

void Test() {
    int x;
    MessageBox.Show(x.ToString()); // gives a compiler error
}

Also, it's not clear whether the quote from the book is actually talking about local variables or class member variables. Class member variables are always initialised when the class instance is created, both in VB and C#.

The book is wrong when it says that "Value types have an implicit constructor". That is simply not true. A value type is initialised to its default value (if it's initialised), and there is no call to a constructor when that happens.

Why does Java initialize only class variables by default, but not local variables?

Static/non-static fields that are not primitives, like your Node, are initialized to null by default.
Static/non-static fields that are primitives get their default values.

There's also another case where variables are initialized with default values: when you instantiate an array. Each element gets the default value for its type:

  • 0 for int
  • null for Integer
  • etc.
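A quick Java check of these array defaults (the class name is illustrative):

```java
import java.util.Arrays;

public class ArrayDefaults {
    // Creates a fresh array of each kind and renders its contents.
    static String describeDefaults() {
        int[] ints = new int[3];          // primitive int elements start at 0
        Integer[] boxed = new Integer[3]; // reference elements start at null
        boolean[] flags = new boolean[2]; // boolean elements start at false
        return Arrays.toString(ints) + " "
                + Arrays.toString(boxed) + " "
                + Arrays.toString(flags);
    }

    public static void main(String[] args) {
        // prints [0, 0, 0] [null, null, null] [false, false]
        System.out.println(describeDefaults());
    }
}
```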

However, the compiler does not assign default values to local variables declared inside a method.

That's why your IDE warns: "may not be initialized!".

To understand why, you may be interested in this post.


