Declaring Variables Inside or Outside of a Loop

Declaring variables inside loops, good practice or bad practice?

This is excellent practice.

By creating variables inside loops, you ensure their scope is restricted to inside the loop. They cannot be referenced or called outside of the loop.

This way:

  • If the name of the variable is a bit "generic" (like "i"), there is no risk of mixing it up with another variable of the same name somewhere later in your code (this can also be mitigated with GCC's -Wshadow warning option)

  • The compiler knows that the variable's scope is limited to the loop, and will therefore issue a proper error message if the variable is referenced elsewhere by mistake.

  • Last but not least, the compiler can perform some optimizations more efficiently (most importantly register allocation), since it knows that the variable cannot be used outside of the loop. For example, there is no need to keep the value around for later reuse.

In short, you are right to do it.

Note, however, that the variable does not retain its value between iterations. For that reason, you may need to initialize it on every iteration. You can also create a larger block, encompassing the loop, whose sole purpose is to declare variables that must retain their value from one iteration to the next. This typically includes the loop counter itself.

    {
        int i, retainValue;
        for (i = 0; i < N; i++)
        {
            int tmpValue;
            /* tmpValue is uninitialized */
            /* retainValue still has its previous value from previous loop */

            /* Do some stuff here */
        }
        /* Here, retainValue is still valid; tmpValue no longer */
    }
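A minimal runnable sketch of the same pattern, assuming C99 (so the loop counter can be declared in the for statement itself). The function name sum_of_squares is hypothetical, chosen only for illustration: total must survive from one iteration to the next, so it lives outside the loop, while square is only needed within a single iteration.

```c
#include <assert.h>

/* Hypothetical example: returns 1*1 + 2*2 + ... + n*n. */
int sum_of_squares(int n)
{
    assert(n >= 0);                  /* negative counts make no sense here */
    int total = 0;                   /* retained across iterations */
    for (int i = 1; i <= n; i++) {
        int const square = i * i;    /* fresh value on every iteration */
        total += square;
    }
    /* square is out of scope here; total is still valid */
    return total;
}
```

For example, sum_of_squares(3) returns 14 (1 + 4 + 9).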

For question #2:
The variable is allocated once, when the function is called. In fact, from an allocation perspective, it is (nearly) the same as declaring the variable at the beginning of the function. The only difference is the scope: the variable cannot be used outside of the loop. It is even possible that the variable gets no slot of its own, and simply reuses a free slot left by another variable whose scope has ended.

With a restricted and more precise scope come more accurate optimizations. But more importantly, it makes your code safer, with fewer states (i.e. variables) to worry about when reading other parts of the code.

This is true even outside of loops, for example around if(){...} statements. Typically, instead of:

    int result;
    /* ... */
    result = f1();
    if (result) { /* ... */ }
    /* ... */
    result = f2();
    if (result) { /* ... */ }

it's safer to write:

    /* ... */
    {
        int const result = f1();
        if (result) { /* ... */ }
    }
    /* ... */
    {
        int const result = f2();
        if (result) { /* ... */ }
    }
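A compilable sketch of the block pattern. The bodies of f1() and f2() are hypothetical stand-ins (the original leaves them undefined), as is count_successes(), which simply runs both steps and counts how many returned a nonzero result.

```c
/* Hypothetical stand-ins: nonzero means success, 0 means failure. */
static int f1(void) { return 1; }
static int f2(void) { return 0; }

/* Hypothetical driver: each result lives in its own block, so the value
 * returned by f1() can never be mistaken for the value returned by f2(). */
int count_successes(void)
{
    int successes = 0;
    {
        int const result = f1();
        if (result) {
            successes++;
        }
    }
    {
        int const result = f2();
        if (result) {
            successes++;
        }
    }
    return successes;
}
```

Deleting either `int const result = ...;` line makes the compiler reject the matching `if (result)` with an undeclared-identifier error, because the other block's result is already out of scope.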

The difference may seem minor, especially on such a small example.
But on a larger code base, it helps: there is no risk of carrying a result value over from the f1() block to the f2() block. Each result is strictly limited to its own scope, which makes its role clearer. From a reviewer's perspective, it's much nicer, since there are fewer long-range state variables to worry about and track.

The compiler helps too: suppose that, in the future, an erroneous code change removes the line that initializes result with f2(). The second version will simply refuse to compile, because result is not declared in that block, and you get a clear error message at compile time (far better than at run time). The first version will not catch anything: the result of f1() will simply be tested a second time, mistaken for the result of f2().

Complementary information

The open-source tool CppCheck (a static analysis tool for C/C++ code) provides some excellent hints regarding optimal scope of variables.

In response to a comment on allocation:
The above rule is true in C, but might not be for some C++ classes.

For standard types and structures, the size of the variable is known at compile time. There is no such thing as "construction" in C, so the space for the variable is simply reserved on the stack (without any initialization) when the function is called. That's why declaring a variable inside a loop has "zero" cost; only an explicit initialization costs something, once per iteration.

However, C++ classes have constructors (and destructors). Allocation is probably not going to be the issue, since the compiler can reuse the same space, but an object declared inside the loop body is constructed at the start of each iteration and destroyed at its end.

Declaring variables inside or outside of a loop

The scope of local variables should always be the smallest possible.

In your example, I presume str is not used outside of the while loop; otherwise you would not be asking the question, because declaring it inside the while loop would not be an option: it would not compile.

So, since str is not used outside the loop, the smallest possible scope for str is within the while loop.

So, the answer is emphatically that str absolutely ought to be declared within the while loop. No ifs, no ands, no buts.

The only case where this rule might be violated is if, for some reason, it is of vital importance to squeeze every clock cycle out of the code, in which case you might want to consider instantiating something in an outer scope and reusing it instead of re-instantiating it on every iteration of an inner scope. However, this does not apply to your example, due to the immutability of strings in Java: a new instance of str will always be created at the beginning of your loop and will have to be thrown away at the end of it, so there is no possibility to optimize there.

EDIT: (injecting my comment below in the answer)

In any case, the right way to do things is to write all your code properly, establish a performance requirement for your product, measure your final product against this requirement, and, if it does not satisfy it, then go optimize things. What usually ends up happening is that you find ways to provide some nice, formal algorithmic optimizations in just a couple of places, which make your program meet its performance requirements, instead of having to go all over your entire code base tweaking and hacking things to squeeze out clock cycles here and there.

Declare variable inside or outside a loop: does it make a big difference?

Yes. The scope of variable i is different in the two cases.

In the first case, i is declared in the enclosing block or function, so you can access it anywhere in that block or function.

In the second case, i is declared inside the while loop, so you can access it only within the loop.

Does it make big difference in terms of performance?

No, it will not matter performance-wise where you declare it.

Example 1:

    int main()
    {
        int i, bigNumber;

        while(bigNumber--) {
            i = 0;
        }
    }

Assembly:

    main:
            push rbp
            mov rbp, rsp
    .L3:
            mov eax, DWORD PTR [rbp-4]
            lea edx, [rax-1]
            mov DWORD PTR [rbp-4], edx
            test eax, eax
            setne al
            test al, al
            je .L2
            mov DWORD PTR [rbp-8], 0
            jmp .L3
    .L2:
            mov eax, 0
            pop rbp
            ret

Example 2:

    int main()
    {
        int bigNumber;

        while(bigNumber--) {
            int i;
            i = 0;
        }
    }

Assembly:

    main:
            push rbp
            mov rbp, rsp
    .L3:
            mov eax, DWORD PTR [rbp-4]
            lea edx, [rax-1]
            mov DWORD PTR [rbp-4], edx
            test eax, eax
            setne al
            test al, al
            je .L2
            mov DWORD PTR [rbp-8], 0
            jmp .L3
    .L2:
            mov eax, 0
            pop rbp
            ret

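Both generate the same assembly code. (The listings appear to be unoptimized output. Note also that bigNumber is read without ever being initialized, which is undefined behavior in C; real code must initialize it.)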
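A version of the two examples that initializes the counter (here passed in as a parameter) and actually uses i, so that neither variant relies on undefined behavior. The names sum_outer and sum_inner are hypothetical. I would expect them to compile to the same code, but comparing the generated assembly at different optimization levels on your own toolchain is the way to confirm it.

```c
/* Variant 1: i is declared at function scope and reassigned in the loop. */
int sum_outer(int bigNumber)
{
    int i;
    int total = 0;
    while (bigNumber-- > 0) {
        i = bigNumber;
        total += i;
    }
    return total;
}

/* Variant 2: i is declared inside the loop body, scoped to one iteration. */
int sum_inner(int bigNumber)
{
    int total = 0;
    while (bigNumber-- > 0) {
        int i = bigNumber;
        total += i;
    }
    return total;
}
```

Both functions return the same value for any count; for example, a count of 4 sums 3 + 2 + 1 + 0 = 6.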

Declaring variables inside or outside of a for-in loop

Those two snippets of code do exactly the same thing (and that's the case in most languages, such as C, C++ and C#, amongst others). If the variable were redeclared at every iteration, then, following your logic, it would also be re-initialized and would constantly loop over the same object. Your loop would be infinite.

On a side note, in JavaScript, all var declarations are hoisted to the function scope; this means that you can declare variables with var anywhere within a function, even within nested loops, and they will only be declared once.

Edit courtesy of @torazaburo:

If you want to declare a variable with a local scope (that is, a variable that is only defined in the current block, such as a for, while or if block), you can use the let statement:

    let var1 = 123;

It also allows you to shadow variables of the same name declared in an enclosing scope, as in this example from the docs:

    function letTest() {
        let x = 1;
        if (true) {
            let x = 2; // different variable
            console.log(x); // 2
        }
        console.log(x); // 1
    }

See the full documentation of let for more examples.

Difference between declaring variables before or in loop?

Which is better, a or b?

From a performance perspective, you'd have to measure it. (And in my opinion, if you can measure a difference, the compiler isn't very good).

From a maintenance perspective, b is better. Declare and initialize variables in the same place, in the narrowest scope possible. Don't leave a gaping hole between the declaration and the initialization, and don't pollute namespaces you don't need to.

Is it more efficient to declare a variable inside a loop, or just to reassign it?

If you declare a variable outside of the loop and do not use it after the loop, the compiler effectively treats it as if it were declared inside the loop: local variables are just slots in the method's stack frame, and bytecode has no instruction for "declaring" one.

That means there is no reason to compare efficiency here, since the JVM ends up running essentially the same code for the two approaches.

So the following code:

    int sum;
    for (int i = 0; i < 10; i++)
    {
        sum = 0;
    }

... is, after compilation, effectively equivalent to this:

    for (int i = 0; i < 10; i++)
    {
        int sum = 0;
    }

Should variables be declared inside the loop or outside the loop in Java?

If there are, e.g., 100000 items, then 100000 objects will be created in Approach One, and each object will have a reference (myObject) to it, so are they not eligible for GC?

No. From the garbage collector's point of view, both approaches work the same way, i.e. no memory is leaked. With approach two, as soon as the following statement runs

    myObject = new MyObject();

the previously referenced MyObject becomes an orphan (unless, while using that object, you passed it around, say, to another method that saved the reference) and is eligible for garbage collection.

The difference is that once the loop finishes, the last instance of MyObject is still reachable through the myObject reference declared outside the loop.


Does GC know when references go out of scope during the loop execution or it can only know at the end of method?

First of all, there's only one reference, not several; it's the objects that become unreferenced in the loop. Secondly, garbage collection does not kick in the moment an object becomes unreferenced. So forget the loop: it may not even happen when the method exits.

Notice that I said orphan objects become eligible for GC, not that they get collected immediately. Garbage collection does not happen the instant an object becomes unreachable; it happens in phases. In the mark phase, the collector marks every object that is still reachable (for example, from a live thread); everything left unmarked is garbage. Then, in the sweep phase, the memory of the unmarked objects is reclaimed and, depending on the collector, compacted, much like defragmenting a hard drive. So it works more like a batch than piecemeal operations.

GC isn't concerned with scopes or methods as such. It only looks for unreachable objects, and it does so when it decides to. You can't force it (System.gc() is only a request). The only thing you can be sure of is that GC will run when the JVM is running out of memory, but you can't pin down exactly when it will do so.

But all this does not mean that GC can't kick in while the method executes, or even while the loop is running. Say you had a message processor that processed 10,000 messages every 10 minutes or so and slept in between, i.e. the bean waits within the loop, does 10,000 iterations and then waits again: GC would definitely kick in to reclaim memory even though the method has not run to completion yet.