Difference Between Declaring Variables Before or in Loop

Difference between declaring variables before or in loop?

Which is better, a or b?

From a performance perspective, you'd have to measure it. (And in my opinion, if you can measure a difference, the compiler isn't very good).

From a maintenance perspective, b is better. Declare and initialize variables in the same place, in the narrowest scope possible. Don't leave a gaping hole between the declaration and the initialization, and don't pollute namespaces you don't need to.

Declaring variables inside loops, good practice or bad practice?

This is excellent practice.

By creating variables inside loops, you ensure their scope is restricted to the loop. They cannot be referenced outside of it.

This way:

  • If the name of the variable is a bit "generic" (like "i"), there is no risk of mixing it up with another variable of the same name somewhere later in your code (this can also be mitigated with the -Wshadow warning option in GCC)

  • The compiler knows that the variable scope is limited to inside the loop, and therefore will issue a proper error message if the variable is by mistake referenced elsewhere.

  • Last but not least, some dedicated optimization can be performed more efficiently by the compiler (most importantly register allocation), since it knows that the variable cannot be used outside of the loop. For example, no need to store the result for later re-use.

In short, you are right to do it.

Note however that the variable is not supposed to retain its value between iterations. If that matters, you may need to initialize it every time. You can also create a larger block, encompassing the loop, whose sole purpose is to declare variables which must retain their value from one iteration to the next. This typically includes the loop counter itself.

{
    int i, retainValue;
    for (i=0; i<N; i++)
    {
        int tmpValue;
        /* tmpValue is uninitialized */
        /* retainValue still has its previous value from previous loop */

        /* Do some stuff here */
    }
    /* Here, retainValue is still valid; tmpValue no longer */
}

For question #2:
The variable is allocated once, when the function is called. In fact, from an allocation perspective, it is (nearly) the same as declaring the variable at the beginning of the function. The only difference is the scope: the variable cannot be used outside of the loop. It may even be possible that the variable is not allocated, just re-using some free slot (from other variable whose scope has ended).

With restricted and more precise scope come more accurate optimizations. But more importantly, it makes your code safer, with fewer states (i.e. variables) to worry about when reading other parts of the code.

This is true even outside of an if(){...} block. Typically, instead of:

    int result;
    (...)
    result = f1();
    if (result) { (...) }
    (...)
    result = f2();
    if (result) { (...) }

it's safer to write:

    (...)
    {
        int const result = f1();
        if (result) { (...) }
    }
    (...)
    {
        int const result = f2();
        if (result) { (...) }
    }

The difference may seem minor, especially on such a small example.
But on a larger code base, it will help: now there is no risk of carrying some result value from the f1() block over to the f2() block. Each result is strictly limited to its own scope, making its role clearer. From a reviewer's perspective, it's much nicer, since there are fewer long-range state variables to worry about and track.

The compiler will help more, too: suppose that, after some erroneous future change, result is no longer properly initialized with f2(). The second version will simply refuse to compile, giving a clear error message at compile time (far better than at run time). The first version will not catch anything; the result of f1() will simply be tested a second time and be mistaken for the result of f2().

Complementary information

The open-source tool CppCheck (a static analysis tool for C/C++ code) provides some excellent hints regarding optimal scope of variables.

In response to comment on allocation:
The above rule is true in C, but might not be for some C++ classes.

For standard types and structures, the size of variable is known at compilation time. There is no such thing as "construction" in C, so the space for the variable will simply be allocated into the stack (without any initialization), when the function is called. That's why there is a "zero" cost when declaring the variable inside a loop.

However, for C++ classes, there is this constructor thing which I know much less about. I guess allocation is probably not going to be the issue, since the compiler shall be clever enough to reuse the same space, but the initialization is likely to take place at each loop iteration.
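That guess about per-iteration initialization is easy to observe in any language with constructors. Below is a minimal sketch in Python rather than C++, so it only illustrates how often the constructor runs and says nothing about stack allocation. The Tracked class and both helper functions are hypothetical names made up for this illustration.

```python
class Tracked:
    """Hypothetical class that counts how often its constructor runs."""

    constructions = 0

    def __init__(self):
        Tracked.constructions += 1
        self.total = 0


def construct_inside_loop(n):
    """Create a fresh object on every iteration; return the constructor count."""
    Tracked.constructions = 0
    for i in range(n):
        t = Tracked()      # constructor runs on each pass
        t.total += i
    return Tracked.constructions


def construct_outside_loop(n):
    """Create one object before the loop and reuse it; return the constructor count."""
    Tracked.constructions = 0
    t = Tracked()          # constructor runs exactly once
    for i in range(n):
        t.total = 0        # reset state by hand instead of re-constructing
        t.total += i
    return Tracked.constructions


print(construct_inside_loop(5))   # 5
print(construct_outside_loop(5))  # 1
```

If the constructor is cheap, the difference is unlikely to matter; if it is expensive, hoisting the object out of the loop is a trade of scope for speed that is worth measuring first.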

declare variable inside or outside a loop, does it make big difference?

Yes. The scope of the variable i is different in the two cases.

In the first case, i is declared in the enclosing block or function, so you can access it anywhere in that block or function.

In the second case, i is declared inside the while loop, so you can access it only within the while loop.

Does it make big difference in terms of performance?

No, it will not matter performance-wise where you declare it.

Example 1:

int main()
{
    int i, bigNumber;

    while(bigNumber--) {
        i = 0;
    }
}

Assembly:

main:
    push rbp
    mov rbp, rsp
.L3:
    mov eax, DWORD PTR [rbp-4]
    lea edx, [rax-1]
    mov DWORD PTR [rbp-4], edx
    test eax, eax
    setne al
    test al, al
    je .L2
    mov DWORD PTR [rbp-8], 0
    jmp .L3
.L2:
    mov eax, 0
    pop rbp
    ret

Example 2:

int main()
{
    int bigNumber;

    while(bigNumber--) {
        int i;
        i = 0;
    }
}

Assembly:

main:
    push rbp
    mov rbp, rsp
.L3:
    mov eax, DWORD PTR [rbp-4]
    lea edx, [rax-1]
    mov DWORD PTR [rbp-4], edx
    test eax, eax
    setne al
    test al, al
    je .L2
    mov DWORD PTR [rbp-8], 0
    jmp .L3
.L2:
    mov eax, 0
    pop rbp
    ret

Both generate the same assembly code.

Declaring variables inside or outside of a loop

The scope of local variables should always be the smallest possible.

In your example I presume str is not used outside of the while loop, otherwise you would not be asking the question, because declaring it inside the while loop would not be an option, since it would not compile.

So, since str is not used outside the loop, the smallest possible scope for str is within the while loop.

So, the answer is emphatically that str absolutely ought to be declared within the while loop. No ifs, no ands, no buts.

The only case where this rule might be violated is if for some reason it is of vital importance that every clock cycle must be squeezed out of the code, in which case you might want to consider instantiating something in an outer scope and reusing it instead of re-instantiating it on every iteration of an inner scope. However, this does not apply to your example, due to the immutability of strings in Java: a new instance of str will always be created at the beginning of your loop and it will have to be thrown away at the end of it, so there is no possibility to optimize there.
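The "instantiate once in an outer scope and reuse it" trade-off described above could look like the following. This is a minimal sketch in Python rather than Java, using a mutable list as the reusable object; format_rows_fresh and format_rows_reused are hypothetical names, and whether the reused version is measurably faster is something you would have to benchmark for your own workload.

```python
def format_rows_fresh(rows):
    """Narrow scope: a new scratch list is created on every iteration."""
    lines = []
    for row in rows:
        parts = []                     # only meaningful within this iteration
        for cell in row:
            parts.append(str(cell))
        lines.append(",".join(parts))
    return lines


def format_rows_reused(rows):
    """Wider scope: one scratch list is created up front and cleared each time."""
    lines = []
    parts = []                         # lives for the whole function
    for row in rows:
        parts.clear()                  # forgetting this line leaks data between rows
        for cell in row:
            parts.append(str(cell))
        lines.append(",".join(parts))
    return lines


rows = [[1, 2], [3], []]
print(format_rows_fresh(rows))   # ['1,2', '3', '']
print(format_rows_reused(rows))  # ['1,2', '3', '']
```

Both functions return the same result. The second one avoids re-creating the list, at the price of a wider scope and the risk of stale state if the reset is ever forgotten.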

EDIT: (injecting my comment below in the answer)

In any case, the right way to do things is to write all your code properly, establish a performance requirement for your product, measure your final product against this requirement, and if it does not satisfy it, then go optimize things. And what usually ends up happening is that you find ways to provide some nice and formal algorithmic optimizations in just a couple of places which make your program meet its performance requirements, instead of having to go all over your entire code base and tweak and hack things in order to squeeze clock cycles here and there.

ES6 declaring variables before or in loop

In code snippet A, cols is accessible outside of the for as well. Because let variables are block-scoped, a variable defined with let inside the for is scoped to that block only. So, in B, cols will not be accessible outside of the for.

C is similar to A if cols is defined only once. Defining cols twice in the same scope using let results in an error.

Which one to use depends on the use-case.

  1. If cols is needed inside for only, then use let cols = ...
  2. If cols is needed outside of for too, use let cols; before for and then it can be used after for too in the same enclosing scope. Note that, in this case, cols will be the last value assigned in the loop.

Difference of declaring variable inside for loop and outside for loop

If sum_abundant_factor = 0 is placed inside the loop, sum_abundant_factor is reset to 0 on every iteration, that is, for each element of the range.

Here is an easier example:

a = [1, 2, 3]
for i in a:
    b = 0
    b += i
    print(b)

On each iteration over the list a, b is reset to 0, so b += i just gives 0 + i, which is i itself. The code above would therefore output:

1
2
3

Whereas if you define the b variable outside the loop:

a = [1, 2, 3]
b = 0
for i in a:
    b += i
    print(b)

It will output:

1
3
6

Because it won't reset, it will keep adding, so 0 + 1 is 1 (the first printed value), and 1 + 2 is 3 (the second printed value) and 3 + 3 is 6 (the third printed value).
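The two behaviours can also be put side by side as functions, which makes them easy to check. This is a small sketch built on the example above; the function names and the seen list are hypothetical additions for illustration.

```python
def totals_reset_each_iteration(values):
    """b is rebound to 0 at the top of every iteration, so nothing carries over."""
    seen = []
    for i in values:
        b = 0
        b += i
        seen.append(b)
    return seen


def totals_accumulated(values):
    """b is initialized once before the loop, so it keeps a running sum."""
    seen = []
    b = 0
    for i in values:
        b += i
        seen.append(b)
    return seen


print(totals_reset_each_iteration([1, 2, 3]))  # [1, 2, 3]
print(totals_accumulated([1, 2, 3]))           # [1, 3, 6]
```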

Is it better to declare a variable inside or outside a loop?

Performance-wise, let's try concrete examples:

public void Method1()
{
    foreach(int i in Enumerable.Range(0, 10))
    {
        int x = i * i;
        StringBuilder sb = new StringBuilder();
        sb.Append(x);
        Console.WriteLine(sb);
    }
}
public void Method2()
{
    int x;
    StringBuilder sb;
    foreach(int i in Enumerable.Range(0, 10))
    {
        x = i * i;
        sb = new StringBuilder();
        sb.Append(x);
        Console.WriteLine(sb);
    }
}

I deliberately picked both a value-type and a reference-type in case that affects things. Now, the IL for them:

.method public hidebysig instance void Method1() cil managed
{
.maxstack 2
.locals init (
[0] int32 i,
[1] int32 x,
[2] class [mscorlib]System.Text.StringBuilder sb,
[3] class [mscorlib]System.Collections.Generic.IEnumerator`1<int32> enumerator)
L_0000: ldc.i4.0
L_0001: ldc.i4.s 10
L_0003: call class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [System.Core]System.Linq.Enumerable::Range(int32, int32)
L_0008: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0> [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator()
L_000d: stloc.3
L_000e: br.s L_002f
L_0010: ldloc.3
L_0011: callvirt instance !0 [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current()
L_0016: stloc.0
L_0017: ldloc.0
L_0018: ldloc.0
L_0019: mul
L_001a: stloc.1
L_001b: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
L_0020: stloc.2
L_0021: ldloc.2
L_0022: ldloc.1
L_0023: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(int32)
L_0028: pop
L_0029: ldloc.2
L_002a: call void [mscorlib]System.Console::WriteLine(object)
L_002f: ldloc.3
L_0030: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
L_0035: brtrue.s L_0010
L_0037: leave.s L_0043
L_0039: ldloc.3
L_003a: brfalse.s L_0042
L_003c: ldloc.3
L_003d: callvirt instance void [mscorlib]System.IDisposable::Dispose()
L_0042: endfinally
L_0043: ret
.try L_000e to L_0039 finally handler L_0039 to L_0043
}

.method public hidebysig instance void Method2() cil managed
{
.maxstack 2
.locals init (
[0] int32 x,
[1] class [mscorlib]System.Text.StringBuilder sb,
[2] int32 i,
[3] class [mscorlib]System.Collections.Generic.IEnumerator`1<int32> enumerator)
L_0000: ldc.i4.0
L_0001: ldc.i4.s 10
L_0003: call class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [System.Core]System.Linq.Enumerable::Range(int32, int32)
L_0008: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0> [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator()
L_000d: stloc.3
L_000e: br.s L_002f
L_0010: ldloc.3
L_0011: callvirt instance !0 [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current()
L_0016: stloc.2
L_0017: ldloc.2
L_0018: ldloc.2
L_0019: mul
L_001a: stloc.0
L_001b: newobj instance void [mscorlib]System.Text.StringBuilder::.ctor()
L_0020: stloc.1
L_0021: ldloc.1
L_0022: ldloc.0
L_0023: callvirt instance class [mscorlib]System.Text.StringBuilder [mscorlib]System.Text.StringBuilder::Append(int32)
L_0028: pop
L_0029: ldloc.1
L_002a: call void [mscorlib]System.Console::WriteLine(object)
L_002f: ldloc.3
L_0030: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
L_0035: brtrue.s L_0010
L_0037: leave.s L_0043
L_0039: ldloc.3
L_003a: brfalse.s L_0042
L_003c: ldloc.3
L_003d: callvirt instance void [mscorlib]System.IDisposable::Dispose()
L_0042: endfinally
L_0043: ret
.try L_000e to L_0039 finally handler L_0039 to L_0043
}

As you can see, apart from the order on the stack the compiler happened to choose - which could just as well have been a different order - it had absolutely no effect. In turn, there really isn't anything that one is giving the jitter to make much use of that the other isn't giving it.

Other than that, there is one sort-of difference.

In my Method1(), x and sb are scoped to the foreach, and cannot be accessed either deliberately or accidentally outside of it.

In my Method2(), x and sb are not known at compile time to be reliably assigned a value within the foreach (the compiler doesn't know the foreach will run at least one iteration), so reading them after the loop without assigning them first is forbidden.

So far, no real difference.

I can however assign and use x and/or sb outside of the foreach. As a rule I would say that this is probably poor scoping most of the time, so I'd favour Method1, but I might have some sensible reason to want to refer to them (more realistically if they weren't possibly unassigned), in which case I'd go for Method2.

Still, that's a matter of how each piece of code can be extended or not, not a difference in the code as written. Really, there's no difference.

Declaring variables inside or outside in a for-in loop

Those two snippets of code do exactly the same thing (and that's the case in most languages, such as C, C++ and C#, amongst others). If the variable were redeclared at every iteration, then following your logic it would also be re-initialized, and the loop would keep iterating over the same object. Your loop would be infinite.

On a side note, in JavaScript, all var declarations are hoisted to the enclosing function scope; this means that you can declare variables with var anywhere within a function, even within nested loops, and they will only be declared once.

Link to the var documentation

Relevant SO question

Other relevant SO answer

Edit courtesy of @torazaburo:

If you want to declare a variable with a local scope (as in, a variable that will only be defined in the current block, such as a for, while or if), you can use the let statement:

let var1 = 123;

It also allows you to shadow variables with the same name that are declared in an outer scope, such as in this example from the docs:

function letTest() {
    let x = 1;
    if (true) {
        let x = 2; // different variable
        console.log(x); // 2
    }
    console.log(x); // 1
}

See the full documentation (and examples) here.


