Which Loop Has Better Performance? Why

Which loop is faster, while or for?

That clearly depends on the particular implementation of the interpreter/compiler of the specific language.

That said, theoretically, any sane implementation is likely to be able to implement one in terms of the other if it was faster so the difference should be negligible at most.

Of course, I assumed while and for behave as they do in C and similar languages. You could create a language with completely different semantics for while and for

Which loop has better performance? Why?

Limited Scope is Best

Use your second option:

for ( ... ) {
String s = ...;
}

Scope Doesn't Affect Performance

If you disassemble code the compiled from each (with the JDK's javap tool), you will see that the loop compiles to the exact same JVM instructions in both cases. Note also that Brian R. Bondy's "Option #3" is identical to Option #1. Nothing extra is added or removed from the stack when using the tighter scope, and same data are used on the stack in both cases.

Avoid Premature Initialization

The only difference between the two cases is that, in the first example, the variable s is unnecessarily initialized. This is a separate issue from the location of the variable declaration. This adds two wasted instructions (to load a string constant and store it in a stack frame slot). A good static analysis tool will warn you that you are never reading the value you assign to s, and a good JIT compiler will probably elide it at runtime.

You could fix this simply by using an empty declaration (i.e., String s;), but this is considered bad practice and has another side-effect discussed below.

Often a bogus value like null is assigned to a variable simply to hush a compiler error that a variable is read without being initialized. This error can be taken as a hint that the variable scope is too large, and that it is being declared before it is needed to receive a valid value. Empty declarations force you to consider every code path; don't ignore this valuable warning by assigning a bogus value.

Conserve Stack Slots

As mentioned, while the JVM instructions are the same in both cases, there is a subtle side-effect that makes it best, at a JVM level, to use the most limited scope possible. This is visible in the "local variable table" for the method. Consider what happens if you have multiple loops, with the variables declared in unnecessarily large scope:

void x(String[] strings, Integer[] integers) {
String s;
for (int i = 0; i < strings.length; ++i) {
s = strings[0];
...
}
Integer n;
for (int i = 0; i < integers.length; ++i) {
n = integers[i];
...
}
}

The variables s and n could be declared inside their respective loops, but since they are not, the compiler uses two "slots" in the stack frame. If they were declared inside the loop, the compiler can reuse the same slot, making the stack frame smaller.

What Really Matters

However, most of these issues are immaterial. A good JIT compiler will see that it is not possible to read the initial value you are wastefully assigning, and optimize the assignment away. Saving a slot here or there isn't going to make or break your application.

The important thing is to make your code readable and easy to maintain, and in that respect, using a limited scope is clearly better. The smaller scope a variable has, the easier it is to comprehend how it is used and what impact any changes to the code will have.

In .NET, which loop runs faster, 'for' or 'foreach'?

Patrick Smacchia blogged about this last month, with the following conclusions:

  • for loops on List are a bit more than 2 times cheaper than foreach
    loops on List.
  • Looping on array is around 2 times cheaper than looping on List.
  • As a consequence, looping on array using for is 5 times cheaper
    than looping on List using foreach
    (which I believe, is what we all do).

Which loop has better performance? Increment or decrement?

The second "may" be better, because it's easier to compare i with 0 than to compare i with 10 but I think you can use any one of these, because compiler will optimize them.

Javascript Performance: While vs For Loops

You should have countered that a negative while loop would be even faster! See: JavaScript loop performance - Why is to decrement the iterator toward 0 faster than incrementing.

In while versus for, these two sources document the speed phenomenon pretty well by running various loops in different browsers and comparing the results in milliseconds:
https://blogs.oracle.com/greimer/entry/best_way_to_code_a and:
http://www.stoimen.com/blog/2012/01/24/javascript-performance-for-vs-while/.

Conceptually, a for loop is basically a packaged while loop that is specifically geared towards incrementing or decrementing (progressing over the logic according to some order or some length). For example,

for (let k = 0; k < 20; ++k) {…}

can be sped up by making it a negative while loop:

var k = 20;
while (--k) {…}

and as you can see from the measurements in the links above, the time saved really does add up for very large numbers.

for loop vs while loop vs foreach loop PHP

which one is better for performance?

It doesn't matter.

what's the criteria to select a loop?

If you just need to walk through all the elements of an object or array, use foreach. Cases where you need for include

  • When you explicitly need to do things with the numeric index, for example:
  • when you need to use previous or next elements from within an iteration
  • when you need to change the counter during an iteration

foreach is much more convenient because it doesn't require you to set up the counting, and can work its way through any kind of member - be it object properties or associative array elements (which a for won't catch). It's usually best for readability.

which should be used when we loop inside another loop?

Both are fine; in your demo case, foreach is the simplest way to go.

looping over array, performance difference between indexed and enhanced for loop

TL;DR: You are observing what happens when JIT compiler cannot trust that values are not changing inside the loop. Additionally, in the tiny benchmark like this, Blackhole.consume costs dominate, obscuring the results.

Simplifying the test:

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class JdkBenchmarks {

public int[] values;

@Setup
public void setupArray() {
int count = 1000;
values = new int[count];
for(int i = 0; i < count; i++) {
values[i] = i;
}
}

@Benchmark
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
public void indexed(Blackhole bh) {
for(int i = 0; i < values.length; i++) {
bh.consume(values[i]);
}
}

@Benchmark
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
public void indexed_cached(Blackhole bh) {
int[] vs = values;
int length = vs.length;
for(int i = 0; i < length; i++) {
bh.consume(vs[i]);
}
}

@Benchmark
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
public void enhanced(Blackhole bh) {
for (int value : values) {
bh.consume(value);
}
}
}

Running both enhanced and indexed_cached under -prof perfasm reveals this hot loop (I specifically did @CompilerControl(DONT_INLINE) to let @Benchmark method be compiled alone, which makes an easier to digest perfasm output):

         ↗  0x...4240: mov  0x10(%r8,%rsi,4),%r10d  ; load values[i], blackhole it
22.68% │ 0x...4245: mov 0x14(%r8,%rsi,4),%r11d ; ... repeat 7 more times...
│ 0x...424a: mov 0x18(%r8,%rsi,4),%r10d ;
20.95% │ 0x...424f: mov 0x1c(%r8,%rsi,4),%r10d ;
0.02% │ 0x...4254: mov 0x20(%r8,%rsi,4),%r11d ;
24.73% │ 0x...4259: mov 0x24(%r8,%rsi,4),%r10d ;
0.24% │ 0x...425e: mov 0x28(%r8,%rsi,4),%r11d ;
20.04% │ 0x...4263: mov 0x2c(%r8,%rsi,4),%r10d ;
0.22% │ 0x...4268: add $0x8,%esi ; i += 8
│ 0x...426b: cmp %ebp,%esi ; compare i with length (in %ebp)
0.26% ╰ 0x...426d: jl 0x...4240 ; circle back if 8 more elements available

Very efficient!

Running indexed with -prof perfasm reveals:

         ↗  0x...4170: mov  0xc(%r12,%r8,8),%r9d    ; array bounds check, load values.length
3.42% │ 0x...4175: cmp %r9d,%r10d ; array bounds check, compare i
16.02% │ 0x...4178: jae 0x...41b1 ; failed? jump to exception handling
│ 0x...417a: lea (%r12,%r8,8),%r11 ; load values[i], part 1
0.04% │ 0x...417e: mov 0x10(%r11,%r10,4),%r11d ; load values[i], part 2
│ ; %r11d is blackholed
35.69% │ 0x...4183: mov 0xc(%rsi),%r8d ; get "values"
0.71% │ 0x...4187: mov 0x348(%r15),%r11 ; safepoint poll, part 1 (JVM infra)
4.03% │ 0x...418e: inc %r10d ; i++
0.12% │ 0x...4191: test %eax,(%r11) ; safepoint poll, part 2 (JVM infra)
27.74% │ 0x...4194: mov 0xc(%r12,%r8,8),%r9d ; load values.length
8.53% │ 0x...4199: cmp %r9d,%r10d ; check i < values.length
0.24% ╰ 0x...419c: jl 0x...4170 ; circle back if more

This is because Blackhole.consume call is opaque to the compiler (like many other non-inlined calls), so it has to conservatively presume that values can change in the middle of the loop!

Which means, compiler cannot stash values in a register, it cannot trust the array bounds check to always succeed, it cannot even guarantee the loop terminates (hence safepoint polls), and on top of that, loop unrolling does not want to multiply that per-element mess even more.

So you get the penalty like this (TR 3970X, JDK 17.0.2 EA, Linux x86_64):

Benchmark                     Mode  Cnt     Score   Error  Units
JdkBenchmarks.enhanced avgt 5 144.962 ± 0.918 ns/op
JdkBenchmarks.indexed avgt 5 1030.981 ± 3.775 ns/op ; + 880 ns/op!
JdkBenchmarks.indexed_cached avgt 5 144.799 ± 0.643 ns/op ; same as enhanced

Additional fun part:

On most JDKs the dominating costs are the costs of calling the Blackhole.consume in this test. Compared to the cost of array access, the cost of Java-style Blackhole is quite bad. With JDK 17+ and JMH 1.34, the compiler Blackholes would be used, and thus provide much more fidelity for the test.

Without compiler blackholes, the effect hides in the Blackhole overhead nearly completely (>25x overhead means we can execute a lot of bad code preceding the Blackhole call!):

Benchmark                     Mode  Cnt     Score   Error  Units
JdkBenchmarks.enhanced avgt 5 4062.866 ± 4.736 ns/op
JdkBenchmarks.indexed avgt 5 4071.620 ± 1.057 ns/op ; + 10 ns/op [whoops]
JdkBenchmarks.indexed_cached avgt 5 4061.390 ± 0.692 ns/op ; same as enhanced

It would re-manifest if we drop @CompilerControl(DONT_INLINE), because the resulting generated code would be much messier:

Benchmark                     Mode  Cnt     Score    Error  Units
JdkBenchmarks.enhanced avgt 5 4067.118 ± 40.699 ns/op
JdkBenchmarks.indexed avgt 5 4601.370 ± 0.632 ns/op ; + 530 ns/op
JdkBenchmarks.indexed_cached avgt 5 4064.455 ± 1.554 ns/op ; same as enhanced

Performance difference between for, while and do while loop?

The overall semantics of all three iteration statements is the same once they have been compiled into binary code.

They just provide a different taste over the same thing. Performance depends on:

  • the complexity of the body of the loop
  • the complexity of calculating the end condition / next iteration step

Since they all provide both of them there is no performance difference per se in any of them. Unless you consider small irrelevant things that you shouldn't care in any case.

There could be some optimization tricks that can be done according to the kind of loop but you shouldn't rely on them, even because they could be compiler dependent, so meaningless from your point of view.



Related Topics



Leave a reply



Submit