Fastest Way to Iterate an Array in Java: Loop Variable VS Enhanced for Statement

Fastest way to iterate an Array in Java: loop variable vs enhanced for statement

If you're looping through an array, it shouldn't matter - the enhanced for loop uses array accesses anyway.

For example, consider this code:

public static void main(String[] args)
{
for (String x : args)
{
System.out.println(x);
}
}

When decompiled with javap -c Test we get (for the main method):

public static void main(java.lang.String[]);
Code:
0: aload_0
1: astore_1
2: aload_1
3: arraylength
4: istore_2
5: iconst_0
6: istore_3
7: iload_3
8: iload_2
9: if_icmpge 31
12: aload_1
13: iload_3
14: aaload
15: astore 4
17: getstatic #2; //Field java/lang/System.out:Ljava/io/PrintStream;
20: aload 4
22: invokevirtual #3; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
25: iinc 3, 1
28: goto 7
31: return

Now change it to use an explicit array access:

public static void main(String[] args)
{
for (int i = 0; i < args.length; i++)
{
System.out.println(args[i]);
}
}

This decompiles to:

public static void main(java.lang.String[]);
Code:
0: iconst_0
1: istore_1
2: iload_1
3: aload_0
4: arraylength
5: if_icmpge 23
8: getstatic #2; //Field java/lang/System.out:Ljava/io/PrintStream;
11: aload_0
12: iload_1
13: aaload
14: invokevirtual #3; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
17: iinc 1, 1
20: goto 2
23: return

There's a bit more setup code in the enhanced for loop, but they're basically doing the same thing. No iterators are involved. Furthermore, I'd expect them to get JITted to even more similar code.

Suggestion: if you really think it might make a significant difference (which it would only ever do if the body of the loop is absolutely miniscule) then you should benchmark it with your real application. That's the only situation which matters.

looping over array, performance difference between indexed and enhanced for loop

TL;DR: You are observing what happens when JIT compiler cannot trust that values are not changing inside the loop. Additionally, in the tiny benchmark like this, Blackhole.consume costs dominate, obscuring the results.

Simplifying the test:

@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class JdkBenchmarks {

public int[] values;

@Setup
public void setupArray() {
int count = 1000;
values = new int[count];
for(int i = 0; i < count; i++) {
values[i] = i;
}
}

@Benchmark
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
public void indexed(Blackhole bh) {
for(int i = 0; i < values.length; i++) {
bh.consume(values[i]);
}
}

@Benchmark
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
public void indexed_cached(Blackhole bh) {
int[] vs = values;
int length = vs.length;
for(int i = 0; i < length; i++) {
bh.consume(vs[i]);
}
}

@Benchmark
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
public void enhanced(Blackhole bh) {
for (int value : values) {
bh.consume(value);
}
}
}

Running both enhanced and indexed_cached under -prof perfasm reveals this hot loop (I specifically did @CompilerControl(DONT_INLINE) to let @Benchmark method be compiled alone, which makes an easier to digest perfasm output):

         ↗  0x...4240: mov  0x10(%r8,%rsi,4),%r10d  ; load values[i], blackhole it
22.68% │ 0x...4245: mov 0x14(%r8,%rsi,4),%r11d ; ... repeat 7 more times...
│ 0x...424a: mov 0x18(%r8,%rsi,4),%r10d ;
20.95% │ 0x...424f: mov 0x1c(%r8,%rsi,4),%r10d ;
0.02% │ 0x...4254: mov 0x20(%r8,%rsi,4),%r11d ;
24.73% │ 0x...4259: mov 0x24(%r8,%rsi,4),%r10d ;
0.24% │ 0x...425e: mov 0x28(%r8,%rsi,4),%r11d ;
20.04% │ 0x...4263: mov 0x2c(%r8,%rsi,4),%r10d ;
0.22% │ 0x...4268: add $0x8,%esi ; i += 8
│ 0x...426b: cmp %ebp,%esi ; compare i with length (in %ebp)
0.26% ╰ 0x...426d: jl 0x...4240 ; circle back if 8 more elements available

Very efficient!

Running indexed with -prof perfasm reveals:

         ↗  0x...4170: mov  0xc(%r12,%r8,8),%r9d    ; array bounds check, load values.length
3.42% │ 0x...4175: cmp %r9d,%r10d ; array bounds check, compare i
16.02% │ 0x...4178: jae 0x...41b1 ; failed? jump to exception handling
│ 0x...417a: lea (%r12,%r8,8),%r11 ; load values[i], part 1
0.04% │ 0x...417e: mov 0x10(%r11,%r10,4),%r11d ; load values[i], part 2
│ ; %r11d is blackholed
35.69% │ 0x...4183: mov 0xc(%rsi),%r8d ; get "values"
0.71% │ 0x...4187: mov 0x348(%r15),%r11 ; safepoint poll, part 1 (JVM infra)
4.03% │ 0x...418e: inc %r10d ; i++
0.12% │ 0x...4191: test %eax,(%r11) ; safepoint poll, part 2 (JVM infra)
27.74% │ 0x...4194: mov 0xc(%r12,%r8,8),%r9d ; load values.length
8.53% │ 0x...4199: cmp %r9d,%r10d ; check i < values.length
0.24% ╰ 0x...419c: jl 0x...4170 ; circle back if more

This is because Blackhole.consume call is opaque to the compiler (like many other non-inlined calls), so it has to conservatively presume that values can change in the middle of the loop!

Which means, compiler cannot stash values in a register, it cannot trust the array bounds check to always succeed, it cannot even guarantee the loop terminates (hence safepoint polls), and on top of that, loop unrolling does not want to multiply that per-element mess even more.

So you get the penalty like this (TR 3970X, JDK 17.0.2 EA, Linux x86_64):

Benchmark                     Mode  Cnt     Score   Error  Units
JdkBenchmarks.enhanced avgt 5 144.962 ± 0.918 ns/op
JdkBenchmarks.indexed avgt 5 1030.981 ± 3.775 ns/op ; + 880 ns/op!
JdkBenchmarks.indexed_cached avgt 5 144.799 ± 0.643 ns/op ; same as enhanced

Additional fun part:

On most JDKs the dominating costs are the costs of calling the Blackhole.consume in this test. Compared to the cost of array access, the cost of Java-style Blackhole is quite bad. With JDK 17+ and JMH 1.34, the compiler Blackholes would be used, and thus provide much more fidelity for the test.

Without compiler blackholes, the effect hides in the Blackhole overhead nearly completely (>25x overhead means we can execute a lot of bad code preceding the Blackhole call!):

Benchmark                     Mode  Cnt     Score   Error  Units
JdkBenchmarks.enhanced avgt 5 4062.866 ± 4.736 ns/op
JdkBenchmarks.indexed avgt 5 4071.620 ± 1.057 ns/op ; + 10 ns/op [whoops]
JdkBenchmarks.indexed_cached avgt 5 4061.390 ± 0.692 ns/op ; same as enhanced

It would re-manifest if we drop @CompilerControl(DONT_INLINE), because the resulting generated code would be much messier:

Benchmark                     Mode  Cnt     Score    Error  Units
JdkBenchmarks.enhanced avgt 5 4067.118 ± 40.699 ns/op
JdkBenchmarks.indexed avgt 5 4601.370 ± 0.632 ns/op ; + 530 ns/op
JdkBenchmarks.indexed_cached avgt 5 4064.455 ± 1.554 ns/op ; same as enhanced

Is there a performance difference between a for loop and a for-each loop?

From Item 46 in Effective Java by Joshua Bloch :

The for-each loop, introduced in
release 1.5, gets rid of the clutter
and the opportunity for error by
hiding the iterator or index variable
completely. The resulting idiom
applies equally to collections and
arrays:

// The preferred idiom for iterating over collections and arrays
for (Element e : elements) {
doSomething(e);
}

When you see the colon (:), read it as
“in.” Thus, the loop above reads as
“for each element e in elements.” Note
that there is no performance penalty
for using the for-each loop, even for
arrays. In fact, it may offer a slight
performance advantage over an ordinary
for loop in some circumstances, as it
computes the limit of the array index
only once. While you can do this by
hand (Item 45), programmers don’t
always do so.

why is the enhanced for loop more efficient than the normal for loop

It's a bit of an oversimplification to say that the enhanced for loop is more efficient. It can be, but in many cases it's almost exactly the same as an old-school loop.

The first thing to note is that for collections the enhanced for loop uses an Iterator, so if you manually iterate over a collection using an Iterator then you should have pretty much the same performance than the enhanced for loop.

One place where the enhanced for loop is faster than a naively implemented traditional loop is something like this:

LinkedList<Object> list = ...;

// Loop 1:
int size = list.size();
for (int i = 0; i<size; i++) {
Object o = list.get(i);
/// do stuff
}

// Loop 2:
for (Object o : list) {
// do stuff
}

// Loop 3:
Iterator<Object> it = list.iterator();
while (it.hasNext()) {
Object o = it.next();
// do stuff
}

In this case Loop 1 will be slower than both Loop 2 and Loop 3, because it will have to (partially) traverse the list in each iteration to find the element at position i. Loop 2 and 3, however will only ever step one element further in the list, due to the use of an Iterator. Loop 2 and 3 will also have pretty much the same performance since Loop 3 is pretty much exactly what the compiler will produce when you write the code in Loop 2.

How does the enhanced for statement work for arrays, and how to get an iterator for an array?

If you want an Iterator over an array, you could use one of the direct implementations out there instead of wrapping the array in a List. For example:

Apache Commons Collections ArrayIterator

Or, this one, if you'd like to use generics:

com.Ostermiller.util.ArrayIterator

Note that if you want to have an Iterator over primitive types, you can't, because a primitive type can't be a generic parameter. E.g., if you want an Iterator<int>, you have to use an Iterator<Integer> instead, which will result in a lot of autoboxing and -unboxing if that's backed by an int[].

What's the fastest way to loop through an array in JavaScript?

After performing this test with most modern browsers:
https://jsben.ch/wY5fo

Currently, the fastest form of loop (and in my opinion the most syntactically obvious).

A standard for-loop with length caching

    var i = 0, len = myArray.length;
while (i < len) {
// your code
i++
}

I would say, this is definitely a case where I applaud JavaScript engine developers. A runtime should be optimized for clarity, not cleverness.

In a java enhanced for loop, is it safe to assume the expression to be looped over will be evaluated only once?

About the enhanced for statement, the Java Language Specifications writes:

The enhanced for statement has the
form:

EnhancedForStatement:
for ( VariableModifiersopt Type Identifier: Expression) Statement

The Expression must either have type
Iterable or else it must be of an
array type (§10.1), or a compile-time
error occurs.

The scope of a local variable declared
in the FormalParameter part of an
enhanced for statement (§14.14) is
the contained Statement

The meaning of the enhanced for
statement is given by translation into
a basic for statement.

If the type of Expression is a
subtype of Iterable, then let I be
the type of the expression
Expression.iterator(). The enhanced for statement is equivalent
to a basic for statement of the
form:

for (I #i = Expression.iterator(); #i.hasNext(); ) {

VariableModifiersopt Type Identifier = #i.next();
Statement
}

Where #i is a compiler-generated
identifier that is distinct from any
other identifiers (compiler-generated
or otherwise) that are in scope (§6.3)
at the point where the enhanced for
statement occurs.

Otherwise, the Expression necessarily
has an array type, T[]. Let L1 ... Lm
be the (possibly empty) sequence of
labels immediately preceding the
enhanced for statement. Then the
meaning of the enhanced for statement
is given by the following basic for
statement:

T[] a = Expression;
L1: L2: ... Lm:
for (int i = 0; i < a.length; i++) {
VariableModifiersopt Type Identifier = a[i];
Statement
}

Where a and i are compiler-generated
identifiers that are distinct from any
other identifiers (compiler-generated
or otherwise) that are in scope at the
point where the enhanced for statement
occurs.

So in your case, genArray() doesn't return a subtype of Iterable but an array type, so your enhanced for statement is equivalent to the following basic for statement:

String[] a = genArray();
...
for (int i = 0; i < a.length; i++) {
String s = a[i];
// ...
}

And genArray() will thus be called only once (but the currently accepted answer is partially wrong).

Enhanced for loop with array

From the Java tutorial on this subject:

The for-each construct is also applicable to arrays, where it hides
the index variable rather than the iterator. The following method
returns the sum of the values in an int array:

// Returns the sum of the elements of a
int sum(int[] a) {
int result = 0;
for (int i : a)
result += i;
return result;
}

And from §14.14.2 of the JLS (the Java Language Specification):

The enhanced for statement has the form:

EnhancedForStatement:
for ( FormalParameter : Expression ) Statement

The type of the Expression must be Iterable or an array type, or a
compile-time error occurs.

But note that arrays don't implement Iterable; from §10.1 of the JLS:

The direct superclass of an array type is Object.

Every array type implements the interfaces Cloneable and java.io.Serializable.

Which is more efficient, a for-each loop, or an iterator?

If you are just wandering over the collection to read all of the values, then there is no difference between using an iterator or the new for loop syntax, as the new syntax just uses the iterator underwater.

If however, you mean by loop the old "c-style" loop:

for(int i=0; i<list.size(); i++) {
Object o = list.get(i);
}

Then the new for loop, or iterator, can be a lot more efficient, depending on the underlying data structure. The reason for this is that for some data structures, get(i) is an O(n) operation, which makes the loop an O(n2) operation. A traditional linked list is an example of such a data structure. All iterators have as a fundamental requirement that next() should be an O(1) operation, making the loop O(n).

To verify that the iterator is used underwater by the new for loop syntax, compare the generated bytecodes from the following two Java snippets. First the for loop:

List<Integer>  a = new ArrayList<Integer>();
for (Integer integer : a)
{
integer.toString();
}
// Byte code
ALOAD 1
INVOKEINTERFACE java/util/List.iterator()Ljava/util/Iterator;
ASTORE 3
GOTO L2
L3
ALOAD 3
INVOKEINTERFACE java/util/Iterator.next()Ljava/lang/Object;
CHECKCAST java/lang/Integer
ASTORE 2
ALOAD 2
INVOKEVIRTUAL java/lang/Integer.toString()Ljava/lang/String;
POP
L2
ALOAD 3
INVOKEINTERFACE java/util/Iterator.hasNext()Z
IFNE L3

And second, the iterator:

List<Integer>  a = new ArrayList<Integer>();
for (Iterator iterator = a.iterator(); iterator.hasNext();)
{
Integer integer = (Integer) iterator.next();
integer.toString();
}
// Bytecode:
ALOAD 1
INVOKEINTERFACE java/util/List.iterator()Ljava/util/Iterator;
ASTORE 2
GOTO L7
L8
ALOAD 2
INVOKEINTERFACE java/util/Iterator.next()Ljava/lang/Object;
CHECKCAST java/lang/Integer
ASTORE 3
ALOAD 3
INVOKEVIRTUAL java/lang/Integer.toString()Ljava/lang/String;
POP
L7
ALOAD 2
INVOKEINTERFACE java/util/Iterator.hasNext()Z
IFNE L8

As you can see, the generated byte code is effectively identical, so there is no performance penalty to using either form. Therefore, you should choose the form of loop that is most aesthetically appealing to you, for most people that will be the for-each loop, as that has less boilerplate code.



Related Topics



Leave a reply



Submit