In Arrayblockingqueue, Why Copy Final Member Field into Local Final Variable

In ArrayBlockingQueue, why copy final member field into local final variable?

It's an extreme optimization Doug Lea, the author of the class, likes to use. Here's a post on a recent thread on the core-libs-dev mailing list about this exact subject which answers your question pretty well.

from the post:

...copying to locals produces the smallest
bytecode, and for low-level code it's nice to write code
that's a little closer to the machine

Is it a good practice to define final local variable when I want to use class field in class method in Java?

This is a micro-optimisation which probably makes sense for standard library code which is going to be used by lots and lots of people and read by significantly fewer people. I wouldn't recommend doing this in your own code unless performance is a real concern and you've profiled your application to determine which parts of the code are the real bottlenecks, since otherwise the performance gain is likely to be negligible, so the main effect will be to make your code more verbose.

Using local variables should have some very minor performance benefits, because:

  • Local variables will be allocated either on the stack or in registers, which are faster to access than other memory, so doing only a single field access from the heap definitely minimises the number of slower accesses (at best, accessing memory on the heap can only be as fast as accessing a local variable, but at worst it can be slower).
  • The bytecode instructions for accessing local variables are shorter. The first four local variables can be loaded using aload0, aload1, aload2 and aload3, which are a single byte each, and aload is two bytes including an index to the local variable to be loaded. In comparison, getfield takes three bytes, two of which are used to reference the field being accessed.

There is also potentially a semantic difference in non-synchronised methods, since the local variable will hold the same value even if the field is changed by another thread during the method execution. However, most of the standard library classes you'll see this pattern in are not meant to be thread-safe, and even if the class is meant for concurrency, doing this doesn't necessarily make the method thread-safe anyway.

Why copy a field reference to a local before using it in a loop?

I took a look at the bytecode, as @user commented, it's probably an optimization to avoid the getfield call within the loop. But they messed up and still reference the the value variable in the loop condition... so this actually makes the bytecode longer and slower.

public int h1() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}

public int h2() {
int h = hash;
if (h == 0 && value.length > 0) {
for (int i = 0; i < value.length; i++) {
h = 31 * h + value[i];
}
hash = h;
}
return h;
}

We can see that both methods produce almost identical bytecode, however our "optimized" implementation actually ends up using 2 more calls.

Notice how the for loop test (lines 39-45 in h1, and lines 37-43 in h2) call getfield to do the array length call.

Bytecode:

public int h1();
Code:
0: aload_0
1: getfield #17 // Field hash:I
4: istore_1
5: iload_1
6: ifne 53
9: aload_0
10: getfield #15 // Field value:[C
13: arraylength
14: ifle 53
17: aload_0
18: getfield #15 // Field value:[C
21: astore_2
22: iconst_0
23: istore_3
24: goto 39
27: bipush 31
29: iload_1
30: imul
31: aload_2
32: iload_3
33: caload
34: iadd
35: istore_1
36: iinc 3, 1
39: iload_3
40: aload_0
41: getfield #15 // Field value:[C
44: arraylength
45: if_icmplt 27
48: aload_0
49: iload_1
50: putfield #17 // Field hash:I
53: iload_1
54: ireturn

public int h2();
Code:
0: aload_0
1: getfield #17 // Field hash:I
4: istore_1
5: iload_1
6: ifne 51
9: aload_0
10: getfield #15 // Field value:[C
13: arraylength
14: ifle 51
17: iconst_0
18: istore_2
19: goto 37
22: bipush 31
24: iload_1
25: imul
26: aload_0
27: getfield #15 // Field value:[C
30: iload_2
31: caload
32: iadd
33: istore_1
34: iinc 2, 1
37: iload_2
38: aload_0
39: getfield #15 // Field value:[C
42: arraylength
43: if_icmplt 22
46: aload_0
47: iload_1
48: putfield #17 // Field hash:I
51: iload_1
52: ireturn

If they had changed the loop condition to also use the local field,

...
for (int i = 0; i < val.length; i++) {
...

Then the bytecode actually becomes shorter and loses the arguably slower getfield call in the loop,

 public int h1();
Code:
0: aload_0
1: getfield #17 // Field hash:I
4: istore_1
5: iload_1
6: ifne 50
9: aload_0
10: getfield #15 // Field value:[C
13: arraylength
14: ifle 50
17: aload_0
18: getfield #15 // Field value:[C
21: astore_2
22: iconst_0
23: istore_3
24: goto 39
27: bipush 31
29: iload_1
30: imul
31: aload_2
32: iload_3
33: caload
34: iadd
35: istore_1
36: iinc 3, 1
39: iload_3
40: aload_2
41: arraylength
42: if_icmplt 27
45: aload_0
46: iload_1
47: putfield #17 // Field hash:I
50: iload_1
51: ireturn

Doing a dumb test of just looping over the method a few million times on my jdk 1.7.0_79 consistently shows the original method hilariously takes about 5 times longer to run then either the un-"optimized" or the correctly "optimized" methods. The latter 2 having no difference in performance.

So I guess in conclusion, storing the field locally was an attempt at optimizing the methods bytecode, probably before the jit was able to optimize this itself, but they messed it up and actually made the method a lot worse...

This code is actually fixed in Java 9 per,
https://bugs.openjdk.java.net/browse/JDK-8058643

Why j.u.c.CopyOnWriteArrayList creates local lock variable inside methods

That is just a micro optimization technique. In theory, access to a local variable is faster then access to field and it may also result to a smaller bytecode. Though HotSpot compiler may actually optimize field access to a register call, so it won't be different.

Is it important to copy a reference to a local variable before using it

My first thought was that it helps concurrency in some way...

...so I guess it is some hint for optimizer...

Both. :-) The fact that LinkedList doesn't support concurrency doesn't mean that the authors aren't going to follow good practices anyway, and it tells the compiler and the JIT that they should only look up first once.

Without the f local variable, we'd have:

public E peek() {
return (this.first == null) ? null : this.first.item;
}

I've added the implied this. to emphasize that first is an instance field.

So if the this.first == null part is evaluated on Thread A, then this.first is changed on Thread B, when this.first.item is evaluated on Thread A, it may throw because this.first has become null in the meantime. That's impossible with f, because f is local; only the thread running the peek call will see it.

The final part is both good in-code documentation (as the author never intends to change the value of f) and a hint to the optimizer that we're never going to change f, which means that when it comes time to optimize, it can optimize it to within an inch of its life, knowing that it only ever has to read this.first once and can then use a register or stack value for both the null check and the return.

Why it doesn't use the instance field directly, but assigns it to a local variable?

Could it be for optimization purposes?

Possibly a local variable could more easily be directly allocated to a register with a JIT compiler.

At least in Android, for the first versions of the API, accessing a local variable was cheaper than accessing an instance variable (can't speak for newer versions). It could be that plain Java is the same, and in some cases it makes sense to use a local.

Actually, found a thread confirming this here. Extract:

It's a coding style made popular by Doug Lea. It's an extreme
optimization that probably isn't necessary; you can expect the JIT to
make the same optimizations. (you can try to check the machine code
yourself!) Nevertheless, copying to locals produces the smallest
bytecode, and for low-level code it's nice to write code that's a
little closer to the machine.

Why there is a local variable referencing value array in String#hashCode?

This is performance optimization according to Martin Buchholz

copying to locals produces the smallest
bytecode

See also:

In ArrayBlockingQueue, why copy final member field into local final variable?


+----------------------+----------------------------+------------------------------+
| h = 31 * h + val[i]; | h = 31 * h + value[i]; | hash = 31 * hash + value[i]; |
+----------------------+----------------------------+------------------------------+
| LINENUMBER 1471 L7 | LINENUMBER 14 L6 | LINENUMBER 13 L4 |
| BIPUSH 31 | BIPUSH 31 | ALOAD 0 |
| ILOAD 1 | ILOAD 1 | BIPUSH 31 |
| IMUL | IMUL | ALOAD 0 |
| ALOAD 2 | ALOAD 0 | GETFIELD String.hash : I |
| ILOAD 3 | GETFIELD String.value : [C | IMUL |
| CALOAD | ILOAD 2 | ALOAD 0 |
| IADD | CALOAD | GETFIELD String.value : [C |
| ISTORE 1 | IADD | ILOAD 1 |
| | ISTORE 1 | CALOAD |
| | | IADD |
| | | PUTFIELD String.hash : I |
+----------------------+----------------------------+------------------------------+

Why there is a local variable referencing value array in String#hashCode?

This is performance optimization according to Martin Buchholz

copying to locals produces the smallest
bytecode

See also:

In ArrayBlockingQueue, why copy final member field into local final variable?


+----------------------+----------------------------+------------------------------+
| h = 31 * h + val[i]; | h = 31 * h + value[i]; | hash = 31 * hash + value[i]; |
+----------------------+----------------------------+------------------------------+
| LINENUMBER 1471 L7 | LINENUMBER 14 L6 | LINENUMBER 13 L4 |
| BIPUSH 31 | BIPUSH 31 | ALOAD 0 |
| ILOAD 1 | ILOAD 1 | BIPUSH 31 |
| IMUL | IMUL | ALOAD 0 |
| ALOAD 2 | ALOAD 0 | GETFIELD String.hash : I |
| ILOAD 3 | GETFIELD String.value : [C | IMUL |
| CALOAD | ILOAD 2 | ALOAD 0 |
| IADD | CALOAD | GETFIELD String.value : [C |
| ISTORE 1 | IADD | ILOAD 1 |
| | ISTORE 1 | CALOAD |
| | | IADD |
| | | PUTFIELD String.hash : I |
+----------------------+----------------------------+------------------------------+

In ArrayBlockingQueue, why copy final member field into local final variable?

It's an extreme optimization Doug Lea, the author of the class, likes to use. Here's a post on a recent thread on the core-libs-dev mailing list about this exact subject which answers your question pretty well.

from the post:

...copying to locals produces the smallest
bytecode, and for low-level code it's nice to write code
that's a little closer to the machine



Related Topics



Leave a reply



Submit