Java How Expensive Is a Method Call

Would it be more optimal for me to just copy and paste the two lines in my clear() method into the constructor instead of calling the actual method?

The compiler can perform that optimization. And so can the JVM. The terminology used by compiler writers and JVM authors is "inline expansion".
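For illustration, here is a minimal sketch of the pattern being discussed; the class and field names are made up, since the original code isn't shown:

public class IntPair {
    private int first;
    private int second;

    public IntPair() {
        // Calling clear() here instead of duplicating the two assignments
        // keeps the code in one place; the JIT can inline the call, so the
        // compiled constructor ends up with the same machine code either way.
        clear();
    }

    public void clear() {
        first = 0;
        second = 0;
    }
}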

If so how much of a difference does it make?

Measure it. Often you'll find that it makes no difference. And if you believe this is a performance hotspot, you're probably looking in the wrong place; that is exactly why you need to measure it.

What if my constructor made 10 method calls with each one simply setting an instance variable to a value?

Again, that depends on the generated bytecode and any runtime optimizations performed by the Java Virtual Machine. If the compiler/JVM can inline the method calls, it will perform the optimization to avoid the overhead of creating new stack frames at runtime.
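If you want to check what HotSpot actually decides for those ten small setters, it can log its inlining decisions. These are diagnostic flags of HotSpot-based JVMs (the output format varies between versions):

-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining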

What's the best programming practice?

Avoiding premature optimization. The best practice is to write readable and well-designed code, and then optimize for the performance hotspots in your application.

Cost of invoking a method on Android

I totally disagree. In theory, method calls do add a bit of overhead: arguments have to be pushed onto the stack, followed by a jump into the method. But the overhead is trivial.

Premature optimization is never a good idea. Benchmark your application and figure out where the real performance issues are. I'm positive that it won't be because of a single method call, even one that is called frequently. How that method is implemented might be an issue but not the call itself.

Java method call performance

TL;DR: The JIT compiler has more opportunities to optimize the inner loop in the second case, because on-stack replacement happens at a different point.

I've managed to reproduce the problem with a reduced test case.

No I/O or string operations involved, just two nested loops with array access.

public class NestedLoop {
    private static final int ARRAY_SIZE = 5000;
    private static final int ITERATIONS = 1000000;

    private int[] width = new java.util.Random(0).ints(ARRAY_SIZE).toArray();

    public long inline() {
        long sum = 0;

        for (int i = 0; i < ITERATIONS; i++) {
            int min = width[0];
            for (int k = 1; k < ARRAY_SIZE; k++) {
                if (min > width[k]) {
                    min = width[k];
                }
            }
            sum += min;
        }

        return sum;
    }

    public long methodCall() {
        long sum = 0;

        for (int i = 0; i < ITERATIONS; i++) {
            int min = getMin();
            sum += min;
        }

        return sum;
    }

    private int getMin() {
        int min = width[0];
        for (int k = 1; k < ARRAY_SIZE; k++) {
            if (min > width[k]) {
                min = width[k];
            }
        }
        return min;
    }

    public static void main(String[] args) {
        long startTime = System.nanoTime();
        long sum = new NestedLoop().inline(); // or .methodCall();
        long endTime = System.nanoTime();

        long ms = (endTime - startTime) / 1000000;
        System.out.println("sum = " + sum + ", time = " + ms + " ms");
    }
}

The inline variant indeed runs 3-4 times slower than methodCall.


I've used the following JVM options to confirm that both benchmarks are compiled on the highest tier and OSR (on-stack replacement) successfully occurs in both cases.

-XX:-TieredCompilation
-XX:CompileOnly=NestedLoop
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintCompilation
-XX:+TraceNMethodInstalls
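For reference, a complete invocation with these options might look like this (assuming NestedLoop.java has been compiled in the current directory):

javac NestedLoop.java
java -XX:-TieredCompilation -XX:CompileOnly=NestedLoop -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+TraceNMethodInstalls NestedLoop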

'inline' compilation log:

    251   46 %           NestedLoop::inline @ 21 (70 bytes)
Installing osr method (4) NestedLoop.inline()J @ 21

'methodCall' compilation log:

    271   46             NestedLoop::getMin (41 bytes)
Installing method (4) NestedLoop.getMin()I
    274   47 %           NestedLoop::getMin @ 9 (41 bytes)
Installing osr method (4) NestedLoop.getMin()I @ 9
    314   48 %           NestedLoop::methodCall @ 4 (30 bytes)
Installing osr method (4) NestedLoop.methodCall()J @ 4

This means the JIT does its job, but the generated code must be different.

Let's analyze it with -XX:+PrintAssembly.
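Note that -XX:+PrintAssembly is a diagnostic option as well: it requires -XX:+UnlockDiagnosticVMOptions and the hsdis disassembler plugin on the JVM's library path. A possible invocation:

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:-TieredCompilation -XX:CompileOnly=NestedLoop NestedLoop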


'inline' disassembly (the hottest fragment)

0x0000000002df4dd0: inc    %ebp               ; OopMap{r11=Derived_oop_rbx rbx=Oop off=114}
;*goto
; - NestedLoop::inline@53 (line 12)

0x0000000002df4dd2: test %eax,-0x1d64dd8(%rip) # 0x0000000001090000
;*iload
; - NestedLoop::inline@21 (line 12)
; {poll}
0x0000000002df4dd8: cmp $0x1388,%ebp
0x0000000002df4dde: jge 0x0000000002df4dfd ;*if_icmpge
; - NestedLoop::inline@26 (line 12)

0x0000000002df4de0: test %rbx,%rbx
0x0000000002df4de3: je 0x0000000002df4e4c
0x0000000002df4de5: mov (%r11),%r10d ;*getfield width
; - NestedLoop::inline@32 (line 13)

0x0000000002df4de8: mov 0xc(%r10),%r9d ; implicit exception
0x0000000002df4dec: cmp %r9d,%ebp
0x0000000002df4def: jae 0x0000000002df4e59
0x0000000002df4df1: mov 0x10(%r10,%rbp,4),%r8d ;*iaload
; - NestedLoop::inline@37 (line 13)

0x0000000002df4df6: cmp %r8d,%r13d
0x0000000002df4df9: jg 0x0000000002df4dc6 ;*if_icmple
; - NestedLoop::inline@38 (line 13)

0x0000000002df4dfb: jmp 0x0000000002df4dd0

'methodCall' disassembly (also the hottest part)

0x0000000002da2af0: add    $0x8,%edx          ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2af3: cmp $0x1381,%edx
0x0000000002da2af9: jge 0x0000000002da2b70 ;*iload_1
; - NestedLoop::getMin@16 (line 37)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2afb: mov 0x10(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b00: cmp %r11d,%ecx
0x0000000002da2b03: jg 0x0000000002da2b6b ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b05: mov 0x14(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b0a: cmp %r11d,%ecx
0x0000000002da2b0d: jg 0x0000000002da2b5c ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b0f: mov 0x18(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b14: cmp %r11d,%ecx
0x0000000002da2b17: jg 0x0000000002da2b4d ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b19: mov 0x1c(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b1e: cmp %r11d,%ecx
0x0000000002da2b21: jg 0x0000000002da2b66 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b23: mov 0x20(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b28: cmp %r11d,%ecx
0x0000000002da2b2b: jg 0x0000000002da2b61 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b2d: mov 0x24(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b32: cmp %r11d,%ecx
0x0000000002da2b35: jg 0x0000000002da2b52 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b37: mov 0x28(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b3c: cmp %r11d,%ecx
0x0000000002da2b3f: jg 0x0000000002da2b57 ;*iinc
; - NestedLoop::getMin@33 (line 36)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b41: mov 0x2c(%r9,%rdx,4),%r11d ;*iaload
; - NestedLoop::getMin@22 (line 37)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b46: cmp %r11d,%ecx
0x0000000002da2b49: jg 0x0000000002da2ae6 ;*if_icmple
; - NestedLoop::getMin@23 (line 37)
; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b4b: jmp 0x0000000002da2af0

The compiled code is completely different; methodCall is optimized much better.

  • the loop is unrolled by 8 iterations;
  • there is no array bounds check inside;
  • the width field is cached in a register.

In contrast, the inline variant:

  • does not do loop unrolling;
  • loads the width array from memory every time;
  • performs an array bounds check on each iteration.

OSR-compiled methods are not always optimized very well, because they have to maintain the state of the interpreted stack frame at the transition point. Here is another example of the same problem.

On-stack replacement usually occurs on backward branches (i.e. at the bottom of a loop). The inline method has two nested loops, and OSR happens inside the inner loop, while methodCall has just one outer loop. An OSR transition in the outer loop is more favourable, because the JIT compiler has more freedom to optimize the inner loop. And this is exactly what happens in your case.

Is a Java object with hundreds of methods expensive?

The object is not expensive at all.

An object contains a pointer to the object's class, and the methods are stored with the class. Essentially, the methods are all shared. An object of a class with no methods and an object of a class with 10000 methods are the same size (assuming everything else is equal).

The situation would be different if you had 100 fields instead of 100 methods.
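If you want to verify this yourself, the OpenJDK JOL (Java Object Layout) tool can print actual instance sizes. A rough sketch, assuming the jol-core library is on the classpath; the two nested classes are made up for the comparison:

import org.openjdk.jol.info.ClassLayout;

public class SizeDemo {
    // Hypothetical classes: one with (imagine hundreds of) methods, one with many fields.
    static class ManyMethods {
        void m1() {} void m2() {} void m3() {} // ...
    }
    static class ManyFields {
        int f1, f2, f3, f4, f5, f6, f7, f8, f9, f10; // ...
    }

    public static void main(String[] args) {
        // The instance size depends on the fields, not on how many methods the class declares.
        System.out.println(ClassLayout.parseClass(ManyMethods.class).toPrintable());
        System.out.println(ClassLayout.parseClass(ManyFields.class).toPrintable());
    }
}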

You may want to think about whether having hundreds of methods in a single class is a good idea. Is the code easy to understand and maintain? Is this an example of the "God object" anti-pattern? https://en.m.wikipedia.org/wiki/God_object

Does Java take time to call a method?

You will not obtain any meaningful benchmark this way.

You don't account for the JIT.

The compiler (javac) performs hardly any optimization in this regard, apart from very obvious ones. When it sees a method call in the source code, it generates bytecode that invokes the method, even if that call always returns the same value; when it sees a constant, it generates an ldc (load constant) bytecode instruction.
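You can see this for yourself with javap. A small sketch (the class is made up); compile it and run javap -c CallDemo to inspect the bytecode:

public class CallDemo {
    static int answer() {
        return 123456;
    }

    public static void main(String[] args) {
        int viaCall = answer();    // compiled to an invokestatic instruction,
                                   // even though the result is always the same
        int viaConstant = 123456;  // compiled to an ldc (load constant) instruction
    }
}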

BUT.

Then the JIT kicks in at some point. If it determines that a method call always returns the same result, it will inline the call, at runtime. But this only happens after that code has been executed a certain number of times, and the JIT always keeps a way back in case one of its assumptions turns out to be wrong (this is called deoptimization).

And that is but one optimization a good JIT implementation can perform.

You want to watch this video. Long story short: with Oracle's JVM, optimization only starts to kick in after a piece of code has been executed at least 10000 times (for some definition of "piece of code").
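A crude way to see that threshold effect; this is only an illustration of JIT warm-up, not a proper benchmark, and the numbers are arbitrary:

public class WarmupDemo {
    static long work() {
        long sum = 0;
        for (int i = 0; i < 10_000; i++) {
            sum += i * 31L;
        }
        return sum;
    }

    public static void main(String[] args) {
        // The first round runs mostly interpreted; once work() has been
        // executed enough times, the JIT compiles it and later rounds
        // typically report much lower times.
        for (int round = 0; round < 5; round++) {
            long start = System.nanoTime();
            long sum = 0;
            for (int i = 0; i < 20_000; i++) {
                sum += work();
            }
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("round " + round + ": " + ms + " ms (sum = " + sum + ")");
        }
    }
}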

Efficiency of having logic in-line vs calling a method?

You are correct in your assumptions about Java's behaviour, but arguing with your professor without data is impolite :). Arguing without data is pointless; prove your assumptions with measurements and graphs.

You can use JMH ( http://openjdk.java.net/projects/code-tools/jmh/ ) to create a small benchmark and measure the difference between the following (a minimal sketch follows the list below):

  • inlined by hand (remove the isEmpty method and place its code at the call site)
  • inlined by the JIT compiler (HotSpot, after 100k (?) invocations; see the PrintCompilation output)
  • HotSpot inlining disabled entirely
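A minimal JMH sketch of the first two variants; the isEmpty logic here is a stand-in, since the professor's actual code isn't shown. The JIT is expected to inline the tiny method, so the two benchmarks should converge, and re-running with inlining disabled (e.g. -XX:-Inline) covers the third case:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class InlineVsCallBenchmark {

    // Hypothetical state standing in for the real class under discussion.
    private int size = 0;

    private boolean isEmpty() {
        return size == 0;
    }

    @Benchmark
    public boolean viaMethodCall() {
        return isEmpty();       // relies on JIT inlining
    }

    @Benchmark
    public boolean inlinedByHand() {
        return size == 0;       // the same logic written at the call site
    }
}

Returning the boolean lets JMH consume the result, so neither variant is eliminated as dead code.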

Please read http://www.oracle.com/technetwork/java/whitepaper-135217.html#method

Useful parameters could be:

  • -Djava.compiler=NONE
  • -XX:+PrintCompilation

Plus, each JDK version has its own set of parameters to control the JIT.
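To list the flags your particular JVM actually supports (and their current values), something like this should work on HotSpot-based JVMs:

java -XX:+PrintFlagsFinal -version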

If you create a set of graphs from your research and politely present them to the professor, I think it will benefit you in the future.

I think that https://stackoverflow.com/users/2613885/aleksey-shipilev can help with JMH-related questions.

BTW: I once had great success inlining plenty of methods by hand into a single huge loop to get maximum speed out of a neural network backpropagation routine, because Java was (is?) reluctant to inline methods that call methods that call methods. It was unmaintainable but fast :(.
