How Does the JVM Decide to JIT-Compile a Method (Categorize a Method as "Hot")

How does the JVM decide to JIT-compile a method (categorize a method as hot)?

HotSpot's compilation policy is rather complex, especially with Tiered Compilation, which is on by default in Java 8. It is based neither on a simple count of executions nor solely on the CompileThreshold parameter.

The best explanation (apparently, the only reasonable one) can be found in the HotSpot sources: see advancedThresholdPolicy.hpp.

I'll summarize the main points of this advanced compilation policy:

  • Execution starts at tier 0 (interpreter).
  • The main triggers for compilation are

    1. method invocation counter i;
    2. backedge counter b. Backward branches typically denote a loop in the code.
  • Every time the counters reach a certain frequency value (TierXInvokeNotifyFreqLog, TierXBackedgeNotifyFreqLog), the compilation policy is invoked to decide what to do next with the currently running method. Depending on the values of i and b and the current load of the C1 and C2 compiler threads, it may decide to

    • continue execution in the interpreter;
    • start profiling in the interpreter;
    • compile the method with C1 at tier 3, with the full profile data required for further recompilation;
    • compile the method with C1 at tier 2, with no profile but with the possibility to recompile (unlikely);
    • finally, compile the method with C1 at tier 1, with no profile or counters (also unlikely).

    Key parameters here are TierXInvocationThreshold and TierXBackEdgeThreshold. The thresholds can be dynamically adjusted for a given method depending on the length of the compilation queue.

  • The compilation queue is not FIFO, but rather a priority queue.

  • C1-compiled code with profile data (tier 3) behaves similarly, except that the thresholds for switching to the next level (C2, tier 4) are much higher. E.g. an interpreted method can be compiled at tier 3 after about 200 invocations, while a C1-compiled method becomes subject to recompilation at tier 4 after 5000+ invocations.

  • A special policy is used for method inlining. Tiny methods can be inlined into the caller even if they are not "hot". Slightly larger methods can be inlined only if they are invoked frequently (InlineFrequencyRatio, InlineFrequencyCount).
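To watch these tiers in action, run a small program like the following sketch with -XX:+PrintCompilation (the class and method names here are illustrative, not from the original question):

    public class HotMethodDemo {
        static long sum;

        static void hotMethod() {
            // backward branches in this loop increment the backedge counter b
            for (int i = 0; i < 100; i++) {
                sum += i;
            }
        }

        public static void main(String[] args) {
            // repeated calls increment the invocation counter i
            for (int n = 0; n < 1_000_000; n++) {
                hotMethod();
            }
            System.out.println(sum);
        }
    }

    java -XX:+PrintCompilation HotMethodDemo

In the output, the tier level is printed just before the method name; hotMethod typically shows up first at tier 3 (C1 with profiling) and shortly afterwards at tier 4 (C2).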

How does HotSpot JVM identify hot methods?

Searching the JDK's Mercurial repository reveals that the AdvancedThresholdPolicy was merged into the SimpleThresholdPolicy in commit 5201c9474ee7 as part of JDK-8202711, and may now be found in src/hotspot/share/runtime/simpleThresholdPolicy.cpp.

Why does the JIT compile some methods at startup?

From the documentation of -XX:CompileThreshold:

This option is ignored when tiered compilation is enabled; see the option -XX:-TieredCompilation.

So when specifying -XX:-TieredCompilation, most of these entries will go away; however, some entries may still be exempt from the counter-based compilation decision.

What exactly does -XX:-TieredCompilation do?

-XX:-TieredCompilation disables intermediate compilation tiers (1, 2, 3), so that a method is either interpreted or compiled at the maximum optimization level (C2).

As a side effect, the TieredCompilation flag also changes the number of compiler threads, the compilation policy and the default code cache size. Note that with TieredCompilation disabled,

  • there will be fewer compiler threads;
  • the simple compilation policy (based on method invocation and backedge counters) will be chosen instead of the advanced compilation policy;
  • the default reserved code cache size will be 5 times smaller.
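You can verify these side effects by comparing the final flag values (a quick check, assuming a Unix-like shell; CICompilerCount and ReservedCodeCacheSize are the relevant flags):

    java -XX:+PrintFlagsFinal -version | grep -E 'CICompilerCount|ReservedCodeCacheSize'
    java -XX:-TieredCompilation -XX:+PrintFlagsFinal -version | grep -E 'CICompilerCount|ReservedCodeCacheSize'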

To disable the C2 compiler and leave only C1 with no extra overhead, set -XX:TieredStopAtLevel=1.

To disable all JIT compilers and run everything in the interpreter, use -Xint.
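For quick reference, the three reduced modes side by side (MyApp is a placeholder for your main class):

    java -XX:-TieredCompilation MyApp      # interpreter + C2 only
    java -XX:TieredStopAtLevel=1 MyApp     # interpreter + C1 only
    java -Xint MyApp                       # interpreter only, no JIT at all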

How to print JIT compilation messages for all methods of a given class that get compiled to native code

If by "JIT compilation messages" you mean the generated assembly code for all methods of the given class, use the following syntax:

-XX:CompileCommand=print,org.pkg.TheGivenClass::*
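Note that rendering the generated code as assembly requires the hsdis disassembler plugin to be available to the JVM. If you only need the one-line compilation log entries rather than the generated code, use -XX:+PrintCompilation instead. A typical invocation (org.pkg.Main is a placeholder):

    java -XX:CompileCommand=print,org.pkg.TheGivenClass::* org.pkg.Main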

Confusion about HotSpot JVM JIT

Assuming you are asking about the HotSpot JVM, the answer is: the remaining iterations will be executed in compiled code.

HotSpot JVM has a technique known as 'on-stack replacement' to switch from the interpreter to compiled code while the method is running.

http://openjdk.java.net/groups/hotspot/docs/HotSpotGlossary.html

on-stack replacement
Also known as 'OSR'. The process of converting an
interpreted (or less optimized) stack frame into a compiled (or more
optimized) stack frame. This happens when the interpreter discovers
that a method is looping, requests the compiler to generate a special
nmethod with an entry point somewhere in the loop (specifically, at a
backward branch), and transfers control to that nmethod. A rough
inverse to deoptimization.

If you run the JVM with the -XX:+PrintCompilation flag, OSR compilations will be marked with a % sign:

    274   27       3       java.lang.String::lastIndexOf (52 bytes)
    275   29       3       java.lang.String::startsWith (72 bytes)
    275   28       3       java.lang.String::startsWith (7 bytes)
    275   30       3       java.util.Arrays::copyOf (19 bytes)
    276   32       4       java.lang.AbstractStringBuilder::append (29 bytes)
    276   31   s   3       java.lang.StringBuffer::append (13 bytes)
    283   33 %     3       LoopTest::myLongLoop @ 13 (43 bytes)
             ^                                    ^
            OSR                      bytecode index of OSR entry

UPDATE

Typically after OSR compilation a regular compilation is also queued, so that the next time the method is called, it will start running directly in compiled mode.

    187   32 %     3       LoopTest::myLongLoop @ 13 (43 bytes)
    187   33       3       LoopTest::myLongLoop (43 bytes)

However, if the regular compilation is not complete by the time the method is called again, the method will start running in the interpreter and then switch to the OSR entry inside the loop.
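For reference, a class like the following reproduces log output of this shape (a sketch consistent with the myLongLoop entries above; the loop bound is arbitrary):

    public class LoopTest {
        static long myLongLoop() {
            long sum = 0;
            // a long-running loop: the backedge counter triggers an OSR compilation
            for (int i = 0; i < 1_000_000_000; i++) {
                sum += i;
            }
            return sum;
        }

        public static void main(String[] args) {
            System.out.println(myLongLoop()); // starts interpreted, switches to OSR code mid-loop
            System.out.println(myLongLoop()); // may start directly in regularly compiled code
        }
    }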

JIT recompiles to do fast throw after more iterations if the stack trace is of even length

This effect is a result of the tricky tiered compilation and inlining policy.

Let me explain with a simplified example:

    public class TestFastThrow {

        public static void main(String[] args) {
            for (int iteration = 0; ; iteration++) {
                try {
                    throwsNPE(2);
                } catch (Exception e) {
                    if (e.getStackTrace().length == 0) {
                        System.out.println("Iterations to fastThrow: " + iteration);
                        break;
                    }
                }
            }
        }

        static void throwsNPE(int depth) {
            if (depth <= 1) {
                ((Object) null).getClass();
            }
            throwsNPE(depth - 1);
        }
    }

For simplicity, I'll exclude all methods from compilation except throwsNPE.

    -XX:CompileCommand=compileonly,TestFastThrow::throwsNPE -XX:+PrintCompilation

  1. HotSpot uses Tiered Compilation by default. Here throwsNPE is first compiled at tier 3 (C1 with profiling). Profiling in C1 makes it possible to recompile the method later with C2.

  2. The OmitStackTraceInFastThrow optimization works only in C2-compiled code. So, the sooner the code is compiled by C2, the fewer iterations will pass before the loop finishes.

  3. How profiling in C1-compiled code works: a counter is incremented on every method invocation and on every backward branch (however, there are no backward branches in the throwsNPE method). When the counter reaches a certain configurable threshold, the JVM compilation policy decides whether the current method needs to be recompiled.

  4. throwsNPE is a recursive method. HotSpot can inline recursive calls up to -XX:MaxRecursiveInlineLevel (default value is 1).

  5. The frequency at which C1-compiled code calls back into the JVM compilation policy differs for regular invocations vs. inlined invocations. A regular method notifies the JVM every 2^10 invocations (-XX:Tier3InvokeNotifyFreqLog=10), while an inlined method notifies the JVM much more rarely: every 2^20 invocations (-XX:Tier23InlineeNotifyFreqLog=20).

  6. For an even number of recursive calls, all invocations follow the Tier23InlineeNotifyFreqLog parameter. When the number of calls is odd, inlining does not cover the last leftover call, and this last invocation follows the Tier3InvokeNotifyFreqLog parameter.

  7. This means that when the call depth is even, throwsNPE will be recompiled only after 2^20 calls, i.e. after 2^19 loop iterations. That's exactly what you'll see when you run the above code with throwsNPE(2):

    Iterations to fastThrow: 524536

    524536 is very close to 2^19 = 524288.

    Now, if you run the same application with -XX:Tier23InlineeNotifyFreqLog=15, the number of iterations will be close to 2^14 = 16384.

    Iterations to fastThrow: 16612
  8. Now let's change the code to call throwsNPE(1). The program will finish very quickly, regardless of the Tier23InlineeNotifyFreqLog value. That's because a different option rules now. But if I rerun the program with -XX:Tier3InvokeNotifyFreqLog=20, the loop will finish no earlier than after 2^20 iterations:

    Iterations to fastThrow: 1048994

Summary

The fast throw optimization applies only to C2-compiled code. Due to one level of inlining (-XX:MaxRecursiveInlineLevel), C2 compilation is triggered either earlier (after 2^Tier3InvokeNotifyFreqLog invocations, if the number of recursive calls is odd) or later (after 2^Tier23InlineeNotifyFreqLog invocations, if all recursive calls are covered by inlining).
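To reproduce the experiment, the runs above can be launched like this (all flags as discussed; TestFastThrow is the class from the example):

    java -XX:CompileCommand=compileonly,TestFastThrow::throwsNPE -XX:+PrintCompilation TestFastThrow
    java -XX:CompileCommand=compileonly,TestFastThrow::throwsNPE -XX:Tier23InlineeNotifyFreqLog=15 TestFastThrow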

Meaning of "callee is too large" in JVM +LogCompilation output

HotSpot JVM has two JIT compilers: C1 and C2. They work together in Tiered mode (the default). The inlining strategy is not quite trivial, but the simplest factor is the size of the callee method in bytecodes.

  • "callee is too large" message is printed by C1 when the size in bytecodes of the method being inlined is larger than MaxInlineSize (35) multiplied by NestedInliningSizeRatio (90%) on each next level of inlining.
  • "too big" and "hot method too big" messages are printed by C2 when the size of the method being inlined is larger than MaxInlineSize (35) or FreqInlineSize (325) respectively.

So, both messages mean approximately the same thing, but at different tiers of compilation.
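A convenient way to see these messages without parsing the full LogCompilation output is the PrintInlining diagnostic flag (MyApp is a placeholder):

    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining MyApp

Each inlining decision is printed together with its reason, including the "callee is too large" and "hot method too big" messages discussed above.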

What is an application binary interface (ABI)?

One easy way to understand "ABI" is to compare it to "API".

You are already familiar with the concept of an API. If you want to use the features of, say, some library or your OS, you will program against an API. The API consists of the data types/structures, constants, functions, etc. that you can use in your code to access the functionality of that external component.

An ABI is very similar. Think of it as the compiled version of an API (or as an API on the machine-language level). When you write source code, you access the library through an API. Once the code is compiled, your application accesses the binary data in the library through the ABI. The ABI defines the structures and methods that your compiled application will use to access the external library (just like the API did), only on a lower level. Your API defines the order in which you pass arguments to a function. Your ABI defines the mechanics of how these arguments are passed (registers, stack, etc.). Your API defines which functions are part of your library. Your ABI defines how your code is stored inside the library file, so that any program using your library can locate the desired function and execute it.

ABIs are important when it comes to applications that use external libraries. Libraries are full of code and other resources, but your program has to know how to locate what it needs inside the library file. Your ABI defines how the contents of a library are stored inside the file, and your program uses the ABI to search through the file and find what it needs. If everything in your system conforms to the same ABI, then any program is able to work with any library file, no matter who created them. Linux and Windows use different ABIs, so a Windows program won't know how to access a library compiled for Linux.

Sometimes, ABI changes are unavoidable. When this happens, any programs that use that library will not work unless they are re-compiled to use the new version of the library. If the ABI changes but the API does not, then the old and new library versions are sometimes called "source compatible". This implies that while a program compiled for one library version will not work with the other, source code written for one will work for the other if re-compiled.

For this reason, developers tend to try to keep their ABI stable (to minimize disruption). Keeping an ABI stable means not changing function interfaces (return type and number, types, and order of arguments), definitions of data types or data structures, defined constants, etc. New functions and data types can be added, but existing ones must stay the same. If, for instance, your library uses 32-bit integers to indicate the offset of a function and you switch to 64-bit integers, then already-compiled code that uses that library will not be accessing that field (or any following it) correctly. Accessing data structure members gets converted into memory addresses and offsets during compilation and if the data structure changes, then these offsets will not point to what the code is expecting them to point to and the results are unpredictable at best.

An ABI isn't necessarily something you will explicitly provide unless you are doing very low-level systems design work. It isn't language-specific either, since (for example) a C application and a Pascal application can use the same ABI after they are compiled.

Edit: Regarding your question about the chapters regarding the ELF file format in the SysV ABI docs: The reason this information is included is because the ELF format defines the interface between operating system and application. When you tell the OS to run a program, it expects the program to be formatted in a certain way and (for example) expects the first section of the binary to be an ELF header containing certain information at specific memory offsets. This is how the application communicates important information about itself to the operating system. If you build a program in a non-ELF binary format (such as a.out or PE), then an OS that expects ELF-formatted applications will not be able to interpret the binary file or run the application. This is one big reason why Windows apps cannot be run directly on a Linux machine (or vice versa) without being either re-compiled or run inside some type of emulation layer that can translate from one binary format to another.

IIRC, Windows currently uses the Portable Executable (or, PE) format. There are links in the "external links" section of that Wikipedia page with more information about the PE format.

Also, regarding your note about C++ name mangling: When locating a function in a library file, the function is typically looked up by name. C++ allows you to overload function names, so the name alone is not sufficient to identify a function. C++ compilers have their own ways of dealing with this internally, called name mangling. An ABI can define a standard way of encoding the name of a function so that programs built with a different language or compiler can locate what they need. When you use extern "C" in a C++ program, you're instructing the compiler to use a standardized way of recording names that's understandable by other software.


