Why Is the Max Recursion Depth I Can Reach Non-Deterministic

Why is the max recursion depth I can reach non-deterministic?

The observed behavior is affected by the HotSpot optimizer; however, it is not the only cause. When I run the following code

public static void main(String[] argv) {
    System.out.println(System.getProperty("java.version"));
    System.out.println(countDepth());
    System.out.println(countDepth());
    System.out.println(countDepth());
    System.out.println(countDepth());
    System.out.println(countDepth());
    System.out.println(countDepth());
    System.out.println(countDepth());
}

static int countDepth() {
    try { return 1 + countDepth(); }
    catch(StackOverflowError err) { return 0; }
}

with JIT enabled, I get results like:

> f:\Software\jdk1.8.0_40beta02\bin\java -Xss68k -server -cp build\classes X
1.8.0_40-ea
2097
4195
4195
4195
12587
12587
12587

> f:\Software\jdk1.8.0_40beta02\bin\java -Xss68k -server -cp build\classes X
1.8.0_40-ea
2095
4193
4193
4193
12579
12579
12579

> f:\Software\jdk1.8.0_40beta02\bin\java -Xss68k -server -cp build\classes X
1.8.0_40-ea
2087
4177
4177
12529
12529
12529
12529

Here, the effect of the JIT is clearly visible: the optimized code needs less stack space, and the two jumps show that tiered compilation is in effect (indeed, running with -XX:-TieredCompilation shows a single jump if the program runs long enough).
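For reference, checking the single-jump behavior amounts to adding that flag to the same invocation as above (output omitted here):

> f:\Software\jdk1.8.0_40beta02\bin\java -Xss68k -server -XX:-TieredCompilation -cp build\classes X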

In contrast, with the JIT disabled I get the following results:

> f:\Software\jdk1.8.0_40beta02\bin\java -Xss68k -server -Xint -cp build\classes X
1.8.0_40-ea
2104
2104
2104
2104
2104
2104
2104

> f:\Software\jdk1.8.0_40beta02\bin\java -Xss68k -server -Xint -cp build\classes X
1.8.0_40-ea
2076
2076
2076
2076
2076
2076
2076

> f:\Software\jdk1.8.0_40beta02\bin\java -Xss68k -server -Xint -cp build\classes X
1.8.0_40-ea
2105
2105
2105
2105
2105
2105
2105

The values still vary between runs, but not within a single JVM run, and with a smaller magnitude.

So, there is a (rather small) difference that becomes much larger if the optimizer can reduce the stack space required per method invocation, e.g. due to inlining.

What can cause such a difference? I don't know how this particular JVM does it, but one scenario could be that the way the stack limit is enforced requires a certain alignment of the stack end address (e.g. matching memory page sizes), while the memory allocation returns memory whose start address has a weaker alignment guarantee. Combine such a scenario with ASLR and there might always be a difference, within the size of the alignment requirement.
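One way to poke at this placement effect within a single JVM is to run the measurement on freshly created threads that each request their own small stack. Below is a minimal sketch along those lines (the class name is made up, countDepth is the method from above, and the stackSize argument of the Thread constructor is only a hint to the VM); depending on the platform and on how each thread's stack happens to be placed and aligned, the reported depths may or may not differ:

public class DepthPerThread {
    static int countDepth() {
        try { return 1 + countDepth(); }
        catch (StackOverflowError err) { return 0; }
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 5; i++) {
            final int[] result = new int[1];
            // Request roughly the same 68k stack as in the runs above.
            Thread t = new Thread(null, () -> result[0] = countDepth(), "probe-" + i, 68 * 1024);
            t.start();
            t.join();
            System.out.println(result[0]);
        }
    }
}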

Why is the recursion depth non-deterministic (C++)?

What happens when you blow the stack is not a guaranteed crash. Depending on the system, you could just be trashing memory in a relatively random bit of your memory space.

What is in that memory might depend on what memory allocations occurred, how much contiguous memory the OS handed to you when you asked for some, ASLR, or whatever.

Undefined behaviour in C++ is not predictable.

Predictability of stack overflow error in recursive function

I wrote a little test program:

public static void main(String[] args) {
    System.out.print(" ");
    for (int x = -8; x < 8; x++) {
        System.out.printf("%2d", x);
    }
    System.out.println();

    for (int y = -8; y < 8; y++) {
        System.out.printf("%4d", y);
        for (int x = -8; x < 8; x++) {
            System.out.printf("%2d", test(x, y));
        }
        System.out.println();
    }
}

// foo(x, y) is the recursive method from the question (not shown here)
public static int test(int x, int y) {
    try {
        return foo(x, y);
    } catch (StackOverflowError e) {
        return 0;
    }
}

It generates a matrix like this:

  -8-7-6-5-4-3-2-1 0 1 2 3 4 5 6 7
  -8 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
  -7 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
  -6 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
  -5 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
  -4 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
  -3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
  -2 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
  -1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
   0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
   1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
   2 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
   3 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
   4 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
   5 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
   6 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
   7 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

It is pretty clear that the set is the union of {x=0, y=MININT..MAXINT} and {x=0..MAXINT, y=0..MAXINT}.

Regarding the StackOverflowError for foo(5600, 5600), my guess is that it is very close to the deepest recursion your program can handle with the stack size it has been given. Try experimenting with the stack size (-Xss), and I'm sure the point where you hit the roof will change. See e.g. Increasing heap space in Eclipse: (java.lang.OutOfMemoryError) for how to pass such JVM memory options in Eclipse.

From a mathematical point of view, the stack should be considered unbounded, so I would say the union of the sets above is your answer.

Scala recursion depth differences and StackOverflowError below allowed depth

  1. You get a stack overflow because you cause a stack overflow
  2. Yes
  3. The max stack size doesn't change; the JVM sets it. Your two methods, recurseTest and sumArrayRec, push different amounts onto the stack with each call: recurseTest basically adds 2 ints per call, while sumArrayRec adds 3 ints because of the call to elms(i), and probably more for the addition and if/else (more instructions). It's hard to tell exactly, since the JVM handles all of this and its purpose is to hide it from you. Also, who knows what optimizations the JVM performs behind the scenes, which can drastically affect the stack space used by each method call. On multiple runs you'll get different depths due to system timing etc.; maybe the JVM will optimize in the short time the program has run, maybe it won't, so it's non-deterministic in this scenario. Running some warm-up code beforehand might make your tests more deterministic, because by then any optimizations will either have taken place or not (see the sketch after this list).
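As a rough illustration of that warm-up idea, here is a hand-rolled sketch (not a JMH benchmark; it reuses the countDepth method from the first answer above as a stand-in for the recursive method under test). The idea is simply to exercise the method enough before measuring that any JIT compilation has most likely already happened:

static int countDepth() {
    try { return 1 + countDepth(); }
    catch (StackOverflowError err) { return 0; }
}

public static void main(String[] args) {
    // Warm-up: give the JIT a chance to compile countDepth before measuring.
    for (int i = 0; i < 1_000; i++) {
        countDepth();
    }
    // Measurements taken after the warm-up tend to be much more stable.
    for (int i = 0; i < 5; i++) {
        System.out.println(countDepth());
    }
}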

You should look into using something like JMH for your tests; it will help with determinism.

Side note: you can also manually change the stack size with -Xss2M, which is common for SBT usage.

Is there a way to know maximally reached JVM call stack depth for a particular program run?

One can easily make a JVMTI agent that traces MethodEntry / MethodExit events and correspondingly increments or decrements a stack depth counter. Here is an example of such an agent. When the program ends, it prints the maximum recorded Java stack depth.

#include <jvmti.h>
#include <stdint.h>
#include <stdio.h>

static volatile int max_depth = 0;

static int adjust_stack_depth(jvmtiEnv *jvmti, int delta) {
    intptr_t depth = 0;
    (*jvmti)->GetThreadLocalStorage(jvmti, NULL, (void**)&depth);
    (*jvmti)->SetThreadLocalStorage(jvmti, NULL, (const void*)(depth + delta));
    return (int)depth;
}

void JNICALL MethodEntry(jvmtiEnv *jvmti, JNIEnv *jni, jthread thread, jmethodID method) {
    adjust_stack_depth(jvmti, +1);
}

void JNICALL MethodExit(jvmtiEnv *jvmti, JNIEnv *jni, jthread thread, jmethodID method,
                        jboolean was_popped_by_exception, jvalue return_value) {
    int depth = adjust_stack_depth(jvmti, -1);
    if (depth > max_depth) {
        max_depth = depth;  // TODO: replace with atomic CAS to avoid race condition
    }
}

JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *options, void *reserved) {
    jvmtiEnv *jvmti;
    (*vm)->GetEnv(vm, (void**)&jvmti, JVMTI_VERSION_1_0);

    jvmtiCapabilities capabilities = {0};
    capabilities.can_generate_method_entry_events = 1;
    capabilities.can_generate_method_exit_events = 1;
    (*jvmti)->AddCapabilities(jvmti, &capabilities);

    jvmtiEventCallbacks callbacks = {0};
    callbacks.MethodEntry = MethodEntry;
    callbacks.MethodExit = MethodExit;
    (*jvmti)->SetEventCallbacks(jvmti, &callbacks, sizeof(callbacks));

    (*jvmti)->SetEventNotificationMode(jvmti, JVMTI_ENABLE, JVMTI_EVENT_METHOD_ENTRY, NULL);
    (*jvmti)->SetEventNotificationMode(jvmti, JVMTI_ENABLE, JVMTI_EVENT_METHOD_EXIT, NULL);

    return 0;
}

JNIEXPORT void JNICALL Agent_OnUnload(JavaVM *vm) {
    printf("Max stack depth = %d\n", max_depth);
}

Compile:

gcc -fPIC -shared -I $JAVA_HOME/include -I $JAVA_HOME/include/linux -o libmaxdepth.so maxdepth.c

Run:

java -agentpath:/path/to/libmaxdepth.so MyProgram

However, tracing each method entry and exit is very expensive. A less accurate, but much more efficient alternative would be a sampling profiler which periodically records a stack trace of a running thread, e.g. async-profiler or Java Flight Recorder.
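For instance, a Flight Recorder session can be started from the command line roughly like this (the exact option syntax varies slightly between JDK versions, and the recording file then has to be inspected, e.g. in JDK Mission Control, to see how deep the sampled stack traces go):

java -XX:StartFlightRecording=duration=60s,filename=recording.jfr MyProgram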

About Java VM Stack Overflow

Array allocation affects the execution time of the program. The longer the program runs, the higher the chance that the JIT compiler will kick in and the program will continue execution in compiled mode.

The difference in the stack depth is explained by background JIT compilation. See this answer for details.

To make a fair comparison, run the JVM with JIT compilation turned off: -Xint. In this case, the array allocation will make the maximum recursion depth smaller, as expected (since there will be one more stack slot used for the array reference).
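As an illustration (the method names are made up for this sketch; run with -Xint so that the JIT does not change the frame sizes), compare two recursive methods that differ only in whether they hold a local array reference. The variant with the array should reach a slightly smaller maximum depth:

static int depthPlain(int d) {
    try { return depthPlain(d + 1); }
    catch (StackOverflowError e) { return d; }
}

static int depthWithArray(int d) {
    int[] scratch = new int[1];   // one extra local slot for the array reference
    try { return depthWithArray(d + 1); }
    catch (StackOverflowError e) { return d; }
}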

Why does the count of calls of a recursive method causing a StackOverflowError vary between program runs?

The observed variance is caused by background JIT compilation.

This is how the process looks:

  1. Method f() starts execution in the interpreter.
  2. After a number of invocations (around 250) the method is scheduled for compilation.
  3. The compiler thread works in parallel with the application thread. Meanwhile, the method continues execution in the interpreter.
  4. As soon as the compiler thread finishes compilation, the method entry point is replaced, so the next call to f() will invoke the compiled version of the method.

There is basically a race between the application thread and the JIT compiler thread. The interpreter may perform a different number of calls before the compiled version of the method is ready. In the end, there is a mix of interpreted and compiled frames on the stack.

No wonder that the compiled frame layout differs from the interpreted one. Compiled frames are usually smaller; they don't need to store all of the execution context on the stack (method reference, constant pool reference, profiler data, all arguments, expression variables etc.).

Furthermore, there are even more race possibilities with Tiered Compilation (the default since JDK 8). There can be a combination of three types of frames: interpreter, C1 and C2 (see below).
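The experiments below refer to a Main class roughly along these lines. This is a reconstruction inferred from the disassembly further down (which increments Main.counter and recursively invokes f()), not necessarily the exact code from the original question:

public class Main {
    static int counter;

    static void f() {
        counter++;
        f();                          // recurse until StackOverflowError
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) { // each run below prints three depths
            counter = 0;
            try {
                f();
            } catch (StackOverflowError e) {
                System.out.println(counter);
            }
        }
    }
}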


Let's run some experiments to support the theory.

  1. Pure interpreted mode. No JIT compilation.

    No races => stable results.

    $ java -Xint Main
    11895
    11895
    11895
  2. Disable background compilation. JIT is ON, but is synchronized with the application thread.

    No races again, but the number of calls is now higher due to compiled frames.

    $ java -XX:-BackgroundCompilation Main
    23462
    23462
    23462
  3. Compile everything with C1 before execution. Unlike the previous case, there will be no interpreted frames on the stack, so the number will be a bit higher.

    $ java -Xcomp -XX:TieredStopAtLevel=1 Main
    23720
    23720
    23720
  4. Now compile everything with C2 before execution. This will produce the most optimized code with the smallest frame. The number of calls will be the highest.

    $ java -Xcomp -XX:-TieredCompilation Main
    59300
    59300
    59300

    Since the default stack size is 1M, this should mean the frame is now only 16 bytes long. Is it?

    $ java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=print,Main.f Main

    0x00000000025ab460: mov    %eax,-0x6000(%rsp)    ; StackOverflow check
    0x00000000025ab467: push   %rbp                  ; frame link
    0x00000000025ab468: sub    $0x10,%rsp
    0x00000000025ab46c: movabs $0xd7726ef0,%r10      ; r10 = Main.class
    0x00000000025ab476: addl   $0x2,0x68(%r10)       ; Main.counter += 2
    0x00000000025ab47b: callq  0x00000000023c6620    ; invokestatic f()
    0x00000000025ab480: add    $0x10,%rsp
    0x00000000025ab484: pop    %rbp                  ; pop frame
    0x00000000025ab485: test   %eax,-0x23bb48b(%rip) ; safepoint poll
    0x00000000025ab48b: retq

    In fact, the frame here is 32 bytes, but the JIT has inlined one level of recursion, so each frame covers two calls (about 16 bytes per call).

  5. Finally, let's look at the mixed stack trace. In order to get it, we'll crash the JVM on StackOverflowError (an option available in debug builds).

    $ java -XX:AbortVMOnException=java.lang.StackOverflowError Main

    The crash dump hs_err_pid.log contains a detailed stack trace, where we can find interpreted frames at the bottom, C1 frames in the middle and C2 frames on top.

    Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
    J 164 C2 Main.f()V (12 bytes) @ 0x00007f21251a5958 [0x00007f21251a5900+0x0000000000000058]
    J 164 C2 Main.f()V (12 bytes) @ 0x00007f21251a5920 [0x00007f21251a5900+0x0000000000000020]
    // ... repeated 19787 times ...
    J 164 C2 Main.f()V (12 bytes) @ 0x00007f21251a5920 [0x00007f21251a5900+0x0000000000000020]
    J 163 C1 Main.f()V (12 bytes) @ 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac]
    J 163 C1 Main.f()V (12 bytes) @ 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac]
    // ... repeated 1866 times ...
    J 163 C1 Main.f()V (12 bytes) @ 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac]
    j Main.f()V+8
    j Main.f()V+8
    // ... repeated 1839 times ...
    j Main.f()V+8
    j Main.main([Ljava/lang/String;)V+0
    v ~StubRoutines::call_stub

