Why Does the Count of Calls of a Recursive Method Causing a Stackoverflowerror Vary Between Program Runs

Why does the count of calls of a recursive method causing a StackOverflowError vary between program runs?

The observed variance is caused by background JIT compilation.

This is how the process looks like:

Method f() starts execution in interpreter.
After a number of invocations (around 250) the method is scheduled for compilation.
The compiler thread works in parallel to the application thread. Meanwhile the method continues execution in interpreter.
As soon as the compiler thread finishes compilation, the method entry point is replaced, so the next call to f() will invoke the compiled version of the method.

There is basically a race between applcation thread and JIT compiler thread. Interpreter may perform different number of calls before the compiled version of the method is ready. At the end there is a mix of interpreted and compiled frames.

No wonder that compiled frame layout differs from interpreted one. Compiled frames are usually smaller; they don't need to store all the execution context on the stack (method reference, constant pool reference, profiler data, all arguments, expression variables etc.)

Futhermore, there is even more race possibilities with Tiered Compilation (default since JDK 8). There can be a combination of 3 types of frames: interpreter, C1 and C2 (see below).

Let's have some fun experiments to support the theory.

Pure interpreted mode. No JIT compilation.

No races => stable results.
```
$ java -Xint Main
11895
11895
11895
```
Disable background compilation. JIT is ON, but is synchronized with the application thread.

No races again, but the number of calls is now higher due to compiled frames.
```
$ java -XX:-BackgroundCompilation Main
23462
23462
23462
```
Compile everything with C1 before execution. Unlike previous case there will be no interpreted frames on the stack, so the number will be a bit higher.
```
$ java -Xcomp -XX:TieredStopAtLevel=1 Main
23720
23720
23720
```

Now compile everything with C2 before execution. This will produce the most optimized code with the smallest frame. The number of calls will be the highest.

$ java -Xcomp -XX:-TieredCompilation Main
59300
59300
59300

Since the default stack size is 1M, this should mean the frame now is only 16 bytes long. Is it?

$ java -Xcomp -XX:-TieredCompilation -XX:CompileCommand=print,Main.f Main

  0x00000000025ab460: mov    %eax,-0x6000(%rsp)    ; StackOverflow check
  0x00000000025ab467: push   %rbp                  ; frame link
  0x00000000025ab468: sub    $0x10,%rsp            
  0x00000000025ab46c: movabs $0xd7726ef0,%r10      ; r10 = Main.class
  0x00000000025ab476: addl   $0x2,0x68(%r10)       ; Main.counter += 2
  0x00000000025ab47b: callq  0x00000000023c6620    ; invokestatic f()
  0x00000000025ab480: add    $0x10,%rsp
  0x00000000025ab484: pop    %rbp                  ; pop frame
  0x00000000025ab485: test   %eax,-0x23bb48b(%rip) ; safepoint poll
  0x00000000025ab48b: retq

In fact, the frame here is 32 bytes, but JIT has inlined one level of recursion.

Finally, let's look at the mixed stack trace. In order to get it, we'll crash JVM on StackOverflowError (option available in debug builds).

$ java -XX:AbortVMOnException=java.lang.StackOverflowError Main

The crash dump hs_err_pid.log contains the detailed stack trace where we can find interpreted frames at the bottom, C1 frames in the middle and lastly C2 frames on the top.

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 164 C2 Main.f()V (12 bytes) @ 0x00007f21251a5958 [0x00007f21251a5900+0x0000000000000058]
J 164 C2 Main.f()V (12 bytes) @ 0x00007f21251a5920 [0x00007f21251a5900+0x0000000000000020]
  // ... repeated 19787 times ...
J 164 C2 Main.f()V (12 bytes) @ 0x00007f21251a5920 [0x00007f21251a5900+0x0000000000000020]
J 163 C1 Main.f()V (12 bytes) @ 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac]
J 163 C1 Main.f()V (12 bytes) @ 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac]
  // ... repeated 1866 times ...
J 163 C1 Main.f()V (12 bytes) @ 0x00007f211dca50ec [0x00007f211dca5040+0x00000000000000ac]
j  Main.f()V+8
j  Main.f()V+8
  // ... repeated 1839 times ...
j  Main.f()V+8
j  Main.main([Ljava/lang/String;)V+0
v  ~StubRoutines::call_stub

Why does a recursive function stop on random numbers?

The JVM specs very nicely explain its behavior related to stack;

Each Java Virtual Machine thread has a private Java Virtual Machine
stack, created at the same time as the thread. A Java Virtual Machine
stack stores frames (§2.6). A Java Virtual Machine stack is analogous
to the stack of a conventional language such as C: it holds local
variables and partial results, and plays a part in method invocation
and return. Because the Java Virtual Machine stack is never
manipulated directly except to push and pop frames, frames may be heap
allocated. The memory for a Java Virtual Machine stack does not need
to be contiguous.

In the First Edition of The Java® Virtual Machine Specification, the
Java Virtual Machine stack was known as the Java stack.

This specification permits Java Virtual Machine stacks either to be of
a fixed size or to dynamically expand and contract as required by the
computation. If the Java Virtual Machine stacks are of a fixed size,
the size of each Java Virtual Machine stack may be chosen
independently when that stack is created.

A Java Virtual Machine implementation may provide the programmer or
the user control over the initial size of Java Virtual Machine stacks,
as well as, in the case of dynamically expanding or contracting Java
Virtual Machine stacks, control over the maximum and minimum sizes.

The following exceptional conditions are associated with Java Virtual
Machine stacks:

If the computation in a thread requires a larger Java Virtual Machine
stack than is permitted, the Java Virtual Machine throws a
StackOverflowError.

If Java Virtual Machine stacks can be dynamically expanded, and
expansion is attempted but insufficient memory can be made available
to effect the expansion, or if insufficient memory can be made
available to create the initial Java Virtual Machine stack for a new
thread, the Java Virtual Machine throws an OutOfMemoryError.

An important point from this excerpt as far as your question is concerned:

This specification permits Java Virtual Machine stacks either to be of a fixed size or to dynamically expand and contract as required by the computation.

Since you are not providing a stack size, JVM tries to dynamically expand the stack size as the function gets called recursively needing more stack memory. In each run, it may find different amount of dynamic memory for its stack depending on the availability of memory on your computer at that point of run. This is the reason you see a different value for the number of iterations it takes before throwing the SO error. If you configure (using Xss<size> JVM parameter) a smaller stack size to your program, you should see mostly identical number of recursions before the SO error.

Recursive method call cause StackOverFlowError in kotlin but not in java

I want to know why this function works in java and also in kotlin with tailrec but not in kotlin without tailrec?

The short answer is because your Kotlin method is "heavier" than the JAVA one. At every call it calls another method that "provokes" StackOverflowError. So, see a more detailed explanation below.

Java bytecode equivalents for reverseString()

I checked the byte code for your methods in Kotlin and JAVA correspondingly:

Kotlin method bytecode in JAVA

...
public final void reverseString(@NotNull char[] s) {
    Intrinsics.checkParameterIsNotNull(s, "s");
    this.helper(0, ArraysKt.getLastIndex(s), s);
}

public final void helper(int i, int j, @NotNull char[] s) {
    Intrinsics.checkParameterIsNotNull(s, "s");
    if (i < j) {
        char t = s[j];
        s[j] = s[i];
        s[i] = t;
        this.helper(i + 1, j - 1, s);
    }
}
...

JAVA method bytecode in JAVA

...
public void reverseString(char[] s) {
    this.helper(s, 0, s.length - 1);
}

public void helper(char[] s, int left, int right) {
    if (left < right) {
        char temp = s[left];
        s[left++] = s[right];
        s[right--] = temp;
        this.helper(left, right, s);
    }
}
...

So, there're 2 main differences:

Intrinsics.checkParameterIsNotNull(s, "s") is invoked for each helper() in the Kotlin version.
Left and right indices in JAVA method get incremented, while in Kotlin new indices are created for each recursive call.

So, let's test how Intrinsics.checkParameterIsNotNull(s, "s") alone affects the behavior.

Test both implementations

I've created a simple test for both cases:

@Test
public void testJavaImplementation() {
    char[] chars = new char[20000];
    new Example().reverseString(chars);
}

And

@Test
fun testKotlinImplementation() {
    val chars = CharArray(20000)
    Example().reverseString(chars)
}

For JAVA the test succeeded without problems while for Kotlin it failed miserably due to a StackOverflowError. However, after I added Intrinsics.checkParameterIsNotNull(s, "s") to the JAVA method it failed as well:

public void helper(char[] s, int left, int right) {
    Intrinsics.checkParameterIsNotNull(s, "s"); // add the same call here

    if (left >= right) return;
    char tmp = s[left];
    s[left] = s[right];
    s[right] = tmp;
    helper(s, left + 1, right - 1);
}

Conclusion

Your Kotlin method has a smaller recursion depth as it invokes Intrinsics.checkParameterIsNotNull(s, "s") at every step and thus is heavier than its JAVA counterpart. If you don't want this auto-generated method then you can disable null checks during compilation as answered here

However, since you understand what benefit tailrec brings (converts your recursive call into an iterative one) you should use that one.

why is there a stack overflow error in a recursive solution to finding the factorials of a number?

You said:

Given an integer n, return the number of trailing zeroes in n!
Constraints:
0 <= n <= 10⁴

First, your solution won't work because an int can't hold that large a number.
You need to use BigInteger as shown below.

The following recursive form will compute 10⁴! without much noticeable delay.

public static BigInteger factorial(int n) {
     if (n == 1 || n == 0) {
         return BigInteger.ONE;
     }
     return factorial(n-1).multiply(BigInteger.valueOf(n));
}
String fact = factorial(1000).toString();
System.out.println(fact.replaceAll("\\d+?(0*)$", "$1").length());

prints

But you don't need to compute the factorial to solve the actual problem. Consider the following.

The product of all the numbers from 1 to N must have divisors of 10 (i.e. 2 and 5). 5 will occur the least number of times so that is where you need to focus. The number of trailing zeros is equal to the number of times that 10 divides N. And since 5 may divide a given term more than once (e.g. 25, and 125) you need to update the divisor as well.

int n = 1000; // factorial candidate
int sum = 0;
int k;
for (int d = 5; (k = n/d) > 0; d*=5) {
       sum += k;
}
System.out.printf("%d! has %d trailing zeros", n, sum);

prints

1000! has 249 trailing zeros

And here is the recursive solution (although not as efficient).

public static int trailingZeros (int n) {
    if (n > 0) {
        return trailingZeros(n/5) + n/5;
    }
    return 0;
}

how many recursive function calls causes stack overflow?

I want to know that is this normal?

Yes. There's only so much stack size.

In the code below, removing local variable neighbor can prevent from stack overflow?

No. Even with no variables and no return values the function calls themselves must be stored in the stack so the stack can eventually be unwound.

For example...

void recurse() {
    recurse();
}

int main (void)
{
    recurse();
}

This still overflows the stack.

$ ./test
ASAN:DEADLYSIGNAL
=================================================================
==94371==ERROR: AddressSanitizer: stack-overflow on address 0x7ffee7f80ff8 (pc 0x00010747ff14 bp 0x7ffee7f81000 sp 0x7ffee7f81000 T0)
    #0 0x10747ff13 in recurse (/Users/schwern/tmp/./test+0x100000f13)

SUMMARY: AddressSanitizer: stack-overflow (/Users/schwern/tmp/./test+0x100000f13) in recurse
==94371==ABORTING
Abort trap: 6

In general how many recursive function calls causes stack overflow?

That depends on your environment and function calls. Here on OS X 10.13 I'm limited to 8192K by default.

$ ulimit -s
8192

This simple example with clang -g can recurse 261976 times. With -O3 I can't get it to overflow, I suspect compiler optimizations have eliminated my simple recursion.

#include <stdio.h>

void recurse() {
    puts("Recurse");
    recurse();
}

int main (void)
{
    recurse();
}

Add an integer argument and it's 261933 times.

#include <stdio.h>

void recurse(int cnt) {
    printf("Recurse %d\n", cnt);
    recurse(++cnt);
}

int main (void)
{
    recurse(1);
}

Add a double argument, now it's 174622 times.

#include <stdio.h>

void recurse(int cnt, double foo) {
    printf("Recurse %d %f\n", cnt, foo);
    recurse(++cnt, foo);
}

int main (void)
{
    recurse(1, 2.3);
}

Add some stack variables and it's 104773 times.

#include <stdio.h>

void recurse(int cnt, double foo) {
    double this = 42.0;
    double that = 41.0;
    double other = 40.0;
    double thing = 39.0;
    printf("Recurse %d %f %f %f %f %f\n", cnt, foo, this, that, other, thing);
    recurse(++cnt, foo);
}

int main (void)
{
    recurse(1, 2.3);
}

And so on. But I can increase my stack size in this shell and get twice the calls.

$ ./test 2> /dev/null | wc -l
174622
$ ulimit -s 16384
$ ./test 2> /dev/null | wc -l
349385

I have a hard upper limit to how big I can make the stack of 65,532K or 64M.

$ ulimit -Hs
65532

Why are stackoverflow errors chaotic?

Change the printf call to:

printf("%u %p\n", rec, &rec);

This forces gcc to put rec on the stack and gives you its address which is a good indication of what's going on with the stack pointer.

Run your program a few times and note what's going on with the address that's being printed at the end. A few runs on my machine shows this:

261958 0x7fff82d2878c
261778 0x7fffc85f379c
261816 0x7fff4139c78c
261926 0x7fff192bb79c

First thing to note is that the stack address always ends in 78c or 79c. Why is that? We should crash when crossing a page boundary, pages are 0x1000 bytes long and each function eats 0x20 bytes of stack so the address should end with 00X or 01X. But looking at this closer, we crash in libc. So the stack overflow happens somewhere inside libc, from this we can conclude that calling printf and everything else it calls needs at least 0x78c = 1932 (possibly plus X*4096) bytes of stack to work.

The second question is why does it take a different number of iterations to reach the end of the stack? A hint is in the fact that the addresses we get are different on every run of the program.

1 0x7fff8c4c13ac
1 0x7fff0a88f33c
1 0x7fff8d02fc2c
1 0x7fffbc74fd9c

The position of the stack in memory is randomized. This is done to prevent a whole family of buffer overflow exploits. But since memory allocations, especially at this level, can only be done in multiple of pages (4096 bytes) all initial stack pointers would be aligned at 0x1000. This would reduce the number of random bits in the randomized stack address, so additional randomness is added by just wasting a random amount of bytes at the top of the stack.

The operating system can only account the amount of memory you use, including the limit on the stack, in whole pages. So even though the stack starts at a random address, the last accessible address on the stack will always be an address ending in 0xfff.

The short answer is: to increase the amount of randomness in the randomized memory layout a bunch of bytes on the top of the stack are deliberately wasted, but the end of the stack has to end on a page boundary.

Why Does the Count of Calls of a Recursive Method Causing a Stackoverflowerror Vary Between Program Runs