Why Does Java App Crash in Gdb But Runs Normally in Real Life

Why does java app crash in gdb but runs normally in real life?

Because it doesn't actually crash.

Java uses speculative loads. If a pointer points to addressable memory, the load succeeds. Rarely, the pointer does not point to addressable memory, and the attempted load generates SIGSEGV, which the Java runtime intercepts; it then makes the memory addressable again and restarts the load instruction.
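As a rough illustration of that mechanism (not the JVM's actual code), here is a minimal C sketch for Linux. It maps a page with no access rights, installs a SIGSEGV handler that makes the page accessible, and then reads from the page. The names guarded_page, on_segv and recoverable_fault_demo are made up for this example.

```c
#define _GNU_SOURCE            /* exposes MAP_ANONYMOUS and siginfo_t under strict -std modes */
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;                    /* page that starts out inaccessible */
static size_t page_size;
static volatile sig_atomic_t fault_count;     /* number of faults the handler repaired */

/* SIGSEGV handler: if the fault hit the guarded page, make it accessible
 * and return, so the kernel restarts the faulting load.
 * Assumption of this sketch: calling mprotect from a handler is fine on
 * Linux (it is a thin system call wrapper), although POSIX does not list
 * it as async-signal-safe. */
static void on_segv(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)context;
    char *addr = (char *)info->si_addr;
    if (addr >= guarded_page && addr < guarded_page + page_size) {
        fault_count++;
        mprotect(guarded_page, page_size, PROT_READ | PROT_WRITE);
        return;
    }
    /* Any other fault is a real bug: restore the default action and
     * return, so the re-executed instruction kills the process. */
    signal(SIGSEGV, SIG_DFL);
}

/* Returns the number of repaired faults, or -1 if setup failed. */
int recoverable_fault_demo(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    guarded_page = mmap(NULL, page_size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guarded_page == MAP_FAILED)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    volatile char *p = guarded_page;
    char value = *p;           /* faults once, is repaired, then succeeds */
    return value == 0 ? (int)fault_count : -1;
}
```

Run normally, a call to recoverable_fault_demo() repairs the single fault and carries on. Run under GDB with default signal settings, the same call stops at the load with SIGSEGV, even though the program would have handled the signal itself.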

When debugging Java programs, you generally have to do this:

(gdb) handle SIGSEGV nostop noprint pass

Unfortunately, if JNI code is involved and that code triggers a genuine SIGSEGV, GDB will happily ignore that signal as well, resulting in the death of the inferior (the process being debugged). I have not found an acceptable solution to that latter problem.

C++ Program with JNI invoking failed to run in gdb

The answer to "Why does java app crash in gdb but runs normally in real life?" provides a solution:

handle SIGSEGV nostop noprint pass

It is not very elegant, though.

Why am I getting storage error on my Ada shared library when running under a JVM

The first hint is that attaching a debugger to the application produces a series of SIGSEGV signals, yet the program resumes each time it is continued.

This means that the SIGSEGV signal is handled on the Java side, as confirmed in "Why does java app crash in gdb but runs normally in real life?":

Java uses speculative loads. If a pointer points to addressable memory, the load succeeds. Rarely the pointer does not point to addressable memory, and the attempted load generates SIGSEGV ... which java runtime intercepts, makes the memory addressable again, and restarts the load instruction.

What happens is that, by default, the GNAT run-time installs its own signal handler to catch SIGSEGV and turn it into a clean Ada exception. One interesting feature of Ada exceptions is that they can print the stack trace, even without a debugger, and this SIGSEGV handler is what makes that possible.

But since Java uses speculative loads, SIGSEGV signals are expected from time to time on the Java side. Once the Ada shared library has been loaded and initialized, the Ada SIGSEGV handler is installed in place of Java's; it catches those "normal" SIGSEGV signals and aborts immediately.

Note that this problem does not occur on Windows. The Java runtime probably cannot use this speculative load mechanism there because of Windows limitations in handling memory access violations.

The signal handling is done in s-intman.adb:

  --  Check that treatment of exception propagation here is consistent with
  --  treatment of the abort signal in System.Task_Primitives.Operations.

  case signo is
     when SIGFPE  => raise Constraint_Error;
     when SIGILL  => raise Program_Error;
  --   when SIGSEGV => raise Storage_Error;  -- commenting this line should fix it
     when SIGBUS  => raise Storage_Error;
     when others  => null;
  end case;
end Notify_Exception;

Now we would have to rebuild a native run-time and use it instead of the default one, which is tedious and error-prone. That file is part of the gnarl library, so we would have to rebuild gnarl as a shared library with the proper options (-gnatp -nostdinc -O2 -fPIC) to create a substitute gnarl library, and do it all again whenever the compiler is upgraded.

Fortunately, an alternate solution was provided by AdaCore:

First create a pragmas file in the .gpr project directory (let's call it no_sigsegv.adc) containing:

pragma Interrupt_State (SIGSEGV, SYSTEM);

This instructs the run-time not to install its SIGSEGV handler.

Then add this to the Compiler package of the .gpr file:

  package Compiler is
     ...
     for Local_Configuration_Pragmas use Project'Project_Dir & "/no_sigsegv.adc";

Then rebuild everything from scratch. In testing, there was not a single crash.

Why does the address of a local variable vary when executing multiple times, but not when debugging it with GDB?

The reason you always get the same address for local variables while running under GDB is that GDB (in order to simplify most debugging scenarios) disables address space randomization.

You can ask GDB to not do that with set disable-address-randomization off.

For the curious: disabling address randomization for the current process does not require any privilege, and is done by calling personality(2). The feature was added to GDB in a dedicated patch.
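A minimal C sketch of that personality(2) call on Linux follows. The function name disable_aslr_for_children is made up for this example, and the sketch assumes the kernel and any sandbox in place (for instance a seccomp filter) allow the call; it reports failure instead of assuming success.

```c
#include <sys/personality.h>

/* Ask the kernel to stop randomizing the address space layout.
 * The flag affects address spaces set up afterwards, so it matters for
 * programs this process later starts with exec rather than for the
 * mappings it already has.
 * Returns 1 if the flag is now set, 0 if the kernel refused. */
int disable_aslr_for_children(void)
{
    /* 0xffffffff queries the current persona without changing it. */
    int current = personality(0xffffffff);
    if (current == -1)
        return 0;

    if (personality((unsigned long)current | ADDR_NO_RANDOMIZE) == -1)
        return 0;   /* refused, e.g. by a sandbox policy */

    return (personality(0xffffffff) & ADDR_NO_RANDOMIZE) != 0;
}
```

A launcher would call this after fork and before exec, which, as far as I understand, is roughly what GDB does for the inferior. The setarch -R command offers the same effect from a shell.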

Why can an Rsession process continue after SIGSEGV and what does it mean

The SIGSEGV does not occur in the Rsession process itself, but in the JVM process launched by rJava on package load. This behaviour is known and is due to the JVM's memory management, as stated in a related answer:

Java uses speculative loads. If a pointer points to addressable memory, the load succeeds. Rarely the pointer does not point to addressable memory, and the attempted load generates SIGSEGV ... which java runtime intercepts, makes the memory addressable again, and restarts the load instruction.

The proposed workaround for gdb works fine:

(gdb) handle SIGSEGV nostop noprint pass
