How to Catch SIGSEGV (Segmentation Fault) and Get a Stack Trace Under JNI on Android

How can I catch SIGSEGV (segmentation fault) and get a stack trace under JNI on Android?

Edit: From Jelly Bean onwards you can't get the stack trace, because READ_LOGS went away. :-(

I actually got a signal handler working without doing anything too exotic, and have released code using it, which you can see on GitHub (edit: linking to historical release; I have since removed the crash handler). Here's how:

  1. Use sigaction() to catch the signals and store the old handlers. (android.c:570)
  2. Time passes, a segfault happens.
  3. In the signal handler, call up to JNI one last time and then call the old handler. (android.c:528)
  4. In that JNI call, log any useful debugging info, and call startActivity() on an activity that is flagged as needing to be in its own process. (SGTPuzzles.java:962, AndroidManifest.xml:28)
  5. When you come back from Java and call that old handler, the Android framework will connect to debuggerd to log a nice native trace for you, and then the process will die. (debugger.c, debuggerd.c)
  6. Meanwhile, your crash-handling activity is starting up. Really you should pass it the PID so it can wait for step 5 to complete; I don't do this. Here you apologise to the user and ask if you can send a log. If so, gather the output of logcat -d -v threadtime and launch an ACTION_SEND with recipient, subject and body filled in. The user will have to press Send. (CrashHandler.java, SGTPuzzles.java:462, strings.xml:41)
  7. Watch out for logcat failing or taking more than a few seconds. I have encountered one device, the T-Mobile Pulse / Huawei U8220, where logcat immediately goes into the T (traced) state and hangs. (CrashHandler.java:70, strings.xml:51)
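Steps 1, 3 and 5 can be sketched roughly as follows. This is a minimal illustration and not the code from android.c: the names old_actions, last_signal, crash_handler and install_crash_handler are all made up for this example, and the call up to Java in step 3 is only marked with a comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* All names below are hypothetical; this is not the android.c code. */
enum { MAX_SIGNALS = 65 };

static struct sigaction old_actions[MAX_SIGNALS]; /* step 1: saved old handlers */
static volatile sig_atomic_t last_signal = 0;

static void crash_handler(int sig, siginfo_t *info, void *ctx)
{
    const struct sigaction *old = &old_actions[sig];

    last_signal = sig;

    /* Step 3 would make its one last call up to Java here. */

    /* Step 5: hand over to whatever was installed before us. */
    if (old->sa_flags & SA_SIGINFO) {
        old->sa_sigaction(sig, info, ctx);
    } else if (old->sa_handler == SIG_DFL) {
        /* Restore the default action and re-raise; the signal is
           delivered again once this handler returns. */
        sigaction(sig, old, NULL);
        raise(sig);
    } else if (old->sa_handler != SIG_IGN) {
        old->sa_handler(sig);
    }
}

/* Step 1: install our handler and remember the previous one. */
int install_crash_handler(int sig)
{
    struct sigaction sa;

    if (sig <= 0 || sig >= MAX_SIGNALS)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crash_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, &old_actions[sig]);
}
```

You would call install_crash_handler(SIGSEGV) once at startup, for example from JNI_OnLoad, and repeat it for any other fatal signals you care about.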

In a non-Android situation, some of this would be different. You'd need to gather your own native trace (see this other question), and how you do that depends on what sort of libc you have. You'd also need to handle dumping that trace, launching your separate crash-handler process, and sending the email in whatever way is appropriate for your platform, but I imagine the general approach should still work.
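As one example of gathering your own native trace, here is a small sketch that assumes glibc, which provides backtrace() and backtrace_symbols_fd() in <execinfo.h>. Other libcs (including Android's, at the time of this answer) may not have these, and the function name dump_native_trace is just for illustration.

```c
#include <execinfo.h>
#include <unistd.h>

enum { MAX_FRAMES = 64 };

/* Writes the current call stack to fd, one frame per line, and
   returns the number of frames captured. Assumes glibc. */
int dump_native_trace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* backtrace_symbols_fd() writes straight to the descriptor
       instead of returning a malloc()ed array of strings. */
    backtrace_symbols_fd(frames, n, fd);
    return n;
}
```

Calling dump_native_trace(STDERR_FILENO) from a handler is a common approach, but treat it as best effort: the glibc documentation notes that backtrace() may load a support library on first use, so it is worth calling it once at startup.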

C++ signal handler can't notify Java side

You can't do this:

void SignalErrorHandler(int signal, siginfo_t *si, void *arg)
{
JNIEnv *env;
g_JVM->GetEnv((void**)&env, JNI_VERSION_1_6);

jclass myClass = env->FindClass("com/company/MyClass");
jmethodID myMethod = env->GetMethodID(myClass, "nativeCrashed", "()V" );
env->CallVoidMethod(g_thejavaobject, myMethod);

env->DeleteLocalRef(myClass);
}

That will not work for at least two fundamental reasons.

First, only async-signal-safe functions may be called from within a signal handler. The POSIX-specified list can be found at http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03.

No Java JNI call is async-signal-safe.

Second, the JVM uses SIGSEGV internally, so receiving a SIGSEGV is not necessarily fatal:

Signals Used in Oracle Solaris, Linux, and macOS

...

SIGSEGV, SIGBUS, SIGFPE, SIGPIPE, SIGILL These signals are used in
the implementation for implicit null check, and so forth.
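For contrast with the handler above, here is a minimal sketch of a handler that stays within the async-signal-safe list: it only calls write() on a pipe, and an ordinary thread reads the other end and is then free to do anything, including JNI calls. The names notify_pipe, install_notifier and wait_for_signal are invented for this example. Note that this pattern suits signals the process survives; after a genuine SIGSEGV the faulting thread cannot simply carry on.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names; a sketch of the "self-pipe" idea. */
static int notify_pipe[2] = { -1, -1 };

static void notify_handler(int sig)
{
    int saved_errno = errno;            /* write() may clobber errno */
    unsigned char byte = (unsigned char)sig;
    ssize_t written = write(notify_pipe[1], &byte, 1); /* async-signal-safe */

    (void)written;
    errno = saved_errno;
}

/* Creates the pipe and installs the handler for sig. */
int install_notifier(int sig)
{
    struct sigaction sa;

    if (pipe(notify_pipe) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = notify_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Runs on a normal thread: blocks until the handler reports a signal,
   then returns its number (or -1 on error). */
int wait_for_signal(void)
{
    unsigned char byte;
    ssize_t n = read(notify_pipe[0], &byte, 1);

    return n == 1 ? (int)byte : -1;
}
```

The thread that returns from wait_for_signal() is in a normal context, so that is where a call such as nativeCrashed() could safely be made.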

Segmentation fault in native library when running under Android prior to 4.0.3

There is a change in the way JNI references are handled in ICS. This blog post explains it: http://android-developers.blogspot.ch/2011/11/jni-local-reference-changes-in-ics.html

The behavior depends on your targetSdkVersion, so you can try setting it lower to use the compatibility mode.

Catching SIGSEGV when triggered by corrupt stack

Okay, here is a solution to the above problem following EOF's comment (using sigaltstack() to provide a signal stack on the heap):

#define _GNU_SOURCE   /* must precede all includes; exposes REG_RSP on glibc */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

/* A stack address known to be valid, captured in main(). */
static long long int sbase;

/* Note: puts() is not async-signal-safe; it is used here only for demonstration. */
static void catch_function(int sig, siginfo_t *info, void *cntxt)
{
    (void)sig;
    (void)info;
    puts("handler works");

    /* reset RSP if invalid */
    ucontext_t *uc_context = (ucontext_t *)cntxt;
    if (!uc_context->uc_mcontext.gregs[REG_RSP])
    {
        puts("resetting RSP");
        uc_context->uc_mcontext.gregs[REG_RSP] = sbase;
    }
}

int main(int argc, char **argv)
{
    (void)argc;

    /* approximate RSP during main */
    sbase = (long long int)&argv;

    stack_t ss;
    struct sigaction sa;

    /* alternate signal stack, allocated on the heap */
    ss.ss_sp = malloc(SIGSTKSZ);
    if (ss.ss_sp == NULL)
        return EXIT_FAILURE;
    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;
    sigaltstack(&ss, NULL);

    sa.sa_sigaction = catch_function;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO | SA_NODEFER | SA_ONSTACK;

    sigaction(SIGSEGV, &sa, NULL);

    puts("testing handler");
    raise(SIGSEGV);
    puts("back");

    /* zero RSP, then push and pop so the stack height is unchanged */
    __asm__ (
        "xor %rax, %rax\n\t"
        "mov %rax, %rsp\n\t"
        "push %rax\n\t"
        "pop %rax" );

    puts("exiting.");
    return 0;
}

The alternative signal stack is allocated on the heap and registered using sigaltstack(&ss,NULL). Also, the SA_ONSTACK flag is set in the sigaction struct to enable the alternative stack use for this specific action.

This basically resolves my problem: the handler is now reached. With the alternate stack alone, however, we see an endless stream of SIGSEGVs being caught, since a catch_function() that merely returns does nothing to fix the invalid stack pointer. As a solution, I now store a valid stack pointer from main() in sbase and use it to restore RSP in the handler if it is invalid (by manipulating the saved thread context).

To make all of this work, I also fixed my inline assembly to not just push a value but also pop it back afterwards, so the stack height remains unchanged. For the sake of reproducibility, I also listed the required headers this time.

SIGSEGV SEGV_MAPERR 0x00000000fbadbeef in libwebviewchromium.so

According to the Chromium WebKit sources (see the comment just above the macro definition), 0xfbadbeef is their code for "known, unrecoverable errors like out-of-memory".


