Getting Stacktrace from Core Dump

getting stacktrace from core dump

gdb /usr/bin/myapp.binary corefile

Then, use one of:

(gdb) bt
(gdb) bt full
(gdb) info threads
(gdb) thread apply all bt
(gdb) thread apply all bt full

Note that installing debug symbols for the related libraries will help

Dumping only stack trace in linux core dumps

You can set /proc/$PID/coredump_filter to 0x10.

See http://man7.org/linux/man-pages/man5/core.5.html

Analyzing core dump with stack corrupted

If frame 1 does not make sense at a source level, you might try looking at disassembly of frame 1. After selecting that frame, disass $pc should show you the disassembly for the entire function, with => to indicate the return address (the instruction immediately after the call to frame 0).

In the case of a null function pointer dereference, the instruction for the call to frame 0 might involve a simple register dereference, in which case you'd want to understand how that register obtained the null value. In some cases including /m in a disass command can be helpful, although it can cause confusion because of the distinction between instruction boundaries and source line boundaries. Omitting /m is more likely to display a meaningful return address.

The => in the updated disassembly (without /m) makes sense. In any frame aside from frame 0, the pc value (what the => points at in the disassembly) indicates the instruction which will execute when the next lowest numbered frame returns (which, due to the crash, did not occur in this case). The pc value in frame 1 is not the value of the pc register at the time of the crash, but rather the saved pc value pushed on the stack by the call instruction. One way to see that is to compare output from x/a $sp in frame 0 to x/i $pc in frame 1.

One way to interpret this disassembly is that edx is some object, and [edx+0x14] points into its vtable. One way the vtable might wind up with a null pointer is a memory allocation issue with a stale reference to a chunk of memory which has been deallocated and subsequently overwritten by its rightful owner (the next piece of code to allocate that chunk). If any of that is applicable here, it can work either way (the code in frame 1 might be the culprit, or it might be the victim). There are other reasons memory might be overwritten with incorrect contents, but double allocation might be a good place to start.

It probably makes sense to examine the contents of the object referenced by edx in frame 1, to see if there are any other anomalies besides what could be an incorrect vtable. Both the print command and the x command (within gdb) can be useful for this. My best guess about which object is referenced by edx, based on disass/m output (at this writing, visible only in the edit history of the question), is _listener, but it would be best to confirm that by further study of the disassembly (the excerpt available here does not seem to include the instruction that determines the value of edx).

Minimal core dump (stack trace + current frame only)

I have "solved" this issue in two ways:

I installed a signal handler for SIGSEGV, and used backtrace/backtrace_symbols to print out the stack trace. I compiled my code with -rdynamic, so even after stripping the debug info I still get a backtrace with meaningful names (while keeping the executable compact enough).

I stripped the debug info and put it in a separate file, which I will store somewhere safe, using strip; from there, I will use add22line with the info saved from the backtrace (addresses) to understand where the problem happened. This way I have to store only a few bytes.
Alternatively, I found I could use the /proc/self/coredump_filter to dump no memory (setting its content to "0"): only thread and proc info, registers, stacktrace etc. are saved in the core. See more in this answer

I still lose information that could be precious (global and local variable(s) content, params..). I could easily figure out which page(s) to dump, but unfortunately there is no way to specify a "dump-these-pages" for normal core dumps (unless you are willing to go and patch the maydump() function in the kernel).

For now, I'm quite happy with there 2 solutions (it is better than nothing..) My next moves will be:

see how difficult would be to port Breakpad to powerpc-linux: there are already powerpc-darwin and i386-linux so.. how hard can it be? :)
try to use google-coredumper to dump only a few pages around the current ESP (that should give me locals and parameters) and around "&some_global" (that should give me globals).

Getting Stacktrace from Core Dump