Minimal Core Dump (Stack Trace + Current Frame Only)

I have "solved" this issue in two ways:

I installed a signal handler for SIGSEGV, and used backtrace/backtrace_symbols to print out the stack trace. I compiled my code with -rdynamic, so even after stripping the debug info I still get a backtrace with meaningful names (while keeping the executable compact enough).

Using strip, I stripped the debug info and put it in a separate file, which I will store somewhere safe. From there, I will use addr2line with the addresses saved from the backtrace to understand where the problem happened. This way I have to store only a few bytes.
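To feed addr2line, the address has to be pulled out of each saved backtrace line. Here is a rough sketch, assuming the lines look like the glibc format "./prog(func+0x1a) [0x400abc]". The helper name parse_backtrace_address is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Extracts the bracketed hex address from a backtrace_symbols()-style line.
   Returns 0 on success and stores the address in *out, -1 if no well-formed
   "[0x...]" suffix is found. */
int parse_backtrace_address(const char *line, uintptr_t *out)
{
    const char *open = strrchr(line, '[');
    if (open == NULL)
        return -1;

    char *end = NULL;
    unsigned long long value = strtoull(open + 1, &end, 16);
    if (end == open + 1 || *end != ']')
        return -1;

    *out = (uintptr_t)value;
    return 0;
}
```

The resulting address can then be passed to something like addr2line -f -e program.debug 0x400abc, where program.debug stands for whatever file holds the saved debug info. For a position-independent executable the raw address would presumably need the load base subtracted first; this sketch does not handle that.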
Alternatively, I found I could use /proc/self/coredump_filter to dump no memory (by setting its content to "0"): only thread and process info, registers, stack trace etc. are saved in the core. See more in this answer
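A process can also set its own filter at startup instead of relying on the shell. A small sketch, assuming Linux. The function name write_coredump_filter is made up, and the path is a parameter only so the helper can be tried against an ordinary file.

```c
#include <stdio.h>

/* Writes `mask` in hex to a coredump_filter file. In real use `path` would
   be "/proc/self/coredump_filter". Returns 0 on success, -1 on failure. */
int write_coredump_filter(const char *path, unsigned int mask)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "0x%x\n", mask) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling write_coredump_filter("/proc/self/coredump_filter", 0) early in main corresponds to the "0" setting described above. The value is inherited by child processes, so it only has to be set once per process tree.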

I still lose information that could be precious (the contents of global and local variables, parameters, and so on). I could easily figure out which pages to dump, but unfortunately there is no way to specify "dump these pages" for normal core dumps (unless you are willing to go and patch the maydump() function in the kernel).

For now, I'm quite happy with these two solutions (it is better than nothing). My next moves will be:

See how difficult it would be to port Breakpad to powerpc-linux: there are already powerpc-darwin and i386-linux ports, so how hard can it be? :)
Try to use google-coredumper to dump only a few pages around the current ESP (that should give me locals and parameters) and around "&some_global" (that should give me globals).

Dumping only stack trace in linux core dumps

You can set /proc/$PID/coredump_filter to 0x10.
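For reference, here is how I read the bit layout in core(5), written out as C constants. The enum names and the coredump_filter_path helper are mine and only for illustration; the bit positions are the ones the man page lists.

```c
#include <stdio.h>
#include <sys/types.h>

/* Bit meanings of coredump_filter as described in core(5).
   The constant names are hypothetical. */
enum coredump_filter_bit {
    CORE_ANON_PRIVATE    = 1u << 0, /* anonymous private mappings */
    CORE_ANON_SHARED     = 1u << 1, /* anonymous shared mappings */
    CORE_FILE_PRIVATE    = 1u << 2, /* file-backed private mappings */
    CORE_FILE_SHARED     = 1u << 3, /* file-backed shared mappings */
    CORE_ELF_HEADERS     = 1u << 4, /* ELF headers */
    CORE_HUGETLB_PRIVATE = 1u << 5, /* private huge pages */
    CORE_HUGETLB_SHARED  = 1u << 6  /* shared huge pages */
};

/* Builds "/proc/<pid>/coredump_filter" into buf.
   Returns 0 on success, -1 if the buffer is too small. */
int coredump_filter_path(pid_t pid, char *buf, size_t size)
{
    int n = snprintf(buf, size, "/proc/%ld/coredump_filter", (long)pid);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}
```

With these names, 0x10 is CORE_ELF_HEADERS alone, and the documented default of 0x33 is anonymous private, anonymous shared, ELF headers and private huge pages together.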

See http://man7.org/linux/man-pages/man5/core.5.html

Analyzing core dump with stack corrupted

If frame 1 does not make sense at a source level, you might try looking at disassembly of frame 1. After selecting that frame, disass $pc should show you the disassembly for the entire function, with => to indicate the return address (the instruction immediately after the call to frame 0).

In the case of a null function pointer dereference, the instruction for the call to frame 0 might involve a simple register dereference, in which case you'd want to understand how that register obtained the null value. In some cases including /m in a disass command can be helpful, although it can cause confusion because of the distinction between instruction boundaries and source line boundaries. Omitting /m is more likely to display a meaningful return address.

The => in the updated disassembly (without /m) makes sense. In any frame aside from frame 0, the pc value (what the => points at in the disassembly) indicates the instruction which will execute when the next lowest numbered frame returns (which, due to the crash, did not occur in this case). The pc value in frame 1 is not the value of the pc register at the time of the crash, but rather the saved pc value pushed on the stack by the call instruction. One way to see that is to compare output from x/a $sp in frame 0 to x/i $pc in frame 1.
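If you want to practice reading this kind of backtrace, a tiny reproduction of the null function pointer case can help. This is only a sketch: struct listener, notify and double_code are hypothetical names standing in for an object whose callback slot may have been overwritten.

```c
#include <stddef.h>

/* An object holding a function pointer, playing the role of a vtable slot. */
struct listener {
    int (*on_event)(int code); /* NULL here reproduces the crash */
};

/* A valid callback, used for the non-crashing case. */
static int double_code(int code)
{
    return 2 * code;
}

/* This is the "frame 1" of the discussion: it loads the pointer from the
   object and performs an indirect call through it. */
int notify(const struct listener *l, int code)
{
    return l->on_event(code);
}
```

Calling notify with a listener initialized as { double_code } returns normally. Calling it with { NULL } jumps to address 0; in the resulting core, frame 0 typically has no symbol, and frame 1 is notify with its saved pc pointing just past the indirect call.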

One way to interpret this disassembly is that edx is some object, and [edx+0x14] points into its vtable. One way the vtable might wind up with a null pointer is a memory allocation issue with a stale reference to a chunk of memory which has been deallocated and subsequently overwritten by its rightful owner (the next piece of code to allocate that chunk). If any of that is applicable here, it can work either way (the code in frame 1 might be the culprit, or it might be the victim). There are other reasons memory might be overwritten with incorrect contents, but double allocation might be a good place to start.

It probably makes sense to examine the contents of the object referenced by edx in frame 1, to see if there are any other anomalies besides what could be an incorrect vtable. Both the print command and the x command (within gdb) can be useful for this. My best guess about which object is referenced by edx, based on disass/m output (at this writing, visible only in the edit history of the question), is _listener, but it would be best to confirm that by further study of the disassembly (the excerpt available here does not seem to include the instruction that determines the value of edx).

What does ?? in gdb backtrace mean and how to get the actual stack frames?

Those ?? are usually where the name of the function is displayed. GDB does not know the name of those functions and therefore displays ??.

Now, why is this happening? Depends. GCC compiles including symbols (e.g. function names and similar) by default. Most probably you are working with a stripped version, where symbols have been removed, or just with the wrong file.

As @zwol suggests, the warning: exec file is newer than core file line indicates that something else is going on that you don't show in your question. The core dump you are working on was generated by an older build of the executable, so it no longer matches the binary GDB is loading.

I suggest recompiling the program from scratch and making sure that you open the right file with GDB. First produce a new core dump by crashing the new program, then open it in GDB.

Assuming the following program.c:

int main(void) { return 1/0; }

This should work:

$ rm -f core
$ gcc program.c -o program
$ ./program
Floating point exception (core dumped)

$ gdb program core
Reading symbols from program...(no debugging symbols found)...done.
[New LWP 11896]
Core was generated by `./program'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0  0x000055d24a4cd790 in main ()
(gdb) bt
#0  0x000055d24a4cd790 in main ()
(gdb)

NOTE: if you don't see (core dumped) when running the process that means that a core dump was not generated (which leaves you with the old one). If you are using Bash, try running the command ulimit -c unlimited before crashing the program.
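The same limit can also be raised from inside the program using setrlimit. A minimal sketch, with enable_core_dumps being a name I made up:

```c
#include <sys/resource.h>

/* Raises the soft core-size limit to the hard limit for this process.
   Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim);
}
```

This only matches ulimit -c unlimited when the hard limit itself is unlimited. If the hard limit is 0, no core will be written either way.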
