Minimal core dump (stack trace + current frame only)
I have "solved" this issue in two ways:
- I installed a signal handler for SIGSEGV and used backtrace/backtrace_symbols to print the stack trace. I compiled my code with -rdynamic, so even after stripping the debug info I still get a backtrace with meaningful names (while keeping the executable compact enough). I stripped the debug info using strip and put it in a separate file, which I will store somewhere safe; from there, I will use addr2line with the info saved from the backtrace (addresses) to understand where the problem happened. This way I have to store only a few bytes.
- Alternatively, I found I could use /proc/self/coredump_filter to dump no memory (setting its content to "0"): only thread and proc info, registers, stack trace etc. are saved in the core. See more in this answer.
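The first approach can be sketched roughly as follows. This is a minimal illustration assuming glibc's execinfo.h, not the original code: the names segv_handler, install_segv_handler and MAX_FRAMES are made up for the example, and it uses backtrace_symbols_fd instead of backtrace_symbols because the latter calls malloc, which is not safe inside a signal handler.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64 /* arbitrary cap chosen for this sketch */

/* Hypothetical handler: writes the backtrace to stderr, then lets the
 * process die from SIGSEGV as it normally would. */
static void segv_handler(int sig)
{
    static const char msg[] = "caught SIGSEGV, backtrace:\n";
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);

    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing useful can be done about a failed write here */
    }
    /* Writes one line per frame directly to the descriptor, no malloc. */
    backtrace_symbols_fd(frames, count, STDERR_FILENO);

    /* Restore the default action and re-raise the signal. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Returns 0 on success, -1 on failure (same convention as sigaction). */
int install_segv_handler(void)
{
    struct sigaction sa;
    void *warmup[1];

    /* backtrace() may load libgcc lazily on first use; calling it once
     * here keeps that work out of the signal handler. */
    backtrace(warmup, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Call install_segv_handler() early in main and link with -rdynamic so that function names appear in the output. The printed addresses can then be resolved against the unstripped binary with addr2line -e. On position-independent executables the absolute addresses change from run to run, so it is the module-relative offset that needs resolving.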
I still lose information that could be precious (contents of global and local variables, parameters...). I could easily figure out which page(s) to dump, but unfortunately there is no way to specify "dump these pages" for normal core dumps (unless you are willing to go and patch the maydump() function in the kernel).
For now, I'm quite happy with these two solutions (it is better than nothing). My next moves will be:
- see how difficult it would be to port Breakpad to powerpc-linux: there are already powerpc-darwin and i386-linux so.. how hard can it be? :)
- try to use google-coredumper to dump only a few pages around the current ESP (that should give me locals and parameters) and around "&some_global" (that should give me globals).
Dumping only stack trace in linux core dumps
You can set /proc/$PID/coredump_filter to 0x10.
See http://man7.org/linux/man-pages/man5/core.5.html
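A process can also set its own filter before anything goes wrong. The following is a minimal sketch: set_coredump_filter is a hypothetical helper name, and the path is a parameter only so the function can be tried against an ordinary file. According to core(5), bit 4 (0x10) selects "dump ELF headers".

```c
#include <stdio.h>

/* Hypothetical helper: writes `mask` in hex to the given coredump_filter
 * file. Returns 0 on success, -1 if the file cannot be opened or written. */
int set_coredump_filter(const char *path, unsigned int mask)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "0x%x\n", mask) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A typical call would be set_coredump_filter("/proc/self/coredump_filter", 0x10) near the start of main. The setting is inherited by child processes, as described in core(5).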
Analyzing core dump with stack corrupted
If frame 1 does not make sense at a source level, you might try looking at the disassembly of frame 1. After selecting that frame, disass $pc should show you the disassembly for the entire function, with => to indicate the return address (the instruction immediately after the call to frame 0).
In the case of a null function pointer dereference, the instruction for the call to frame 0 might involve a simple register dereference, in which case you'd want to understand how that register obtained the null value. In some cases including /m in a disass command can be helpful, although it can cause confusion because of the distinction between instruction boundaries and source line boundaries. Omitting /m is more likely to display a meaningful return address.
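To make the failure mode concrete, here is a small sketch of a call through a null function pointer. The listener_ops table is a hypothetical C stand-in for a vtable, and none of these names come from the code in the question. Calling notify_unchecked with a null on_event makes the indirect call jump to address 0, so on a typical Linux system the crash shows up with frame 0 at address 0 and the real caller in frame 1.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vtable: a table of function pointers. */
struct listener_ops {
    void (*on_event)(int event);
};

struct listener {
    const struct listener_ops *ops;
};

/* Unchecked dispatch: if ops->on_event is NULL, the indirect call
 * transfers control to address 0 and the process crashes there. */
void notify_unchecked(const struct listener *l, int event)
{
    l->ops->on_event(event);
}

/* Defensive variant: returns -1 instead of calling through a null
 * pointer, 0 after a successful call. */
int notify_checked(const struct listener *l, int event)
{
    if (l == NULL || l->ops == NULL || l->ops->on_event == NULL)
        return -1;
    l->ops->on_event(event);
    return 0;
}
```

The check only catches pointers that are exactly null. A table overwritten with other garbage would pass it, so it does not replace finding out how the pointer was corrupted.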
The => in the updated disassembly (without /m) makes sense. In any frame other than frame 0, the pc value (what the => points at in the disassembly) indicates the instruction which will execute when the next lowest numbered frame returns (which, due to the crash, did not occur in this case). The pc value in frame 1 is not the value of the pc register at the time of the crash, but rather the saved pc value pushed on the stack by the call instruction. One way to see that is to compare output from x/a $sp in frame 0 to x/i $pc in frame 1.
One way to interpret this disassembly is that edx is some object, and [edx+0x14] points into its vtable. One way the vtable might wind up with a null pointer is a memory allocation issue with a stale reference to a chunk of memory which has been deallocated and subsequently overwritten by its rightful owner (the next piece of code to allocate that chunk). If any of that is applicable here, it can work either way (the code in frame 1 might be the culprit, or it might be the victim). There are other reasons memory might be overwritten with incorrect contents, but double allocation might be a good place to start.
It probably makes sense to examine the contents of the object referenced by edx in frame 1, to see if there are any other anomalies besides what could be an incorrect vtable. Both the print command and the x command (within gdb) can be useful for this. My best guess about which object is referenced by edx, based on disass/m output (at this writing, visible only in the edit history of the question), is _listener, but it would be best to confirm that by further study of the disassembly (the excerpt available here does not seem to include the instruction that determines the value of edx).
What does ?? in gdb backtrace mean and how to get the actual stack frames?
Those ?? are usually where the name of the function is displayed. GDB does not know the name of those functions and therefore displays ??.
Now, why is this happening? Depends. GCC compiles including symbols (e.g. function names and similar) by default. Most probably you are working with a stripped version, where symbols have been removed, or just with the wrong file.
As @zwol suggests, the line warning: exec file is newer than core file indicates that something else is going on that you don't show in your question. The core dump file you are working on is outdated: it was generated by an older build of the crashed executable.
I suggest you recompile the program from scratch and make sure that you are opening the right file with GDB. First produce a new core dump by crashing the new program, then open it in GDB.
Assuming the following program.c:
int main(void) { return 1/0; }
This should work:
$ rm -f core
$ gcc program.c -o program
$ ./program
Floating point exception (core dumped)
$ gdb program core
Reading symbols from program...(no debugging symbols found)...done.
[New LWP 11896]
Core was generated by `./program'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0 0x000055d24a4cd790 in main ()
(gdb) bt
#0 0x000055d24a4cd790 in main ()
(gdb)
NOTE: if you don't see (core dumped) when running the process, no core dump was generated (which leaves you with the old one). If you are using Bash, try running the command ulimit -c unlimited before crashing the program.
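If changing the shell's limit is not convenient, the program can raise its own core size limit at startup. This is a minimal sketch rather than an exact equivalent of ulimit -c unlimited: the hypothetical helper enable_core_dumps raises only the soft limit, and only as far as the hard limit, which is the most an unprivileged process is allowed to do.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_CORE to the current hard
 * limit. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max; /* soft limit may not exceed the hard limit */
    return setrlimit(RLIMIT_CORE, &lim);
}
```

If the hard limit itself is 0, this call succeeds but core dumps remain disabled. In that case the limit has to be raised by a privileged parent process or in the system configuration.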