Understanding C pointers using GDB by examining core and call stack
When a function is called in C, the parameters are copied into registers or pushed onto the stack. The called function can reuse those registers and stack locations for any purpose. Often, but not always, a parameter is kept in the same register or stack location for the entire lifetime of the function call.
On a 32-bit x86 system, the first argument to a function (in your case, inp) is often located on the stack, 8 bytes above the address that the stack frame's base pointer holds, just past the saved %ebp and the return address. See stackoverflow.com: what exactly is program stack's growth direction.
When gdb does a backtrace, the only guidance it has from the compiler is something like "the first argument to func2 is named inp and is a 4-byte value of type UTYPE * located at an 8-byte offset from the %ebp register".
If, somewhere in func2, you alter inp, as you do at location (4), then any backtrace from that point on may very well show the altered value of inp, in your case, 0. The value that inp had when func2 was entered is lost forever, unless the compiler has been clever enough to include guidance like "the first argument to func2 is named inp and is a 4-byte value of type UTYPE * and its value upon entry to func2 can be found by unwinding the stack to the previous frame and looking at the value of ptr, which is located at a -4-byte offset from the %ebp register." The newer versions of the DWARF debugging format can specify things like this, I believe.
I cannot explain why your gdb's backtrace shows ptr in func1's frame as having the value 0. Setting inp to NULL should have no effect on ptr's value nor on gdb's ability to show ptr's value.
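The pass-by-value behaviour described above can be checked with a small sketch. The definition of UTYPE is not shown in the question, so the struct below is a placeholder, and the bodies of func1 and func2 are stand-ins for the asker's code, not a reconstruction of it.

```c
#include <stddef.h>

/* Placeholder: the real UTYPE is not shown in the question. */
typedef struct { int value; } UTYPE;

/* Overwrites its own copy of the pointer, like location (4) in the question. */
static int func2(UTYPE *inp)
{
    int seen = inp->value;        /* use the argument before clobbering it */
    inp = NULL;                   /* only the callee's copy changes */
    return seen + (inp == NULL);  /* 41 + 1 for the object created in func1 */
}

/* Returns 1 if the caller's pointer is still intact after the call. */
int func1(void)
{
    UTYPE obj = { 41 };
    UTYPE *ptr = &obj;
    int r = func2(ptr);
    return ptr == &obj && r == 42;
}
```

Building this with gcc -g -O0, stopping after the assignment in func2 and running bt is one way to see which frames report the changed value. What gdb prints there depends on the compiler and on the debug format it emits.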
this pointer changes in GDB backtrace
The this pointer can change between frames in a gdb trace if the function in the next frame is called on a different object (even if the objects are the same type), since this is for the specific instance. This is probably not your problem.
0x200 is not a valid value for this, and almost certainly indicates memory corruption of some type. The this pointer is sometimes stored on the stack and passed as an invisible first argument to a function. So if you have corrupted the stack (by going out of bounds writing to another variable) you could see the this pointer corrupted.
The value 0x200 itself is interesting. Because it is so close to 0, but not actually 0, it indicates that the instance you're looking at is probably part of another object or array, located 0x200 bytes from the beginning of that object/array, and that the object/array's address is actually NULL. Looking at your code you should be able to pretty easily figure out which object has gotten set to NULL, which is causing this to report 0x200.
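The arithmetic behind that guess can be sketched in C with offsetof. The two struct types and the 0x200-byte header are hypothetical, chosen only so that the member lands at offset 0x200; the real classes in the question are not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: 'm' sits 0x200 bytes into 'container'. */
struct member { int id; };

struct container {
    char header[0x200];
    struct member m;
};

/* Address a debugger would report for 'm' if the container lived at 'base'. */
uintptr_t member_address(uintptr_t base)
{
    return base + offsetof(struct container, m);
}
```

With a base of 0, that is, a NULL container pointer, member_address returns 0x200, which matches the kind of value seen for this in the backtrace.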
How gdb reconstructs stacktrace for C++?
Speaking in pseudocode, you could call the stack "an array of packed stack frames", where every stack frame is a data structure of variable size that you could express like this:
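#include <cstddef>
#include <cstdint>

template <std::size_t N>
struct stackframe {
    uintptr_t contents[N];   // locals, spills and saved registers
#ifndef OMIT_FRAME_POINTER
    void *nextfp;            // next frame in the chain (a stackframe with some other N)
#endif
    void *retaddr;           // return address into the caller
};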
The problem is that every function has a different <N> - frame sizes vary.
The compiler knows the frame sizes and, when it generates debugging information, will usually emit them as part of it. All the debugger then needs to do is locate the last program counter, look up the function in the symbol table, then use that name to look up the frame size in the debugging information. Add that to the stack pointer and you get to the beginning of the next frame.
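As a rough sketch of that lookup-and-add loop, the code below unwinds a simulated stack held in an array. Everything here is an assumption made for illustration: the func_info record stands in for real debugging information, the stack grows downward with index 0 as the top, and each frame is followed by the return address into its caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debug-info record: code range and frame size in words
   (locals and saved registers, not counting the return address). */
struct func_info {
    uintptr_t start, end;   /* function occupies [start, end) */
    size_t frame_words;
};

/* Finds the function containing pc, or NULL if none does. */
static const struct func_info *lookup(const struct func_info *tab,
                                      size_t ntab, uintptr_t pc)
{
    for (size_t i = 0; i < ntab; i++)
        if (pc >= tab[i].start && pc < tab[i].end)
            return &tab[i];
    return NULL;
}

/* Stores the pc of each frame in out[] and returns how many were found. */
size_t unwind_by_size(const struct func_info *tab, size_t ntab,
                      const uintptr_t *stack, size_t nstack,
                      size_t sp, uintptr_t pc, uintptr_t *out, size_t max)
{
    size_t n = 0;
    while (n < max) {
        const struct func_info *f = lookup(tab, ntab, pc);
        if (f == NULL)
            break;                 /* pc is not in any known function */
        out[n++] = pc;
        sp += f->frame_words;      /* skip this function's frame */
        if (sp >= nstack)
            break;                 /* ran off the simulated stack */
        pc = stack[sp++];          /* return address into the caller */
    }
    return n;
}
```

No frame pointer appears anywhere in the loop; the table of frame sizes carries all the information needed to step from one frame to the next.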
If using this method you don't require frame linkage, and backtracing will work just fine even if you use -fomit-frame-pointer. On the other hand, if you have frame linkage, then iterating the stack is just following a linked list - because every frame pointer in a new stack frame is initialized by the function prologue code to point to the previous one.
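The linked-list view can be sketched as follows. The frame_link layout (saved frame pointer followed by return address) is an assumption modelled on a conventional prologue, and the frames are ordinary structs instead of real stack memory, so the walk is safe to run anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record: saved frame pointer, then return address. */
struct frame_link {
    const struct frame_link *prev_fp;  /* caller's frame, NULL at the outermost one */
    uintptr_t retaddr;                 /* return address into the caller */
};

/* Follows the chain from the innermost frame, storing up to max return addresses. */
size_t walk_frames(const struct frame_link *fp, uintptr_t *out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->retaddr;
        fp = fp->prev_fp;
    }
    return n;
}
```

A real unwinder would also have to validate each pointer before following it, since a corrupted stack can turn the chain into garbage or a loop.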
If you have neither frame size information nor frame pointers, but still a symbol table, then you can also perform backtracing by a bit of reverse engineering to calculate the frame sizes from the actual binary. Start with the program counter, look up the function it belongs to in the symbol table, and then disassemble the function from the start. Isolate all operations between the beginning of the function and the program counter that actually modify the stack pointer (write anything to the stack and/or allocate stack space). That gives you the frame size for the current function, so undo that adjustment on the stack pointer (on a downward-growing stack, add the frame size back), and you should (on most architectures) find the last word written to the stack before the function was entered - which is usually the return address into the caller. Repeat as necessary.
Finally, you can perform a heuristic analysis of the contents of the stack - isolate all words in the stack that fall within executably-mapped segments of the process address space (and thereby could be return addresses), and play a what-if game: look up the memory just before each candidate, disassemble the instruction there and see whether it actually is a call instruction of some sort, and if so, whether it really called the 'next' function and whether you can construct an uninterrupted call sequence from that. This works to a degree even if the binary is completely stripped (although all you could get in that case is a list of return addresses). I don't think GDB employs this technique, but some embedded low-level debuggers do. On x86, due to the varying instruction lengths, this is terribly difficult to do because you can't easily "step back" through an instruction stream, but on RISC, where instruction lengths are fixed, e.g. on ARM, this is much simpler.
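Only the first step of that heuristic, filtering stack words by address range, is simple enough to sketch briefly. The function below assumes a single executable range [text_lo, text_hi), which is a simplification; a real process usually has several executable mappings, and the later checks (is there a call instruction before the candidate, does it target the next function) are not attempted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Records the indexes of stack words that fall inside [text_lo, text_hi).
   These are only candidates for return addresses, not confirmed ones. */
size_t scan_candidates(const uintptr_t *stack, size_t nstack,
                       uintptr_t text_lo, uintptr_t text_hi,
                       size_t *idx_out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < nstack && n < max; i++) {
        if (stack[i] >= text_lo && stack[i] < text_hi)
            idx_out[n++] = i;
    }
    return n;
}
```

Plain data that happens to look like a code address passes this filter too, which is why the verification steps described above are needed before trusting any candidate.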
There are some holes that make simple or even complex/exhaustive implementations of these algorithms fall over sometimes, like tail calls, inlined code, and so on. The gdb source code might give you some more ideas:
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/frame.c
GDB employs a variety of such techniques.
How can one see content of stack with GDB?
Use info frame to show the stack frame info.
To read the memory at given addresses you should take a look at x: x/x $esp for hex, x/d $esp for signed, x/u $esp for unsigned, etc. x uses the format syntax; you could also take a look at the current instruction via x/i $eip, etc.
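The three format letters read the same bits in different ways. The helper below is a hypothetical illustration of that idea in C, assuming a 4-byte word; it is not gdb's implementation, and its output is not guaranteed to match gdb's layout character for character.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats one 4-byte word the way the x, d and u format letters interpret it.
   Returns what snprintf returns, or -1 for a format letter not covered here. */
int format_word(uint32_t word, char fmt, char *buf, size_t len)
{
    int32_t as_signed;
    memcpy(&as_signed, &word, sizeof as_signed);  /* same bits, signed view */
    switch (fmt) {
    case 'x': return snprintf(buf, len, "0x%08" PRIx32, word);
    case 'd': return snprintf(buf, len, "%" PRId32, as_signed);
    case 'u': return snprintf(buf, len, "%" PRIu32, word);
    default:  return -1;
    }
}
```

For the word 0xffffffff, the 'x' view gives 0xffffffff, the 'd' view gives -1 and the 'u' view gives 4294967295.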
gdb backtrace of a core file prints error no such file or directory
This error looks mystifying but it is correct. It shows that a NULL pointer dereference was being made by strcmp, which was called from line 1144 of your code.
A segmentation fault means the program tried to access a page of memory that is invalid for that access: the page is not mapped for read or write in the MMU. In this case, strcmp is trying to access page 0 because you passed it a NULL pointer. A null pointer is address zero, and page 0 is an invalid page.
The reference to the file
../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S
points to the assembly source file (.S) that implements strcmp for the x86-64 architecture. Since you do not have that source file on your system, gdb complains that it cannot access it.
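One way to guard against this class of crash is to check for NULL before calling strcmp. The wrapper below is a minimal sketch; the name safe_streq is made up for illustration, and treating a NULL argument as "not equal" is a design choice that may not suit the code at line 1144.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if both strings are non-NULL and equal, 0 otherwise. */
int safe_streq(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return 0;              /* never hand NULL to strcmp */
    return strcmp(a, b) == 0;
}
```

The check only hides the symptom. It is still worth finding out why the pointer was NULL at that call in the first place.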
Determine the line of code that causes a segmentation fault?
GCC can't do that, but GDB (a debugger) sure can. Compile your program using the -g switch, like this:
gcc program.c -g
Then use gdb:
$ gdb ./a.out
(gdb) run
<segfault happens here>
(gdb) backtrace
<offending code is shown here>
Here is a nice tutorial to get you started with GDB.
The place where the segfault occurs is generally only a clue to where the mistake that causes it is in the code. The reported location is not necessarily where the problem resides.
What does ?? in gdb backtrace mean and how to get the actual stack frames?
Those ?? are usually where the name of the function is displayed. GDB does not know the name of those functions and therefore displays ??.
Now, why is this happening? Depends. GCC compiles including symbols (e.g. function names and similar) by default. Most probably you are working with a stripped version, where symbols have been removed, or just with the wrong file.
As @zwol suggests, the line warning: exec file is newer than core file indicates that something else is going on that you don't show in your question. You are working with a core dump file that is outdated: it was generated by an earlier build of the executable that crashed.
I suggest recompiling the program from scratch and making sure that you are opening the right file with GDB. First produce a new core dump by crashing the new program, then open it in GDB.
Assuming the following program.c:
int main(void) { return 1/0; }
This should work:
$ rm -f core
$ gcc program.c -o program
$ ./program
Floating point exception (core dumped)
$ gdb program core
Reading symbols from program...(no debugging symbols found)...done.
[New LWP 11896]
Core was generated by `./program'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0 0x000055d24a4cd790 in main ()
(gdb) bt
#0 0x000055d24a4cd790 in main ()
(gdb)
NOTE: if you don't see (core dumped) when running the process, that means that a core dump was not generated (which leaves you with the old one). If you are using Bash, try running the command ulimit -c unlimited before crashing the program.
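The same limit can be inspected and raised from inside a C program through the POSIX getrlimit and setrlimit calls. This is a sketch that assumes a POSIX system; the function names are made up, and unlike ulimit -c unlimited it only raises the soft limit to whatever the hard limit allows, for the calling process and its children.

```c
#include <sys/resource.h>

/* Raises the soft core-file size limit to the hard limit.
   Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* the soft limit may go up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Returns 1 if the soft limit permits a core file, 0 if it is zero, -1 on error. */
int core_dumps_enabled(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur != 0;
}
```

If the hard limit is itself zero, enable_core_dumps succeeds but core_dumps_enabled still reports 0. A non-zero limit is also not a guarantee that a core file appears, since other system settings decide where and whether it is written.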