Help with Understanding a Very Basic main() Disassembly in GDB

Stack frames

The code at the beginning of the function body:

push  %ebp
mov %esp, %ebp

is to create the so-called stack frame, which is a "solid ground" for referencing parameters and objects local to the procedure. The %ebp register is used (as its name indicates) as a base pointer, which points to the base (or bottom) of the local stack inside the procedure.

After entering the procedure, the stack pointer register (%esp) points to the return address stored on the stack by the call instruction (it is the address of the instruction just after the call). If you invoked ret now, this address would be popped from the stack into %eip (the instruction pointer) and execution would continue from that address, at the instruction following the call. But we don't return yet, do we? ;-)

You then push the %ebp register to save its previous value somewhere and not lose it, because you'll use the register for something else shortly. (BTW, it usually contains the base pointer of the caller function, and when you peek at that value, you'll find a previously stored %ebp, which is again the base pointer of the function one level higher, so you can trace the call stack that way.) Once %ebp is saved, you can store the current %esp (stack pointer) in it, so that %ebp points to the same address: the base of the current local stack. %esp will move back and forth inside the procedure as you push and pop values on the stack or reserve and free local variables, but %ebp will stay fixed, still pointing to the base of the local stack frame.
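To make the saved-%ebp chain concrete, here is a toy model in C, a sketch for illustration only and not how a real debugger reads memory. A small array stands in for a 32-bit stack (one slot per 4-byte word), and the names ToyStack, toy_call, toy_prologue and toy_backtrace are made up. It performs call followed by push %ebp; mov %esp,%ebp twice, then follows the saved %ebp values to recover the return addresses, innermost first.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a 32-bit x86 stack (illustration only, no overflow checks).
   mem[] stands in for memory: an index plays the role of an address,
   one slot is one 4-byte word, and the stack grows toward index 0. */
#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;   /* index of the current top of the stack */
    uint32_t ebp;   /* index of the current frame base */
} ToyStack;

static void toy_push(ToyStack *s, uint32_t value)
{
    s->esp -= 1;                 /* the stack grows down */
    s->mem[s->esp] = value;
}

/* "call f": push the return address */
static void toy_call(ToyStack *s, uint32_t return_address)
{
    toy_push(s, return_address);
}

/* The prologue: push %ebp; mov %esp,%ebp */
static void toy_prologue(ToyStack *s)
{
    toy_push(s, s->ebp);         /* save the caller's frame base */
    s->ebp = s->esp;             /* the new frame base */
}

/* Follow the saved-%ebp chain. In each frame, mem[ebp] is the caller's
   saved ebp and mem[ebp + 1] (i.e. 4(%ebp)) is the return address.
   A saved ebp of 0 marks the outermost frame. */
static size_t toy_backtrace(const ToyStack *s, uint32_t *out, size_t max)
{
    size_t n = 0;
    uint32_t fp = s->ebp;
    while (fp != 0 && n < max) {
        out[n++] = s->mem[fp + 1];
        fp = s->mem[fp];
    }
    return n;
}
```

The walk stops at a saved %ebp of 0. On x86-64, startup code marks the outermost frame the same way, which is what the xor %ebp,%ebp at the start of _start in one of the listings further down does.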

Accessing parameters

Parameters passed to the procedure by the caller are "buried just under the ground" (that is, they have positive offsets relative to the base, because the stack grows down). %ebp holds the address of the base of the local stack, where the previous value of %ebp lies. Just below it (that is, at 4(%ebp)) lies the return address. So, for 4-byte parameters, the first parameter will be at 8(%ebp), the second at 12(%ebp), and so on.
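The offsets can be checked with a similar toy model, this time byte-addressed, assuming a 32-bit cdecl call with two 4-byte arguments. The Machine type and the enter_f, load32 and arg_at helpers are hypothetical: the caller pushes the arguments right to left, call pushes the return address, the callee runs the prologue, and the arguments then show up at 8(%ebp) and 12(%ebp).

```c
#include <stdint.h>
#include <string.h>

/* Toy byte-addressed memory for a 32-bit cdecl call (illustration only).
   Offsets into mem[] play the role of addresses. */
enum { MEM_SIZE = 256 };

typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint32_t esp;
    uint32_t ebp;
} Machine;

static void push32(Machine *m, uint32_t v)
{
    m->esp -= 4;
    memcpy(&m->mem[m->esp], &v, 4);   /* host byte order, fine for a toy */
}

static uint32_t load32(const Machine *m, uint32_t addr)
{
    uint32_t v;
    memcpy(&v, &m->mem[addr], 4);
    return v;
}

/* Caller side of f(a, b), then the callee's prologue. */
static void enter_f(Machine *m, uint32_t a, uint32_t b, uint32_t ret_addr)
{
    push32(m, b);          /* 2nd argument is pushed first */
    push32(m, a);          /* 1st argument */
    push32(m, ret_addr);   /* what "call f" pushes */
    push32(m, m->ebp);     /* push %ebp */
    m->ebp = m->esp;       /* mov %esp,%ebp */
}

/* Argument k (0-based) as the callee sees it: 8(%ebp), 12(%ebp), ... */
static uint32_t arg_at(const Machine *m, uint32_t k)
{
    return load32(m, m->ebp + 8 + 4 * k);
}
```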

Local variables

Local variables, on the other hand, are allocated on the stack above the base (that is, they have negative offsets relative to the base). Just subtract N from %esp and you've allocated N bytes on the stack for local variables, by moving the top of the stack above (or, precisely, below) this region :-) You can refer to this area by negative offsets relative to %ebp, i.e. -4(%ebp) is the first word, -8(%ebp) the second, and so on. Remember that (%ebp) points to the base of the local stack, where the previous %ebp value has been saved. So remember to restore the stack to its previous position before you try to restore %ebp with pop %ebp at the end of the procedure. You can do this in two ways:

1. You can free only the local variables by adding N back to %esp (the stack pointer), that is, moving the top of the stack as if these local variables had never been there. (Well, their values will stay on the stack, but they'll be considered "freed" and could be overwritten by subsequent pushes, so it's no longer safe to refer to them. They're dead bodies ;-J )

2. You can flush the stack down to the ground and free all local space by simply restoring %esp from %ebp, which was fixed earlier to the base of the stack (mov %ebp,%esp). This restores the stack pointer to the state it had just after entering the procedure and saving %esp into %ebp. It's like loading a previously saved game when you've messed something up ;-) The x86 leave instruction combines this and the following pop %ebp into a single instruction.
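Both ways of releasing the locals can be sketched with just the two register values. Regs is a hypothetical struct holding byte addresses (no real registers are touched); alloc_locals plays sub $N,%esp, free_locals_add is way 1 and free_locals_mov is way 2.

```c
#include <stdint.h>

/* Just the two registers, as byte addresses (illustration only). */
typedef struct { uint32_t esp, ebp; } Regs;

/* sub $n,%esp: reserve n bytes for locals */
static void alloc_locals(Regs *r, uint32_t n) { r->esp -= n; }

/* Address of the k-th local word: k = 0 -> -4(%ebp), k = 1 -> -8(%ebp), ... */
static uint32_t local_addr(const Regs *r, uint32_t k) { return r->ebp - 4 * (k + 1); }

/* Way 1: add $n,%esp releases exactly the n bytes that were reserved */
static void free_locals_add(Regs *r, uint32_t n) { r->esp += n; }

/* Way 2: mov %ebp,%esp drops everything down to the frame base */
static void free_locals_mov(Regs *r) { r->esp = r->ebp; }
```

Way 2 also discards anything pushed after the locals were reserved, which makes it convenient in an epilogue; way 1 requires knowing exactly how many bytes are currently on top of the frame.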

Turning off frame pointers

You can get less cluttered assembly from gcc -S by adding the switch -fomit-frame-pointer. It tells GCC not to emit any code for setting up and tearing down the stack frame unless it's really needed for something. Just remember that it can confuse debuggers, because they usually depend on the stack frame being there to be able to trace up the call stack. But it won't break anything if you don't need to debug this binary. It's perfectly fine for release builds, and it saves a little space and time.

Call Frame Information

Sometimes you can meet some strange assembler directives starting with .cfi interleaved with the function header. This is so-called Call Frame Information (CFI). It's used by debuggers to walk the call stack, but it's also used for exception handling in high-level languages, which needs stack unwinding and other call-stack-based manipulations. You can turn it off in your assembly, too, by adding the switch -fno-dwarf2-cfi-asm. This tells GCC to use plain old labels instead of those strange .cfi directives, and it adds special data structures at the end of your assembly that refer to those labels. This doesn't turn off CFI; it just changes the format to a more "transparent" one: the CFI tables are then visible to the programmer.
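To see the effect of both switches, you could compile a small function to assembly with and without them and diff the results. The file name frame_demo.c and the function sum3 are just examples, and the exact output depends on the GCC version and target, so no listing is claimed here; -m32 is only there to match the 32-bit listings on this page.

```c
/* frame_demo.c (example name). One way to try the switches:
 *   gcc -m32 -O0 -S frame_demo.c -o default.s
 *   gcc -m32 -O0 -S -fomit-frame-pointer frame_demo.c -o nofp.s
 *   gcc -m32 -O0 -S -fno-dwarf2-cfi-asm frame_demo.c -o nocfi.s
 * then diff nofp.s and nocfi.s against default.s. */

/* A small function with a local variable, so there is a frame to set up. */
int sum3(int a, int b, int c)
{
    int total = a + b;
    total += c;
    return total;
}
```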

GDB disassemble for a simple program

  1. Belongs to the function prologue: it aligns the SP to a 16-byte boundary by masking off its low bits.

  2. Memory for the stack frame is reserved, since your pointer needs to be passed to the function. The address will be passed to the function on the stack. Yet it seems that the expression is evaluated at compile time, so there is no need for the actual call.

  3. 0x8048520 is probably the address of your string "%d". It is loaded into eax and from there stored on the stack via the stack pointer.
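The bitmasking in item 1 is easy to check in C. align_down_16 is a hypothetical helper that performs the same operation as and $0xfffffff0,%esp:

```c
#include <stdint.h>

/* Same operation as "and $0xfffffff0,%esp": clearing the low 4 bits rounds
   the address down to a multiple of 16. Rounding down is the safe direction
   for a stack that grows down: it only adds some unused padding. */
static uint32_t align_down_16(uint32_t sp)
{
    return sp & 0xfffffff0u;   /* equivalent to sp & ~(uint32_t)15 */
}
```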

There is plenty of material around, like this.

How to disassemble a memory range with GDB?

Do you only want to disassemble your actual main? If so try this:

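(gdb) info line main
(gdb) disas STARTADDRESS,ENDADDRESS

Like so (this older GDB 6.8 session separates the two addresses with a space; current GDB versions expect a comma, e.g. disas 0x401050,0x401075):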

USER@MACHINE /cygdrive/c/prog/dsa
$ gcc-3.exe -g main.c

USER@MACHINE /cygdrive/c/prog/dsa
$ gdb a.exe
GNU gdb 6.8.0.20080328-cvs (cygwin-special)
...
(gdb) info line main
Line 3 of "main.c" starts at address 0x401050 <main> and ends at 0x401075 <main+
(gdb) disas 0x401050 0x401075
Dump of assembler code from 0x401050 to 0x401075:
0x00401050 <main+0>: push %ebp
0x00401051 <main+1>: mov %esp,%ebp
0x00401053 <main+3>: sub $0x18,%esp
0x00401056 <main+6>: and $0xfffffff0,%esp
0x00401059 <main+9>: mov $0x0,%eax
0x0040105e <main+14>: add $0xf,%eax
0x00401061 <main+17>: add $0xf,%eax
0x00401064 <main+20>: shr $0x4,%eax
0x00401067 <main+23>: shl $0x4,%eax
0x0040106a <main+26>: mov %eax,-0xc(%ebp)
0x0040106d <main+29>: mov -0xc(%ebp),%eax
0x00401070 <main+32>: call 0x4010c4 <_alloca>
End of assembler dump.
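The arithmetic at main+9 through main+23 in this dump can be replayed in C. The helper names are made up; the point is that shr $0x4 followed by shl $0x4 clears the low four bits, so together with the added 0xf it rounds up to a multiple of 16, and the value left in %eax for the call to _alloca works out to 16.

```c
#include <stdint.h>

/* Replays main+9 .. main+23 from the dump above, instruction by instruction. */
static uint32_t replay_main_arith(void)
{
    uint32_t eax = 0x0;   /* mov $0x0,%eax */
    eax += 0xf;           /* add $0xf,%eax */
    eax += 0xf;           /* add $0xf,%eax */
    eax >>= 4;            /* shr $0x4,%eax */
    eax <<= 4;            /* shl $0x4,%eax */
    return eax;           /* the value in %eax at "call _alloca" */
}

/* The general idiom: add 15, then clear the low 4 bits = round up to 16. */
static uint32_t round_up_16(uint32_t n)
{
    return ((n + 15) >> 4) << 4;
}
```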

I don't see your system interrupt call, however. (It's been a while since I last tried to make a system call in assembly; it was INT 21h, last I recall.)

How to disassemble the main function of a stripped application?

OK, here is a big edit of my previous answer. I think I found a way now.

You (still :) have this specific problem:

(gdb) disas main
No symbol table is loaded. Use the "file" command.

Now, if you compile the code (I added a return 0 at the end), gcc -S gives you:

pushq %rbp
movq %rsp, %rbp
movl $.LC0, %edi
call puts
movl $0, %eax
leave
ret

Now, you can see that your binary gives you some info:

Stripped:

(gdb) info files
Symbols from "/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip".
Local exec file:
`/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip', file type elf64-x86-64.
Entry point: 0x400440
0x0000000000400238 - 0x0000000000400254 is .interp
...
0x00000000004003a8 - 0x00000000004003c0 is .rela.dyn
0x00000000004003c0 - 0x00000000004003f0 is .rela.plt
0x00000000004003f0 - 0x0000000000400408 is .init
0x0000000000400408 - 0x0000000000400438 is .plt
0x0000000000400440 - 0x0000000000400618 is .text
...
0x0000000000601010 - 0x0000000000601020 is .data
0x0000000000601020 - 0x0000000000601030 is .bss

The most important entry here is .text, the conventional name of the section that holds the program's code. Note that the entry point, 0x400440, is the very start of .text. From its size, and from the explanation of main below, you can see that it includes main. If you disassemble it, you will see a call to __libc_start_main. Most importantly, you are disassembling from a good starting point that is real code (you are not mistakenly decoding data as code).

disas 0x0000000000400440,0x0000000000400618
Dump of assembler code from 0x400440 to 0x400618:
0x0000000000400440: xor %ebp,%ebp
0x0000000000400442: mov %rdx,%r9
0x0000000000400445: pop %rsi
0x0000000000400446: mov %rsp,%rdx
0x0000000000400449: and $0xfffffffffffffff0,%rsp
0x000000000040044d: push %rax
0x000000000040044e: push %rsp
0x000000000040044f: mov $0x400540,%r8
0x0000000000400456: mov $0x400550,%rcx
0x000000000040045d: mov $0x400524,%rdi
0x0000000000400464: callq 0x400428 <__libc_start_main@plt>
0x0000000000400469: hlt
...

0x000000000040046c: sub $0x8,%rsp
...
0x0000000000400482: retq
0x0000000000400483: nop
...
0x0000000000400490: push %rbp
..
0x00000000004004f2: leaveq
0x00000000004004f3: retq
0x00000000004004f4: data32 data32 nopw %cs:0x0(%rax,%rax,1)
...
0x000000000040051d: leaveq
0x000000000040051e: jmpq *%rax
...
0x0000000000400520: leaveq
0x0000000000400521: retq
0x0000000000400522: nop
0x0000000000400523: nop
0x0000000000400524: push %rbp
0x0000000000400525: mov %rsp,%rbp
0x0000000000400528: mov $0x40062c,%edi
0x000000000040052d: callq 0x400418 <puts@plt>
0x0000000000400532: mov $0x0,%eax
0x0000000000400537: leaveq
0x0000000000400538: retq

The call to __libc_start_main gets a pointer to main() as its first argument. On x86-64 the first argument is passed in register %rdi, so the instruction that loads %rdi just before the call holds your main() address:

0x000000000040045d: mov $0x400524,%rdi
0x0000000000400464: callq 0x400428 <__libc_start_main@plt>

Here it is 0x400524 (as we already know). Now set a breakpoint and try this:

(gdb) break *0x400524
Breakpoint 1 at 0x400524
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2

Breakpoint 1, 0x0000000000400524 in main ()
(gdb) n
Single stepping until exit from function main,
which has no line number information.
hello 1
__libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>,
init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>,
stack_end=0x7fffffffdc38) at libc-start.c:258
258 libc-start.c: No such file or directory.
in libc-start.c
(gdb) n

Program exited normally.
(gdb)

Now you can disassemble it using:

(gdb) disas 0x0000000000400524,0x0000000000400600
Dump of assembler code from 0x400524 to 0x400600:
0x0000000000400524: push %rbp
0x0000000000400525: mov %rsp,%rbp
0x0000000000400528: sub $0x10,%rsp
0x000000000040052c: movl $0x1,-0x4(%rbp)
0x0000000000400533: mov $0x40064c,%eax
0x0000000000400538: mov -0x4(%rbp),%edx
0x000000000040053b: mov %edx,%esi
0x000000000040053d: mov %rax,%rdi
0x0000000000400540: mov $0x0,%eax
0x0000000000400545: callq 0x400418 <printf@plt>
0x000000000040054a: mov $0x0,%eax
0x000000000040054f: leaveq
0x0000000000400550: retq
0x0000000000400551: nop
0x0000000000400552: nop
0x0000000000400553: nop
0x0000000000400554: nop
0x0000000000400555: nop
...

This is essentially the solution.
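The key step, reading main's address out of the instruction that loads %rdi before the call to __libc_start_main, can also be done by scanning the raw bytes of _start. This is a heuristic sketch that assumes the older glibc _start shown above, where that instruction is mov $imm32,%rdi, encoded as 48 c7 c7 followed by a 4-byte little-endian immediate; find_main_in_start is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Heuristic: find "mov $imm32,%rdi" (bytes 48 c7 c7 + imm32) in the code of
   _start and return the immediate, which in the listing above is main's
   address. Other toolchains may load %rdi differently (for example with a
   RIP-relative lea), so a miss does not prove anything. */
static int find_main_in_start(const uint8_t *code, size_t len, uint32_t *main_addr)
{
    for (size_t i = 0; i + 7 <= len; i++) {
        if (code[i] == 0x48 && code[i + 1] == 0xc7 && code[i + 2] == 0xc7) {
            /* imm32 is stored little-endian right after the opcode bytes */
            *main_addr = (uint32_t)code[i + 3]
                       | (uint32_t)code[i + 4] << 8
                       | (uint32_t)code[i + 5] << 16
                       | (uint32_t)code[i + 6] << 24;
            return 1;
        }
    }
    return 0;
}
```

The raw bytes can be dumped in GDB with x/32xb 0x400440 or, in newer versions, shown next to the instructions with disassemble /r.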

BTW, this is a different program, used to check that the method works; that is why the assembly above is a bit different. It comes from this C file:

#include <stdio.h>

int main(void)
{
    int i = 1;
    printf("hello %d\n", i);
    return 0;
}

But if this does not work, you still have some hints:

From here on, you should try setting breakpoints at the beginning of every function. Function starts usually come right after the ret (or leave; ret) that ends the previous function. The first entry point is the start of .text itself: this is where execution starts, but it is not main.

The problem is that stepping from a breakpoint will not always get you where you want, as with this one at the very beginning of .text:

(gdb) break *0x0000000000400440
Breakpoint 2 at 0x400440
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2

Breakpoint 2, 0x0000000000400440 in _start ()
(gdb) n
Single stepping until exit from function _start,
which has no line number information.
0x0000000000400428 in __libc_start_main@plt ()
(gdb) n
Single stepping until exit from function __libc_start_main@plt,
which has no line number information.
0x0000000000400408 in ?? ()
(gdb) n
Cannot find bounds of current function

So you need to keep trying until you find your way, setting breakpoints at:

0x400440
0x40046c
0x400490
0x4004f4
0x40051e
0x400524
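Candidate function starts can also be found mechanically by scanning .text for the common GCC prologue push %rbp; mov %rsp,%rbp, whose bytes are 55 48 89 e5. This is a different heuristic from the list above (several of those addresses, such as 0x400440, do not start with this prologue), and find_prologues is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Heuristic: report offsets where the bytes 55 48 89 e5 occur, i.e.
   "push %rbp; mov %rsp,%rbp", the prologue GCC emits when it keeps a frame
   pointer (as in main above). Functions built without a frame pointer are
   missed, and a false match inside data or mid-instruction is possible. */
static size_t find_prologues(const uint8_t *code, size_t len,
                             size_t *offsets, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i + 4 <= len && n < max; i++) {
        if (code[i] == 0x55 && code[i + 1] == 0x48 &&
            code[i + 2] == 0x89 && code[i + 3] == 0xe5) {
            offsets[n++] = i;
        }
    }
    return n;
}
```

Adding each returned offset to the start of .text (0x400440 here) gives an address to try with break *ADDR.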

From the other answer, we should keep this info:

In the non-stripped version of the file, we see:

(gdb) disas main
Dump of assembler code for function main:
0x0000000000400524 <+0>: push %rbp
0x0000000000400525 <+1>: mov %rsp,%rbp
0x0000000000400528 <+4>: mov $0x40062c,%edi
0x000000000040052d <+9>: callq 0x400418 <puts@plt>
0x0000000000400532 <+14>: mov $0x0,%eax
0x0000000000400537 <+19>: leaveq
0x0000000000400538 <+20>: retq
End of assembler dump.

Now we know that main occupies 0x0000000000400524 up to (but not including) 0x0000000000400539. If we use the same addresses to look at the stripped binary, we get the same results:

(gdb) disas 0x0000000000400524,0x0000000000400539
Dump of assembler code from 0x400524 to 0x400539:
0x0000000000400524: push %rbp
0x0000000000400525: mov %rsp,%rbp
0x0000000000400528: mov $0x40062c,%edi
0x000000000040052d: callq 0x400418 <puts@plt>
0x0000000000400532: mov $0x0,%eax
0x0000000000400537: leaveq
0x0000000000400538: retq
End of assembler dump.

So, unless you can get some hint about where main starts (for example, from a build of the same code with symbols), another way is to get some information about the first assembly instructions of main, so you can disassemble at specific places and check whether they match. If you have no access to the code at all, you can still read the ELF definition to understand how many sections should appear in the binary and try a calculated address. Still, you need information about the sections in the binary!

That is hard work, my friend! Good luck!

Beco

Understanding asm instructions in basic C program in GDB

  1. Yes, you are right. Register r11 is used as the frame pointer. This frame pointer serves as a reference to where your local variables are stored on the stack. Note that the caller's original frame pointer must be preserved (so it is saved and restored later).
  2. Almost. It happens one line later: it is stored on the stack at [r11 - 8].
    Remember that r11 is the frame pointer; everything is relative to it.
  3. It is not pushed on the stack. It is simply returned in register r0.
    It's common on many platforms that a general-purpose register is used, so the stack need not be used for simple, plain return values (like your integer). I guess this is for performance reasons, as registers are faster than memory accesses.
  4. I don't know what you mean by flushed. What happens here is that the function sets things up the way it likes, and afterwards reverts those changes. The stack might still contain values that the function used; it's just that the pointers are reset to their original locations. First, at the beginning of the function, the original frame pointer (r11) is saved/pushed on the stack.
    Then the value of the stack pointer becomes the new frame pointer.
    At the end of the function, the stack pointer is returned to where it was (by overwriting it with r11), and finally r11 itself is restored by popping it off the stack.

Analyzing gdb disassembly

It will probably be helpful to remember that on i386, function arguments are passed on the stack. On function entry, if you read the word of memory at the stack pointer's address, you'll find the caller's return address.

It looks like your mystery function here takes two arguments. So when it says

sub    $0x2c,%esp
mov 0x34(%esp),%eax

Once 0x2c has been subtracted from the stack pointer, we can find the caller's saved eip (the return address) at *(esp + 0x2c), the first argument at *(esp + 0x30) and the second argument at *(esp + 0x34); the mov 0x34(%esp),%eax above is a reference to that second argument. The first argument shows up when the call to sscanf() is prepared:

movl   $0x804a819,0x4(%esp)
mov 0x30(%esp),%eax
mov %eax,(%esp)
call 0x80488d0 <__isoc99_sscanf@plt>

This stores the address of your format string (0x804a819) at *(esp+4), so that is going to be the 2nd argument to sscanf(). Then it loads the first argument to your mystery function (at *(esp + 0x30)) and stores it at *(esp), so it will be the 1st argument to sscanf().
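The offset arithmetic can be written down as tiny helpers (hypothetical names), assuming, as above, 4-byte arguments and nothing pushed between function entry and the sub:

```c
#include <stdint.h>

/* After "sub $frame,%esp" on entry (no push %ebp, nothing else pushed):
   the return address is at frame(%esp) and incoming argument k (0-based,
   4 bytes each) is at (frame + 4 + 4*k)(%esp). */
static uint32_t ret_addr_offset(uint32_t frame) { return frame; }
static uint32_t arg_offset(uint32_t frame, uint32_t k) { return frame + 4 + 4 * k; }

/* Outgoing argument k for a call made from this frame, e.g. sscanf():
   (%esp) is the 1st, 4(%esp) the 2nd, and so on. */
static uint32_t out_arg_offset(uint32_t k) { return 4 * k; }
```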

Hopefully that's enough help to understand the function without being too helpful. :)

Where are pointers when we disassemble with gdb

The declarations char*a; and size_t r; don't do anything by themselves; they rather tell the compiler that you want to be able to use the identifiers a and r for storage of values with some lifetime limited to the duration of main's execution. On the other hand, most assembly instructions (except nops and such) do something.

If you stored and accessed values in these variables, or took their addresses and used those addresses, in a way that's not trivially equivalent to doing nothing with them, then you would see the compiler emit code to make room (typically by adjusting the stack pointer, or by pushing some registers to the stack to save their values so that there are extra free registers for your data) and to store/load the values.
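To see this for yourself, you could compile two small functions with gcc -O0 -S and compare them. The names declare_only and use_them are made up, and the exact instructions depend on the compiler and flags, so no listing is claimed here.

```c
#include <stddef.h>

/* a and r are declared but never used: there is nothing to store or load.
   (The compiler may warn about unused variables; that is expected.) */
int declare_only(void)
{
    char *a;
    size_t r;
    return 0;
}

/* Here the values are actually used, so the compiler has to keep them
   somewhere (at -O0, typically in stack slots) and emit loads/stores. */
size_t use_them(const char *s)
{
    const char *a = s;
    size_t r = 0;
    while (a[r] != '\0')
        r++;
    return r;
}
```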

Show current assembly instruction in GDB

You can switch to assembly layout in GDB:

(gdb) layout asm

See here for more information. The current assembly instruction will be shown in the assembler window, marked with >.

   ┌───────────────────────────────────────────────────────────────────────────┐
│0x7ffff740d756 <__libc_start_main+214> mov 0x39670b(%rip),%rax #│
│0x7ffff740d75d <__libc_start_main+221> mov 0x8(%rsp),%rsi │
│0x7ffff740d762 <__libc_start_main+226> mov 0x14(%rsp),%edi │
│0x7ffff740d766 <__libc_start_main+230> mov (%rax),%rdx │
│0x7ffff740d769 <__libc_start_main+233> callq *0x18(%rsp) │
>│0x7ffff740d76d <__libc_start_main+237> mov %eax,%edi │
│0x7ffff740d76f <__libc_start_main+239> callq 0x7ffff7427970 <exit> │
│0x7ffff740d774 <__libc_start_main+244> xor %edx,%edx │
│0x7ffff740d776 <__libc_start_main+246> jmpq 0x7ffff740d6b9 <__libc_start│
│0x7ffff740d77b <__libc_start_main+251> mov 0x39ca2e(%rip),%rax #│
│0x7ffff740d782 <__libc_start_main+258> ror $0x11,%rax │
│0x7ffff740d786 <__libc_start_main+262> xor %fs:0x30,%rax │
│0x7ffff740d78f <__libc_start_main+271> callq *%rax │
└───────────────────────────────────────────────────────────────────────────┘
multi-thre process 3718 In: __libc_start_main Line: ?? PC: 0x7ffff740d76d
#3 0x00007ffff7466eb5 in _IO_do_write () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007ffff74671ff in _IO_file_overflow ()
from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000000000408756 in ?? ()
#6 0x0000000000403980 in ?? ()
#7 0x00007ffff740d76d in __libc_start_main ()
from /lib/x86_64-linux-gnu/libc.so.6
(gdb)

