Objdump and Resolving Linkage of Local Function Calls

objdump and resolving linkage of local function calls?

Is there a way, without actually linking, to get an assembly listing that has the function names resolved?

Yes: use objdump -dr foo.o

How to disassemble one single function using objdump?

I would suggest using gdb as the simplest approach. You can even do it as a one-liner, like:

gdb -batch -ex 'file /bin/ls' -ex 'disassemble main'

Why doesn't gcc reference the PLT for function calls?

You're not building with symbol-interposition enabled (a side-effect of -fPIC), so the call destination address can potentially be resolved at link time to an address in another object file that is being statically linked into the same executable. (e.g. gcc foo.o bar.o).

However, if the symbol is only found in a library that you're dynamically linking to (gcc foo.o -lbar), the call has to be indirected through the PLT to support.

Now this is the tricky part: without -fPIC or -fPIE, gcc still emits asm that calls the function directly:

int puts(const char*);         // puts exists in libc, so we can link this example
void call_puts(void) { puts("foo"); }

# gcc 5.3 -O3 (without -fPIC)
movl $.LC0, %edi # absolute 32bit addressing: slightly smaller code, because static data is known to be in the low 2GB, in the default "small" code model
jmp puts # tail-call optimization. Same as call puts/ret, except for stack alignment

But if you look at the linked binary:
(on this Godbolt compiler explorer link, click the "binary" button to toggle between gcc -S asm output and objdump -dr disassembly)

    # disassembled linker output
mov $0x400654,%edi
jmpq 400490 <puts@plt>

During linking, the call to puts was "magically" replaced with indirection through puts@plt, and a puts@plt definition is present in the linked executable.

I don't know the details of how this works, but it's done at link time when linking to a shared library. Crucially, it doesn't require anything in the header files to mark the function prototype as being in a shared library. You get the same results from including <stdio.h> as you do from declaring puts yourself. (This is highly not recommended; it's probably legal for a C implementation to only work properly with the declarations in headers. It happens to work on Linux, though.)


When compiling a position-independent executable (with -fPIE), the linked binary jumps to puts through the PLT, identically to without -fPIC. However, the compiler asm output is different (try it yourself on the godbolt link above):

call_puts:  # compiled with -fPIE
leaq .LC0(%rip), %rdi # RIP-relative addressing for static data
jmp puts@PLT

The compiler forces indirection through the PLT for any calls to functions it can't see the definition for. I don't understand why. In PIE mode, we're compiling code for an executable, not a shared library. The linker should be able to link multiple object files into a position-independent executable with direct calls between functions defined in the executable. I'm testing on Linux (my desktop and godbolt), not OS X, where I assume gcc -fPIE is the default. It might be configured differently, IDK.


With -fPIC instead of -fPIE, things are even worse: even calls to global functions defined within the same compilation unit have to go through the PLT, to support symbol interposition. (e.g. LD_PRELOAD=intercept_some_functions.so ./a.out)

The differences between -fPIC and -fPIE are mainly that PIE can assume no symbol interposition for functions in the same compilation unit, but PIC can't. OS X requires position-independent executables, as well as shared libraries, but there is a difference in what the compiler can do when making code for a library vs. making code for an executable.

This Godbolt example has some more functions that demonstrate stuff about PIC and PIE mode, e.g. that call_puts() can't inline into another function in PIC mode, only PIE.

See also: Shared object in Linux without symbol interposition, -fno-semantic-interposition error.


puzzled by mov $0, %edi

You're looking at disassembly output from the .o, where addresses are just placeholder 0s that will be replaced by the linker at link time, based on the relocation information in the ELF object file. That's why @Leandros suggested objdump -r.

Similarly, the relative displacement in the call machine code is all-zeros, because the linker hasn't filled it in yet.

Procedure Linkage Table and Call Relative

A little of this is going off of memory, but let's see if I can't help you out...

As to your first question, there's a chain of things that link together. I can't guarantee this is how these tools are doing things, but just to show that there is a way.

  1. The PLT has a 1-to-1 correspondence (except for PLT[0], which is special) with a .rel(a).plt section. This section contains relocations for the PLT entries.
  2. Each .rel(a).plt entry has an info field which has a symbol table index, e.g. into .dynsym.
  3. Each symbol table entry has an offset into the string table (e.g. .dynstr) for its name. This offset is a byte offset starting from the beginning of the string section.

So as you can see, you can follow the PLT to the rel(a).plt, to the symbol table, to the string table, where you'll find "printf."

To answer your second question, take a look at the program headers (readelf -Wl <program>), and you'll see the virtual addresses for the different sections. That's where that address range comes from.

How to disassemble the main function of a stripped application?

Ok, here a big edition of my previous answer. I think I found a way now.

You (still :) have this specific problem:

(gdb) disas main
No symbol table is loaded. Use the "file" command.

Now, if you compile the code (I added a return 0 at the end), you will get with gcc -S:

    pushq   %rbp
movq %rsp, %rbp
movl $.LC0, %edi
call puts
movl $0, %eax
leave
ret

Now, you can see that your binary gives you some info:

Striped:

(gdb) info files
Symbols from "/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip".
Local exec file:
`/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip', file type elf64-x86-64.
Entry point: 0x400440
0x0000000000400238 - 0x0000000000400254 is .interp
...
0x00000000004003a8 - 0x00000000004003c0 is .rela.dyn
0x00000000004003c0 - 0x00000000004003f0 is .rela.plt
0x00000000004003f0 - 0x0000000000400408 is .init
0x0000000000400408 - 0x0000000000400438 is .plt
0x0000000000400440 - 0x0000000000400618 is .text
...
0x0000000000601010 - 0x0000000000601020 is .data
0x0000000000601020 - 0x0000000000601030 is .bss

The most important entry here is .text. It is a common name for a assembly start of code, and from our explanation of main bellow, from its size, you can see that it includes main. If you disassembly it, you will see a call to __libc_start_main. Most important, you are disassembling a good entry point that is real code (you are not misleading to change DATA to CODE).

disas 0x0000000000400440,0x0000000000400618
Dump of assembler code from 0x400440 to 0x400618:
0x0000000000400440: xor %ebp,%ebp
0x0000000000400442: mov %rdx,%r9
0x0000000000400445: pop %rsi
0x0000000000400446: mov %rsp,%rdx
0x0000000000400449: and $0xfffffffffffffff0,%rsp
0x000000000040044d: push %rax
0x000000000040044e: push %rsp
0x000000000040044f: mov $0x400540,%r8
0x0000000000400456: mov $0x400550,%rcx
0x000000000040045d: mov $0x400524,%rdi
0x0000000000400464: callq 0x400428 <__libc_start_main@plt>
0x0000000000400469: hlt
...

0x000000000040046c: sub $0x8,%rsp
...
0x0000000000400482: retq
0x0000000000400483: nop
...
0x0000000000400490: push %rbp
..
0x00000000004004f2: leaveq
0x00000000004004f3: retq
0x00000000004004f4: data32 data32 nopw %cs:0x0(%rax,%rax,1)
...
0x000000000040051d: leaveq
0x000000000040051e: jmpq *%rax
...
0x0000000000400520: leaveq
0x0000000000400521: retq
0x0000000000400522: nop
0x0000000000400523: nop
0x0000000000400524: push %rbp
0x0000000000400525: mov %rsp,%rbp
0x0000000000400528: mov $0x40062c,%edi
0x000000000040052d: callq 0x400418 <puts@plt>
0x0000000000400532: mov $0x0,%eax
0x0000000000400537: leaveq
0x0000000000400538: retq

The call to __libc_start_main gets as its first argument a pointer to main(). So, the last argument in the stack just immediately before the call is your main() address.

   0x000000000040045d:  mov    $0x400524,%rdi
0x0000000000400464: callq 0x400428 <__libc_start_main@plt>

Here it is 0x400524 (as we already know). Now you set a breakpoint an try this:

(gdb) break *0x400524
Breakpoint 1 at 0x400524
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2

Breakpoint 1, 0x0000000000400524 in main ()
(gdb) n
Single stepping until exit from function main,
which has no line number information.
hello 1
__libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>,
init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>,
stack_end=0x7fffffffdc38) at libc-start.c:258
258 libc-start.c: No such file or directory.
in libc-start.c
(gdb) n

Program exited normally.
(gdb)

Now you can disassembly it using:

(gdb) disas 0x0000000000400524,0x0000000000400600
Dump of assembler code from 0x400524 to 0x400600:
0x0000000000400524: push %rbp
0x0000000000400525: mov %rsp,%rbp
0x0000000000400528: sub $0x10,%rsp
0x000000000040052c: movl $0x1,-0x4(%rbp)
0x0000000000400533: mov $0x40064c,%eax
0x0000000000400538: mov -0x4(%rbp),%edx
0x000000000040053b: mov %edx,%esi
0x000000000040053d: mov %rax,%rdi
0x0000000000400540: mov $0x0,%eax
0x0000000000400545: callq 0x400418 <printf@plt>
0x000000000040054a: mov $0x0,%eax
0x000000000040054f: leaveq
0x0000000000400550: retq
0x0000000000400551: nop
0x0000000000400552: nop
0x0000000000400553: nop
0x0000000000400554: nop
0x0000000000400555: nop
...

This is primarily the solution.

BTW, this is a different code, to see if it works. That is why the assembly above is a bit different. The code above is from this c file:

#include <stdio.h>

int main(void)
{
int i=1;
printf("hello %d\n", i);
return 0;
}

But!


if this does not work, then you still have some hints:

You should be looking to set breakpoints in the beginning of all functions from now on. They are just before a ret or leave. The first entry point is .text itself. This is the assembly start, but not the main.

The problem is that not always a breakpoint will let your program run. Like this one in the very .text:

(gdb) break *0x0000000000400440
Breakpoint 2 at 0x400440
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2

Breakpoint 2, 0x0000000000400440 in _start ()
(gdb) n
Single stepping until exit from function _start,
which has no line number information.
0x0000000000400428 in __libc_start_main@plt ()
(gdb) n
Single stepping until exit from function __libc_start_main@plt,
which has no line number information.
0x0000000000400408 in ?? ()
(gdb) n
Cannot find bounds of current function

So you need to keep trying until you find your way, setting breakpoints at:

0x400440
0x40046c
0x400490
0x4004f4
0x40051e
0x400524

From the other answer, we should keep this info:

In the non-striped version of the file, we see:

(gdb) disas main
Dump of assembler code for function main:
0x0000000000400524 <+0>: push %rbp
0x0000000000400525 <+1>: mov %rsp,%rbp
0x0000000000400528 <+4>: mov $0x40062c,%edi
0x000000000040052d <+9>: callq 0x400418 <puts@plt>
0x0000000000400532 <+14>: mov $0x0,%eax
0x0000000000400537 <+19>: leaveq
0x0000000000400538 <+20>: retq
End of assembler dump.

Now we know that main is at 0x0000000000400524,0x0000000000400539. If we use the same offset to look at the striped binary we get the same results:

(gdb) disas 0x0000000000400524,0x0000000000400539
Dump of assembler code from 0x400524 to 0x400539:
0x0000000000400524: push %rbp
0x0000000000400525: mov %rsp,%rbp
0x0000000000400528: mov $0x40062c,%edi
0x000000000040052d: callq 0x400418 <puts@plt>
0x0000000000400532: mov $0x0,%eax
0x0000000000400537: leaveq
0x0000000000400538: retq
End of assembler dump.

So, unless you can get some tip where the main starts (like using another code with symbols), another way is if you can have some info about the firsts assembly instructions, so you can disassembly at specifics places and look if it matches. If you have no access at all to the code, you still can read the ELF definition to understand how many sections should appear in the code and try a calculated address. Still, you need info about sections in the code!

That is hard work, my friend! Good luck!

Beco



Related Topics



Leave a reply



Submit