How to Disassemble a Binary Executable in Linux to Get the Assembly Code

How to disassemble a binary executable in Linux to get the assembly code?

I don't think gcc has a flag for it, since it's primarily a compiler, but another of the GNU development tools does. objdump takes a -d/--disassemble flag:

$ objdump -d /path/to/binary

The disassembly looks like this:

080483b4 <main>:
 80483b4:   8d 4c 24 04             lea    0x4(%esp),%ecx
 80483b8:   83 e4 f0                and    $0xfffffff0,%esp
 80483bb:   ff 71 fc                pushl  -0x4(%ecx)
 80483be:   55                      push   %ebp
 80483bf:   89 e5                   mov    %esp,%ebp
 80483c1:   51                      push   %ecx
 80483c2:   b8 00 00 00 00          mov    $0x0,%eax
 80483c7:   59                      pop    %ecx
 80483c8:   5d                      pop    %ebp
 80483c9:   8d 61 fc                lea    -0x4(%ecx),%esp
 80483cc:   c3                      ret    
 80483cd:   90                      nop
 80483ce:   90                      nop
 80483cf:   90                      nop
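
On a large binary the full listing is long, so it can help to filter it down to one function. The following is a minimal Python sketch that works on text already captured from objdump -d; it assumes the header and instruction line shapes shown above, and the second function name (helper) in the sample is made up for illustration.

```python
import re

# objdump prints a header such as "080483b4 <main>:" before each function.
HEADER = re.compile(r"^[0-9a-fA-F]+ <([^>]+)>:\s*$")
# Instruction lines look like " 80483b4:   8d 4c 24 04   lea ...".
INSTRUCTION = re.compile(r"^\s*[0-9a-fA-F]+:\s")


def extract_function(listing: str, name: str) -> list[str]:
    """Return the instruction lines listed under the '<name>:' header."""
    lines = []
    inside = False
    for line in listing.splitlines():
        header = HEADER.match(line)
        if header:
            # Each header starts a new function and ends the previous one.
            inside = header.group(1) == name
        elif inside and INSTRUCTION.match(line):
            lines.append(line.rstrip())
    return lines


# Sample text; "helper" is a hypothetical second function.
sample = """\
080483b4 <main>:
 80483b4:   8d 4c 24 04             lea    0x4(%esp),%ecx
 80483cc:   c3                      ret

080483d0 <helper>:
 80483d0:   55                      push   %ebp
"""

for line in extract_function(sample, "main"):
    print(line)
```

In practice the listing would come from running objdump -d on the binary and reading its standard output into a string.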

How to disassemble, modify and then reassemble a Linux executable?

I don't think there is any reliable way to do this. Machine code formats are very complicated, more complicated than assembly files. It isn't really possible to take a compiled binary (say, in ELF format) and produce a source assembly program which will compile to the same (or similar-enough) binary. To gain an understanding of the differences, compare the output of GCC compiling direct to assembler (gcc -S) versus the output of objdump on the executable (objdump -D).

There are two major complications I can think of. Firstly, the machine code itself is not a 1-to-1 correspondence with assembly code, because of things like pointer offsets.

For example, consider the C code to Hello world:

#include <stdio.h>

int main()
{
    printf("Hello, world!\n");
    return 0;
}

This compiles to x86 assembly code along these lines:

.LC0:
    .string "hello"
    .text
<snip>
    movl    $.LC0, %eax
    movl    %eax, (%esp)
    call    printf

Where .LC0 is a named constant, and printf is a symbol in a shared library symbol table. Compare to the output of objdump:

80483cd:       b8 b0 84 04 08          mov    $0x80484b0,%eax
80483d2:       89 04 24                mov    %eax,(%esp)
80483d5:       e8 1a ff ff ff          call   80482f4 <printf@plt>

Firstly, the constant .LC0 is now just an absolute address (0x80484b0) -- it would be difficult to create an assembly source file which places this constant at exactly that address, since the assembler and linker are free to choose locations for these constants.

Secondly, and this depends on things like position-independent code, the call does not encode printf's real address at all. As the <printf@plt> annotation shows, it targets a small stub in the procedure linkage table (PLT); the dynamic linker fills in the real address of printf at run time, using tables that the ELF file describes. Therefore, the disassembled code doesn't quite correspond to the source assembly code.

In summary, source assembly has symbols while compiled machine code has addresses which are difficult to reverse.
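
To make the "addresses instead of symbols" point concrete, here is a small Python sketch that decodes the operands from the raw bytes in the objdump output above. It only handles the two encodings that appear there (B8 with a 32-bit immediate, and E8 with a 32-bit relative displacement); the function name is my own.

```python
import struct


def call_target(address: int, encoding: bytes) -> int:
    """Decode the destination of an x86 near call (opcode E8 + rel32)."""
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    # The 32-bit displacement is little-endian and signed.
    (displacement,) = struct.unpack("<i", encoding[1:])
    # It is relative to the address of the *next* instruction.
    return (address + len(encoding) + displacement) & 0xFFFFFFFF


# mov $0x80484b0,%eax: the four bytes after B8 are the address of the string.
string_address = int.from_bytes(bytes.fromhex("b0840408"), "little")
print(hex(string_address))  # 0x80484b0

# call printf@plt: E8 followed by a displacement, not by an address.
target = call_target(0x80483D5, bytes.fromhex("e81affffff"))
print(hex(target))  # 0x80482f4
```

Neither value says anything about .LC0 or printf by itself. objdump can print <printf@plt> only because it looks the address up in the symbol information; the string's address gets no label at all.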

The second major complication is that an assembly source file can't contain all of the information that was present in the original ELF file headers, like which libraries to dynamically link against, and other metadata placed there by the original compiler. It would be difficult to reconstruct this.

Like I said, it's possible that a special tool can manipulate all of this information, but it is unlikely that one can simply produce assembly code which can be reassembled back to the executable.

If you are interested in modifying just a small section of the executable, I recommend a much more subtle approach than recompiling the whole application. Use objdump to get the assembly code for the function(s) you are interested in. Convert it to "source assembly syntax" by hand (and here, I wish there was a tool that actually produced disassembly in the same syntax as the input), and modify it as you wish. When you are done, recompile just those function(s) and use objdump to figure out the machine code for your modified program. Then, use a hex editor to manually paste the new machine code over the top of the corresponding part of the original program, taking care that your new code is precisely the same number of bytes as the old code (or all the offsets would be wrong). If the new code is shorter, you can pad it out using NOP instructions. If it is longer, you may be in trouble, and might have to create new functions and call them instead.
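
The hex-editor step can also be scripted. The following Python sketch applies the same rule: overwrite a region with new machine code, pad with NOPs if the new code is shorter, and refuse if it is longer. The function name and the sample bytes are illustrative only; the offsets for a real binary have to be worked out from the objdump output and the file layout.

```python
NOP = 0x90  # single-byte x86 NOP


def patch_in_place(image: bytes, offset: int, old_length: int, new_code: bytes) -> bytes:
    """Overwrite old_length bytes at offset with new_code, padded with NOPs."""
    if offset < 0 or offset + old_length > len(image):
        raise ValueError("patch region lies outside the image")
    if len(new_code) > old_length:
        raise ValueError("new code is longer than the region it replaces")
    padding = bytes([NOP]) * (old_length - len(new_code))
    patched = image[:offset] + new_code + padding + image[offset + old_length:]
    assert len(patched) == len(image)  # every later offset stays where it was
    return patched


# Illustrative bytes: push %ecx; mov $0x0,%eax; pop %ecx.
original = bytes.fromhex("51" "b800000000" "59")
# Replace the 5-byte mov (b8 00 00 00 00) with the 2-byte xor %eax,%eax (31 c0).
patched = patch_in_place(original, offset=1, old_length=5, new_code=bytes.fromhex("31c0"))
print(patched.hex())  # 5131c090909059
```

Note that xor changes the flags while mov does not, so even a same-result replacement like this one needs checking against the surrounding code.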

Disassemble, modify, and reassemble executable

There is no reliable way to do this with normal assembler syntax. See "How to disassemble, modify and then reassemble a Linux executable?". Section info is typically not faithfully disassembled, so you'd need a special format designed for modifying, reassembling, and relinking.

Also, instruction lengths are a problem when code only works because it was padded with longer encodings (e.g. in a table of jump targets for a computed goto). See Where are GNU assembler instruction suffixes like ".s" in x86 "mov.s" documented?, but note that disassemblers don't support disassembling into that format.
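
As a rough illustration of why encoding length matters, the Python sketch below takes three valid encodings of the same instruction (add $1,%eax) and computes where the entries of a hypothetical table would land if each entry were that instruction followed by a one-byte ret. The table layout is an assumption made up for this example.

```python
# Three valid machine-code encodings of add $1,%eax.
ENCODINGS = {
    "imm8 form": bytes.fromhex("83c001"),                # 83 /0 ib
    "eax, imm32 form": bytes.fromhex("0501000000"),      # 05 id
    "r/m32, imm32 form": bytes.fromhex("81c001000000"),  # 81 /0 id
}


def entry_offsets(encoding: bytes, entries: int) -> list[int]:
    """Offsets of consecutive table entries, each one instruction plus a ret."""
    stride = len(encoding) + 1  # one extra byte for the C3 ret
    return [index * stride for index in range(entries)]


for name, encoding in ENCODINGS.items():
    print(f"{name:18} {len(encoding)} bytes, entries at {entry_offsets(encoding, 3)}")
```

If the original code computes a jump target as base + index * 6, only the second encoding keeps the table valid. An assembler that picks the shortest form when reassembling the disassembly would silently break it.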

ndisasm doesn't understand object file formats, so it disassembles headers as machine code!

For this to have any hope of working, use a disassembler like Agner Fog's objconv which will output asm source (NASM, MASM, or GAS AT&T) which does assemble. It might not actually work if any of the code depended on a specific longer-than-default encoding.

I'm not sure how faithful objconv is with respect to emitting section .bss, section .rodata and other directives like that to place data where it found it in the object file, but that's what you need.

Re: absolute relocations: make sure you put DEFAULT REL at the top of your file. I forget if objconv does this by default. x86-64 Mach-O only supports PC-relative relocations, so you have to create position-independent code (e.g. using RIP-relative addressing modes).

ndisasm doesn't read the symbol table, so all its operands use absolute addressing. objconv makes up label names for jump targets and static data that doesn't appear in the symbol table.

Disassembling A Flat Binary File Using objdump

I found the solution to my own question on a different forum. It looks something like this:

objdump -b binary --adjust-vma=0xabcd1000 -D file.bin

I've tested this and it works.

How to see the assembly of libc functions in an ELF

You can compile with

~$ gcc -static prog.c

where prog.c uses the functions whose assembly you want to see.
That statically links the libraries it uses into the binary.

Then you can just:

~$ objdump --disassemble a.out

EDIT

You can even take a simpler way:
just objdump the libc library:

~$ objdump --disassemble /usr/lib/libc.so.6   # or whatever the path of libc is

How do I disassemble raw 16-bit x86 machine code?

You can use objdump. According to this article the syntax is:

objdump -D -b binary -mi386 -Maddr16,data16 mbr

How to get flat binary code address on bare metal?

You can get NASM to make a "listing" when it assembles: nasm -l foo.lst foo.asm (with the default flat-binary output mode, and default output filename of foo). Or have it write the listing to stdout with nasm -l /dev/stdout foo.asm | less if you just want to look at it on the fly.

But unfortunately the output doesn't respect the org directive, it's still relative to the image base:

     1                                  org 0x7c00
     2                                  
     3 00000000 31C0                    xor ax,ax
     4 00000002 8ED8                    mov ds, ax
     5                                  
     6 00000004 686869                  push "hi"
     7 00000007 C706[0D00]7879          mov word [var], 'xy'
     8                                  
     9 0000000D 68656C6C6F              var: db "hello"
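
If the listing is all you have, the run-time addresses can be recovered by adding the org value to each offset. Here is a minimal Python sketch of that arithmetic; it assumes listing lines shaped like the ones above (line number, 8-digit hex offset, bytes, source) and takes the origin as an explicit argument rather than parsing the org directive.

```python
import re

# Listing line: line number, 8-hex-digit offset, encoded bytes, source text.
LISTING_LINE = re.compile(r"^\s*\d+\s+([0-9A-Fa-f]{8})\s+(\S+)\s+(.*)$")


def rebase(listing: str, origin: int):
    """Yield (run-time address, source text) for each line that emits bytes."""
    for line in listing.splitlines():
        match = LISTING_LINE.match(line)
        if match:
            offset = int(match.group(1), 16)
            yield origin + offset, match.group(3).strip()


listing = """\
     1                                  org 0x7c00
     3 00000000 31C0                    xor ax,ax
     9 0000000D 68656C6C6F              var: db "hello"
"""

for address, source in rebase(listing, origin=0x7C00):
    print(f"{address:#06x}  {source}")
```

With the origin 0x7c00, var at listing offset 0x0D ends up at 0x7c0d.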

Or as @MichaelPetch suggested in comments:

Personally I would be generating an ELF version of the kernel file and a binary version. The ELF version can contain the debug information while the binary version will be what is executed.
I'd stop using binary as the output type of a linker script. Just have it generate ELF executable and then convert the ELF executable to a binary file with objcopy. The binary file runs on the remote machine, the ELF file is used in the debugger.
The ELF file can be used by the debugger for symbolic information which is the easiest thing to debug with.
I will say that GDB and QEMU are tricky to debug 16-bit real mode code with since GDB has no real understanding of segment:offset addressing in real mode.

BOCHS has a built-in debugger which does understand segmentation, and has built-in commands for parsing the IDT and GDT instead of just dumping raw bytes.

Michael has recommended BOCHS in the past for debugging bootloaders that switch to protected and/or long mode.

How to disassemble fasm-generated binary?

You can do this easily with radare2, using the pdf command (print disassembly of the current function):

% cat test.asm 
format ELF64 executable
sys_exit = 60
entry $
  mov rax, sys_exit
  xor rdi, rdi
  syscall
% ./fasm test.asm
flat assembler  version 1.73.04  (16384 kilobytes memory) 1 passes, 132 bytes.
% file test
test: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, stripped
% r2 -AA test
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for objc references
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
[x] Finding function preludes
[x] Enable constraint types analysis for variables
-- In visual mode press 'c' to toggle the cursor mode. Use tab to navigate
[0x00400078]> pdf
        ;-- segment.LOAD0:
        ;-- rip:
┌ 12: entry0 ();
│           0x00400078      48c7c03c0000.  mov rax, 0x3c               ; '<' ; 60 ; [00] -rwx segment size 12 named LOAD0
│           0x0040007f      4831ff         xor rdi, rdi
└           0x00400082      0f05           syscall
[0x00400078]>
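
The numbers in that session fit together if the file holds a 64-byte ELF64 header, one 56-byte program header, and then the 12 bytes of code: 64 + 56 = 120 = 0x78, and 120 + 12 = 132. That single-program-header layout is an inference from the sizes, not something the output states. The Python sketch below reads the relevant header fields; the header it parses is a synthetic one built to match those assumed values, not bytes taken from the real file.

```python
import struct


def describe_elf64(data: bytes) -> dict:
    """Read the little-endian ELF64 header fields needed to locate the code."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    e_entry, e_phoff = struct.unpack_from("<QQ", data, 24)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    return {
        "entry": e_entry,
        "phoff": e_phoff,
        # First byte after the program header table.
        "headers_end": e_phoff + e_phentsize * e_phnum,
    }


# Synthetic header with assumed values (not read from the fasm output file).
header = struct.pack(
    "<16sHHIQQQIHHHHHH",
    b"\x7fELF\x02\x01\x01" + bytes(9),  # ELF64, little-endian, version 1
    2,          # e_type: ET_EXEC
    62,         # e_machine: EM_X86_64
    1,          # e_version
    0x400078,   # e_entry, as shown by radare2
    64,         # e_phoff: program headers follow the ELF header
    0,          # e_shoff: no section headers
    0,          # e_flags
    64,         # e_ehsize
    56,         # e_phentsize
    1,          # e_phnum: assumed single LOAD segment
    64, 0, 0,   # e_shentsize, e_shnum, e_shstrndx
)

info = describe_elf64(header)
print(hex(info["entry"]), info["headers_end"])  # 0x400078 120
```

To run it against a real file, pass the result of open(path, "rb").read() instead of the synthetic header.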
