How to Disassemble Raw Mips Code

How do I disassemble raw MIPS code?

Hmm, it seems easier than that. -b elf32-tradlittlemips does not work because the file is not an ELF executable, but binary. So, the correct option to be used is -b binary. The other option, -mmips makes objdump recognize the file as binary for MIPS. Since the target machine is little endian, I also had to add -EL to make the output match the output for x.o.

-mmips only includes the basic instruction set. The AR7 has a MIPS32 processor which has more instructions than just mips. To decode these newer MIPS32 instructions, use -mmips:isa32. A list of available ISAs can be listed with objdump -i -m.

The final command becomes:

mipsel-linux-gnu-objdump -b binary -mmips:isa32 -EL -D vmlinux

This would show registers like $3 instead of their names. To adjust that, I used the next additional options which are mentioned in mipsel-linux-gnu-objdump --help:

-Mgpr-names=32,cp0-names=mips32,cp0-names=mips32,hwr-names=mips32,reg-names=mips32

I chose for mips32 after reading:

http://www.linux-mips.org/wiki/AR7
http://www.linux-mips.org/wiki/Instruction_Set_Architecture

How do I disassemble raw 16-bit x86 machine code?

You can use objdump. According to this article the syntax is:

objdump -D -b binary -mi386 -Maddr16,data16 mbr

Disassemble raw x64 machine code

And as usual writing down the question already gives you some rather good ideas what else to try..

Anyhow the right machine architecture is: i386:x86-64.

The full command is:

objdump -b binary -D -m i386:x86-64 <file>

If you want to disassemble code that expects to be loaded at a specific address, you can add the --adjust-vma <load-address> flag.

How to disassemble, modify and then reassemble a Linux executable?

I don't think there is any reliable way to do this. Machine code formats are very complicated, more complicated than assembly files. It isn't really possible to take a compiled binary (say, in ELF format) and produce a source assembly program which will compile to the same (or similar-enough) binary. To gain an understanding of the differences, compare the output of GCC compiling direct to assembler (gcc -S) versus the output of objdump on the executable (objdump -D).

There are two major complications I can think of. Firstly, the machine code itself is not a 1-to-1 correspondence with assembly code, because of things like pointer offsets.

For example, consider the C code to Hello world:

int main()
{
    printf("Hello, world!\n");
    return 0;
}

This compiles to the x86 assembly code:

.LC0:
    .string "hello"
    .text
<snip>
    movl    $.LC0, %eax
    movl    %eax, (%esp)
    call    printf

Where .LCO is a named constant, and printf is a symbol in a shared library symbol table. Compare to the output of objdump:

80483cd:       b8 b0 84 04 08          mov    $0x80484b0,%eax
80483d2:       89 04 24                mov    %eax,(%esp)
80483d5:       e8 1a ff ff ff          call   80482f4 <printf@plt>

Firstly, the constant .LC0 is now just some random offset in memory somewhere -- it would be difficult to create an assembly source file which contains this constant in the correct place, since the assembler and linker are free to choose locations for these constants.

Secondly, I'm not entirely sure about this (and it depends on things like position independent code), but I believe the reference to printf is not actually encoded at the pointer address in that code there at all, but the ELF headers contain a lookup table which dynamically replaces its address at runtime. Therefore, the disassembled code doesn't quite correspond to the source assembly code.

In summary, source assembly has symbols while compiled machine code has addresses which are difficult to reverse.

The second major complication is that an assembly source file can't contain all of the information that was present in the original ELF file headers, like which libraries to dynamically link against, and other metadata placed there by the original compiler. It would be difficult to reconstruct this.

Like I said, it's possible that a special tool can manipulate all of this information, but it is unlikely that one can simply produce assembly code which can be reassembled back to the executable.

If you are interested in modifying just a small section of the executable, I recommend a much more subtle approach than recompiling the whole application. Use objdump to get the assembly code for the function(s) you are interested in. Convert it to "source assembly syntax" by hand (and here, I wish there was a tool that actually produced disassembly in the same syntax as the input), and modify it as you wish. When you are done, recompile just those function(s) and use objdump to figure out the machine code for your modified program. Then, use a hex editor to manually paste the new machine code over the top of the corresponding part of the original program, taking care that your new code is precisely the same number of bytes as the old code (or all the offsets would be wrong). If the new code is shorter, you can pad it out using NOP instructions. If it is longer, you may be in trouble, and might have to create new functions and call them instead.

How to disassemble a binary executable in Linux to get the assembly code?

I don't think gcc has a flag for it, since it's primarily a compiler, but another of the GNU development tools does. objdump takes a -d/--disassemble flag:

$ objdump -d /path/to/binary

The disassembly looks like this:

080483b4 <main>:
 80483b4:   8d 4c 24 04             lea    0x4(%esp),%ecx
 80483b8:   83 e4 f0                and    $0xfffffff0,%esp
 80483bb:   ff 71 fc                pushl  -0x4(%ecx)
 80483be:   55                      push   %ebp
 80483bf:   89 e5                   mov    %esp,%ebp
 80483c1:   51                      push   %ecx
 80483c2:   b8 00 00 00 00          mov    $0x0,%eax
 80483c7:   59                      pop    %ecx
 80483c8:   5d                      pop    %ebp
 80483c9:   8d 61 fc                lea    -0x4(%ecx),%esp
 80483cc:   c3                      ret    
 80483cd:   90                      nop
 80483ce:   90                      nop
 80483cf:   90                      nop

diassemble strings properly in shellcode

If you had shellcode which used call or jmp to jump over some data, you'd have to replace the strings with NOPs if the disassembler got out of sync while treating the data as instructions, as @DavidJ suggested.

In this case, you're just disassembling in the wrong mode.
The jnc is clearly bogus (as I think you realized).

The disassembler is treating the push opcode (the 0x68 byte) as the start of push imm16, because that's how 16-bit mode works. But in 32 and 64-bit modes, the same opcode is the start of a push imm32. So push instruction is actually 5 bytes instead of 3, and the next instruction is actually the next push.

The bogus short-jnc is a huge hint that this is not 16-bit code.

Use ndisasm -b32 or -b64. Ndisasm can read input from stdin, so I used python2 -c 'print "... "' | ndisasm - -b32.

When using objdump, if you prefer Intel syntax, use objdump -d -Mintel. So you could objdump -Mintel -bbinary -D -mi386 /tmp/shellcode for 32-bit (-mi386 selects x86 as the architecture (rather than ARM or MIPS or whatever), and implies -Mi386 32-bit mode as well).

Or for 64-bit, objdump -D -b binary -mi386 -Mx86-64 /tmp/shellcode works. (objdump won't read the binary from stdin :/) Check the objdump man page for more about -M options.

I use this alias in my ~/.bashrc: alias disas='objdump -drwC -Mintel', because I normally disassemble ELF executables / objects to see what a compiler did, not shellcode. You might want -D in your alias.

I'm pretty sure this is 32-bit code, because in 64-bit mode the two pushes would leave a gap. The is no push imm64, but push imm32 is a 64-bit push with the immediate sign-extended to 64 bits. In 64-bit mode, you might use

push  'abcd'
mov   [rsp+4], 'efgh'

to end up with rsp pointing to "abcdefgh".

Also, the use of int 0x80 with a stack address is a big clue this is not 64-bit code. int 0x80 works on Linux in 64-bit mode, but it truncates all inputs to 32-bit: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?

The 32-bit disassembly from ndisasm is:

00000000  90                nop
00000001  90                nop
00000002  90                nop
00000003  90                nop
00000004  90                nop
00000005  90                nop
00000006  90                nop
00000007  90                nop
00000008  90                nop
00000009  31C0              xor eax,eax
0000000B  50                push eax
0000000C  682F2F7368        push dword 0x68732f2f
00000011  682F62696E        push dword 0x6e69622f
00000016  89E3              mov ebx,esp
00000018  50                push eax
00000019  53                push ebx
0000001A  89E1              mov ecx,esp
0000001C  B00B              mov al,0xb
0000001E  CD80              int 0x80
00000020  200A              and [edx],cl

Which looks sane. It contains no branches, but

Is there a way to put proper labels on jumps?

Yes, Agner Fog's objconv disassembler can put labels on branch targets to help you figure out which branch goes where.
See How do I disassemble raw x86 code?