How to Disassemble Raw 16-Bit X86 Machine Code

How do I disassemble raw 16-bit x86 machine code?

You can use objdump. According to this article the syntax is:

objdump -D -b binary -mi386 -Maddr16,data16 mbr

Is all data valid x86 16-bit machine code?

I don't think all of it is valid, because as far as I know ndisasm outputs lines like db 0x82 when bytes don't decode as an instruction, and the output contains lines like that.

Disassembling A Flat Binary File Using objdump

I found the solution to my own question on a different forum. It looks something like this:

objdump -b binary --adjust-vma=0xabcd1000 -D file.bin

I've tested this and it works.

How to disassemble a shellcode into assembly instruction?

As the comments have suggested, your issue is that you have output the string \xeb\x1d as ASCII into the file you are trying to disassemble. You may have done something like:

echo '\xeb\x1d' >foo

You can do this but you will want to tell echo to interpret the escape character \. This can be done with the -e option.

You'll also want to stop it from appending a trailing newline, using the -n option. Both options are documented in the echo manual page:

  -n     do not output the trailing newline
  -e     enable interpretation of backslash escapes

This may work:

echo -ne '\xeb\x1d' >foo

Using NDISASM to disassemble the bytes:

ndisasm -b32 foo

Should now produce:

00000000  EB1D              jmp short 0x1f

Without using an intermediate file (like foo) you can pipe ECHO output into NDISASM and disassemble it that way. This line would take a shell code string and output the disassembly as well:

echo -ne '\xeb\x1d' | ndisasm -b32 -

The - on the end is needed to tell NDISASM to disassemble input from standard input rather than an explicit file.
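If it helps to see what echo -e is doing, here is a rough Python sketch of the same conversion. The helper name shellcode_bytes is made up for illustration; it only shows the difference between the eight ASCII characters of the literal text and the two bytes they describe.

```python
def shellcode_bytes(escaped: str) -> bytes:
    # Hypothetical helper: interpret backslash escapes the way echo -e does.
    # unicode_escape decodes the escapes; latin-1 then maps code points
    # 0-255 back to single bytes one-to-one.
    return escaped.encode("ascii").decode("unicode_escape").encode("latin-1")

literal = r"\xeb\x1d"            # the text echo writes without -e
raw = shellcode_bytes(literal)   # the bytes echo -ne writes

print(len(literal), len(raw))    # 8 2
print(raw.hex())                 # eb1d
```

The resulting bytes can be written to a file opened in binary mode ("wb") and passed to ndisasm exactly like foo above.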

We have now revolutionized the IT industry! ;-)

Disassemble raw x64 machine code

As usual, writing down the question already gave me some rather good ideas about what else to try.

Anyhow the right machine architecture is: i386:x86-64.

The full command is:

objdump -b binary -D -m i386:x86-64 <file>

If you want to disassemble code that expects to be loaded at a specific address, you can add the --adjust-vma <load-address> flag.
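The objdump commands for raw binaries differ only in the machine name and the optional load address, so they can be put together mechanically. This is just a sketch: objdump_raw_argv and MACHINE_FOR_BITS are names I made up, and i8086 is used for 16-bit code as an alternative to -mi386 -Maddr16,data16.

```python
import shlex

# Machine names objdump accepts for raw x86 code of each width.
MACHINE_FOR_BITS = {16: "i8086", 32: "i386", 64: "i386:x86-64"}

def objdump_raw_argv(path, bits, load_address=None):
    # Hypothetical helper: build the argument list for a flat binary.
    argv = ["objdump", "-D", "-b", "binary", "-m", MACHINE_FOR_BITS[bits]]
    if load_address is not None:
        # Make the printed addresses match the intended load address.
        argv.append(f"--adjust-vma={load_address:#x}")
    argv.append(path)
    return argv

print(shlex.join(objdump_raw_argv("file.bin", 64, 0xabcd1000)))
```

The returned list can be handed to subprocess.run if objdump is installed; the sketch itself only prints the command line.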

Objdump of .code16 and .code32 x86 assembly

.code16 tells the assembler to assume the code will be run in 16-bit mode, e.g. to use the 66 operand-size prefix for 32-bit operand-size instead of the default 16. However, you assemble and link it into an elf32 binary, which means the file metadata still indicates 32-bit code. (There's no such thing as an x86-16 Linux ELF file.)

Objdump disassembles according to the file metadata, thus as 32-bit code, unless you override with -m i8086. The sizes you're getting match the binary for 32-bit disassembly.

You'll probably actually see breakage if you assemble an instruction that has a different length in 16bit mode, like

add  $129,  %ax  # 129 doesn't fit in an imm8

If assembled as a 16bit instruction, it will have no prefix, and an imm16 source operand. Decoded as a 32bit instruction, it will have an imm32 source operand, which takes more total bytes following the opcode. An operand-size prefix would change the length of the rest of the instruction (not including prefixes), for either mode. BTW, (pre-)decoding slows down on Intel CPUs for this special case where a prefix is length-changing for the rest of the instruction. (https://agner.org/optimize/)

Anyway, disassembling that instruction with the wrong code size will lead to the disassembler getting out of sync with instruction boundaries, so it will definitively test what mode it's being interpreted in.
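To make the length difference concrete, here is a small sketch that computes the length of that one instruction under each mode. It assumes the assembler picked the short accumulator encoding (opcode 0x05) and handles nothing else; add_acc_imm_length is a made-up name, and only 16-bit and 32-bit modes are modelled.

```python
def add_acc_imm_length(code: bytes, mode_bits: int) -> int:
    # Length of ADD AX/EAX, imm (opcode 0x05) starting at code[0].
    operand_bits = mode_bits              # default operand size follows the mode
    pos = 0
    if code[pos] == 0x66:                 # operand-size prefix flips 16 <-> 32
        operand_bits = 48 - operand_bits
        pos += 1
    if code[pos] != 0x05:
        raise ValueError("only opcode 0x05 is handled in this sketch")
    return pos + 1 + operand_bits // 8    # prefix + opcode + immediate

# add $129, %ax assembled under .code16, followed by two NOPs
code = bytes([0x05, 0x81, 0x00, 0x90, 0x90])
print(add_acc_imm_length(code, 16))   # 3
print(add_acc_imm_length(code, 32))   # 5: the two NOPs are swallowed as immediate bytes
```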

If you're making normal user-space code (not a kernel that switches modes, or needs to be 16-bit), .code32 and .code64 are useless. They just let you put the machine code into the wrong kind of ELF file. (See: Assembling 32-bit binaries on a 64-bit system (GNU toolchain).)

BTW, moving to %ss implicitly prevents interrupts until after the next instruction. (Which should set the stack pointer). You can avoid cli/sti that way.

How do I disassemble raw MIPS code?

Hmm, it seems easier than that. -b elf32-tradlittlemips does not work because the file is not an ELF executable, but binary. So, the correct option to be used is -b binary. The other option, -mmips makes objdump recognize the file as binary for MIPS. Since the target machine is little endian, I also had to add -EL to make the output match the output for x.o.

-mmips only includes the basic instruction set. The AR7 has a MIPS32 processor, which has more instructions than plain mips. To decode these newer MIPS32 instructions, use -mmips:isa32. The available ISAs can be listed with objdump -i -m.

The final command becomes:

mipsel-linux-gnu-objdump -b binary -mmips:isa32 -EL -D vmlinux
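The reason -EL matters is that the same four bytes form a different instruction word depending on byte order. A quick sketch with an illustrative word (0x27bdffe0, which I'm assuming here as a typical addiu $sp,$sp,-32; it is not taken from the actual vmlinux):

```python
import struct

raw = bytes.fromhex("e0ffbd27")          # four bytes as stored in a little-endian image

little = struct.unpack("<I", raw)[0]     # how -EL reads the word
big = struct.unpack(">I", raw)[0]        # how a big-endian decode would read it

def primary_opcode(word: int) -> int:
    # MIPS keeps the primary opcode in the top 6 bits of the 32-bit word.
    return word >> 26

print(hex(little), primary_opcode(little))   # 0x27bdffe0 9 (ADDIU)
print(hex(big), primary_opcode(big))         # 0xe0ffbd27 56
```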

This would show registers like $3 instead of their names. To adjust that, I used the next additional options which are mentioned in mipsel-linux-gnu-objdump --help:

-Mgpr-names=32,cp0-names=mips32,hwr-names=mips32,reg-names=mips32

I chose mips32 after reading:

http://www.linux-mips.org/wiki/AR7
http://www.linux-mips.org/wiki/Instruction_Set_Architecture

Disassemble strings properly in shellcode

If you had shellcode which used call or jmp to jump over some data, you'd have to replace the strings with NOPs if the disassembler got out of sync while treating the data as instructions, as @DavidJ suggested.

In this case, you're just disassembling in the wrong mode.
The jnc is clearly bogus (as I think you realized).

The disassembler is treating the push opcode (the 0x68 byte) as the start of push imm16, because that's how 16-bit mode works. But in 32 and 64-bit modes, the same opcode is the start of a push imm32. So the push instruction is actually 5 bytes instead of 3, and the next instruction is actually the next push.

The bogus short-jnc is a huge hint that this is not 16-bit code.
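Here is a toy splitter that shows where the bogus jnc comes from. It is only a sketch that understands two opcodes (0x68 push and 0x73 jnc rel8) and stops at anything else; split_instructions is a made-up name, and the input is the byte sequence of the two pushes.

```python
def split_instructions(code: bytes, mode_bits: int):
    # Return (offset, mnemonic, hex bytes) for each instruction recognised.
    imm_size = 2 if mode_bits == 16 else 4    # push imm16 vs push imm32
    out, pos = [], 0
    while pos < len(code):
        opcode = code[pos]
        if opcode == 0x68:                    # push imm16 / imm32
            length, name = 1 + imm_size, "push"
        elif opcode == 0x73:                  # jnc rel8
            length, name = 2, "jnc"
        else:
            break                             # anything else is out of scope here
        out.append((pos, name, code[pos:pos + length].hex()))
        pos += length
    return out

code = bytes.fromhex("682f2f7368" "682f62696e")
print(split_instructions(code, 16))
# [(0, 'push', '682f2f'), (3, 'jnc', '7368'), (5, 'push', '682f62')]
print(split_instructions(code, 32))
# [(0, 'push', '682f2f7368'), (5, 'push', '682f62696e')]
```

In 16-bit mode the 0x73 byte left over from the first immediate is decoded as the opcode of a short jnc, which is exactly the out-of-sync effect described above.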

Use ndisasm -b32 or -b64. Ndisasm can read input from stdin, so I used python2 -c 'print "... "' | ndisasm - -b32.

When using objdump, if you prefer Intel syntax, use objdump -d -Mintel. For 32-bit code you could use objdump -Mintel -bbinary -D -mi386 /tmp/shellcode. Here -mi386 selects x86 as the architecture (rather than ARM or MIPS or whatever), and implies -Mi386 32-bit mode as well.

Or for 64-bit, objdump -D -b binary -mi386 -Mx86-64 /tmp/shellcode works. (objdump won't read the binary from stdin :/) Check the objdump man page for more about -M options.

I use this alias in my ~/.bashrc: alias disas='objdump -drwC -Mintel', because I normally disassemble ELF executables / objects to see what a compiler did, not shellcode. You might want -D in your alias.

I'm pretty sure this is 32-bit code, because in 64-bit mode the two pushes would leave a gap. There is no push imm64, but push imm32 is a 64-bit push with the immediate sign-extended to 64 bits. In 64-bit mode, you might use

push  'abcd'
mov   dword [rsp+4], 'efgh'

to end up with rsp pointing to "abcdefgh".
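A rough model of what ends up in memory, in case the gap isn't obvious. This is only a sketch of the byte layout, not an emulator; push_imm32_64bit is a made-up helper that returns the 8 bytes a 64-bit push of a sign-extended imm32 stores at [rsp].

```python
import struct

def push_imm32_64bit(imm32: int) -> bytes:
    # Reinterpret the 32-bit immediate as signed, then sign-extend to 8 bytes.
    signed = struct.unpack("<i", struct.pack("<I", imm32))[0]
    return struct.pack("<q", signed)

abcd = int.from_bytes(b"abcd", "little")
efgh = int.from_bytes(b"efgh", "little")

# Two plain pushes: each one occupies 8 bytes, so zero padding separates the halves.
print(push_imm32_64bit(abcd) + push_imm32_64bit(efgh))
# b'abcd\x00\x00\x00\x00efgh\x00\x00\x00\x00'

# One push plus a 4-byte store into the upper half of the same slot.
slot = bytearray(push_imm32_64bit(abcd))
slot[4:8] = struct.pack("<I", efgh)      # the store to [rsp+4]
print(bytes(slot))                       # b'abcdefgh'
```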

Also, the use of int 0x80 with a stack address is a big clue this is not 64-bit code. int 0x80 works on Linux in 64-bit mode, but it truncates all inputs to 32 bits (see: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?).

The 32-bit disassembly from ndisasm is:

00000000  90                nop
00000001  90                nop
00000002  90                nop
00000003  90                nop
00000004  90                nop
00000005  90                nop
00000006  90                nop
00000007  90                nop
00000008  90                nop
00000009  31C0              xor eax,eax
0000000B  50                push eax
0000000C  682F2F7368        push dword 0x68732f2f
00000011  682F62696E        push dword 0x6e69622f
00000016  89E3              mov ebx,esp
00000018  50                push eax
00000019  53                push ebx
0000001A  89E1              mov ecx,esp
0000001C  B00B              mov al,0xb
0000001E  CD80              int 0x80
00000020  200A              and [edx],cl
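As a sanity check on the listing above, the two push dword immediates can be turned back into the string they build on the stack. A minimal sketch, assuming 32-bit mode so that each push occupies exactly 4 bytes:

```python
import struct

# Immediates from the two push dword lines, in the order they are pushed.
pushed_in_order = [0x68732f2f, 0x6e69622f]

# The stack grows downward, so the last value pushed sits at the lowest address.
memory = b"".join(struct.pack("<I", value) for value in reversed(pushed_in_order))
print(memory)   # b'/bin//sh'
```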

Which looks sane. It contains no branches, but you also asked:

Is there a way to put proper labels on jumps?

Yes, Agner Fog's objconv disassembler can put labels on branch targets to help you figure out which branch goes where.
See How do I disassemble raw x86 code?