How do I disassemble raw 16-bit x86 machine code?
You can use objdump. According to this article the syntax is:
objdump -D -b binary -mi386 -Maddr16,data16 mbr
Is all data valid x86 16-bit machine code?
I think not all of it is valid, because ndisasm will output lines like db 0x82 when the bytes don't match an instruction, and lines like that are present in the output.
Disassembling A Flat Binary File Using objdump
I found the solution to my own question on a different forum. It looks something like this:
objdump -b binary --adjust-vma=0xabcd1000 -D file.bin
I've tested this and it works.
How to disassemble a shellcode into assembly instruction?
As the comments have suggested, your issue is that you have output the string \xeb\x1d as ASCII into the file you are trying to disassemble. You may have done something like:
echo '\xeb\x1d' >foo
You can do this, but you will want to tell echo to interpret the escape character \. This can be done with the -e option.
You'll also want it not to append a newline on the end, using the -n option. This is documented in the ECHO manual page:
-n do not output the trailing newline
-e enable interpretation of backslash escapes
This may work:
echo -ne '\xeb\x1d' >foo
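To see the difference between the two cases, here is a minimal Python sketch (not part of the original answer). The helper names literal_bytes and interpreted_bytes are made up for illustration; they mimic what ends up in the file without and with escape interpretation, ignoring the trailing newline that plain echo would also append.

```python
import codecs

def literal_bytes(text: str) -> bytes:
    """Bytes written when the backslash escapes are NOT interpreted."""
    return text.encode("ascii")

def interpreted_bytes(text: str) -> bytes:
    """Bytes written when each backslash-x escape IS turned into one byte."""
    return codecs.decode(text, "unicode_escape").encode("latin-1")

shellcode = r"\xeb\x1d"  # the 8-character string as typed on the command line

print(literal_bytes(shellcode).hex())      # 5c7865625c783164 (8 ASCII bytes)
print(interpreted_bytes(shellcode).hex())  # eb1d (the 2 machine-code bytes)
```

A disassembler fed the first form decodes the ASCII codes of the backslash, x, e, b and so on as instructions, which is why the output looks like garbage.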
Using NDISASM to disassemble the bytes:
ndisasm -b32 foo
Should now produce:
00000000 EB1D jmp short 0x1f
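As a sanity check on that output, the target 0x1f can be recomputed by hand: a short jump is relative to the address of the next instruction. A rough Python sketch, where short_jump_target is a hypothetical helper and the 2-byte length is that of the EB rel8 encoding:

```python
def short_jump_target(address: int, rel8: int, insn_len: int = 2) -> int:
    """Target of a short jump: next-instruction address plus signed rel8."""
    if rel8 >= 0x80:          # rel8 is a signed 8-bit displacement
        rel8 -= 0x100
    return address + insn_len + rel8

# EB 1D at offset 0: 0 + 2 + 0x1d
print(hex(short_jump_target(0x0, 0x1D)))  # 0x1f
```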
Without using an intermediate file (like foo) you can pipe ECHO output into NDISASM and disassemble it that way. This line takes a shellcode string and outputs the disassembly as well:
echo -ne '\xeb\x1d' | ndisasm -b32 -
The - on the end is needed to tell NDISASM to disassemble input from standard input rather than an explicit file.
We have now revolutionized the IT industry! ;-)
Disassemble raw x64 machine code
And as usual, writing down the question already gives you some rather good ideas about what else to try.
Anyhow, the right machine architecture is i386:x86-64.
The full command is:
objdump -b binary -D -m i386:x86-64 <file>
If you want to disassemble code that expects to be loaded at a specific address, you can add the --adjust-vma <load-address> flag.
Objdump of .code16 and .code32 x86 assembly
.code16 tells the assembler to assume the code will be run in 16-bit mode, e.g. to use the 66 operand-size prefix for 32-bit operand-size instead of the default 16. However, you assemble and link it into an elf32 binary, which means the file metadata still indicates 32-bit code. (There's no such thing as an x86-16 Linux ELF file.)
Objdump disassembles according to the file metadata, thus as 32-bit code, unless you override with -m i8086. The sizes you're getting match the binary for 32-bit disassembly.
You'll probably actually see breakage if you assemble an instruction that has a different length in 16-bit mode, like
add $129, %ax # 129 doesn't fit in an imm8
If assembled as a 16-bit instruction, it will have no prefix and an imm16 source operand. Decoded as a 32-bit instruction, it will have an imm32 source operand, which takes more total bytes following the opcode. An operand-size prefix would change the length of the rest of the instruction (not including prefixes), for either mode. BTW, (pre-)decoding slows down on Intel CPUs for this special case where a prefix is length-changing for the rest of the instruction. (https://agner.org/optimize/)
Anyway, disassembling that instruction with the wrong code size will lead to the disassembler getting out of sync with instruction boundaries, so it will definitively test what mode it's being interpreted in.
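To make the length difference concrete, here is a small Python sketch. It assumes the assembler picks the short accumulator form (opcode 05, add to ax/eax with a full-size immediate). The function name add_imm_length is made up, and it only knows about this one opcode and the 66 prefix.

```python
def add_imm_length(code: bytes, mode_bits: int) -> int:
    """Length in bytes of an 'add (e)ax, imm' (opcode 0x05) at the start of code.

    mode_bits is the mode the decoder assumes: 16 or 32.
    """
    operand_bits = mode_bits
    pos = 0
    if code[pos] == 0x66:                 # operand-size prefix flips 16 <-> 32
        operand_bits = 32 if mode_bits == 16 else 16
        pos += 1
    if code[pos] != 0x05:
        raise ValueError("this sketch only handles opcode 0x05")
    return pos + 1 + operand_bits // 8    # prefix (if any) + opcode + immediate

insn = bytes.fromhex("058100")            # add $129, %ax assembled under .code16
print(add_imm_length(insn, 16))           # 3: opcode + imm16
print(add_imm_length(insn, 32))           # 5: decoder wants an imm32, eats 2 bytes of the next instruction
print(add_imm_length(bytes.fromhex("66058100"), 32))  # 4: prefix + opcode + imm16
```

The second result is the desync case: a 32-bit decoder consumes two bytes that belong to the following instruction.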
If you're making normal user-space code (not a kernel that switches modes, or needs to be 16-bit), .code32 and .code64 are useless. They just let you put the machine code into the wrong kind of ELF file. (See: Assembling 32-bit binaries on a 64-bit system (GNU toolchain))
BTW, moving to %ss implicitly prevents interrupts until after the next instruction (which should set the stack pointer). You can avoid cli/sti that way.
How do I disassemble raw MIPS code?
Hmm, it seems easier than that. -b elf32-tradlittlemips does not work because the file is not an ELF executable, but a raw binary. So the correct option to use is -b binary. The other option, -mmips, makes objdump treat the binary as MIPS code. Since the target machine is little-endian, I also had to add -EL to make the output match the output for x.o.
-mmips only includes the basic instruction set. The AR7 has a MIPS32 processor, which has more instructions than just mips. To decode these newer MIPS32 instructions, use -mmips:isa32. The available ISAs can be listed with objdump -i -m.
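To illustrate why the endianness flag matters, here is a rough Python sketch that reads one 32-bit MIPS instruction word both ways and extracts the 6-bit major opcode field. The sample bytes are my own example (addiu $sp,$sp,-32 stored little-endian), not taken from the vmlinux file, and opcode_field is a hypothetical helper.

```python
import struct

def opcode_field(word_bytes: bytes, little_endian: bool) -> int:
    """Major opcode (top 6 bits) of a 4-byte MIPS instruction word."""
    fmt = "<I" if little_endian else ">I"
    (word,) = struct.unpack(fmt, word_bytes)
    return word >> 26

raw = bytes.fromhex("e0ffbd27")   # 0x27bdffe0 as stored on a little-endian target

print(opcode_field(raw, little_endian=True))    # 9, the addiu opcode
print(opcode_field(raw, little_endian=False))   # 56, a completely different instruction
```

With the wrong byte order every word decodes to an unrelated instruction, so the listing no longer matches the one from the object file.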
The final command becomes:
mipsel-linux-gnu-objdump -b binary -mmips:isa32 -EL -D vmlinux
This would show registers like $3 instead of their names. To adjust that, I used the following additional options, which are mentioned in mipsel-linux-gnu-objdump --help:
-Mgpr-names=32,cp0-names=mips32,hwr-names=mips32,reg-names=mips32
I chose mips32 after reading:
- http://www.linux-mips.org/wiki/AR7
- http://www.linux-mips.org/wiki/Instruction_Set_Architecture
Disassemble strings properly in shellcode
If you had shellcode which used call or jmp to jump over some data, you'd have to replace the strings with NOPs if the disassembler got out of sync while treating the data as instructions, as @DavidJ suggested.
In this case, you're just disassembling in the wrong mode.
The jnc is clearly bogus (as I think you realized).
The disassembler is treating the push opcode (the 0x68 byte) as the start of push imm16, because that's how 16-bit mode works. But in 32- and 64-bit modes, the same opcode is the start of a push imm32. So the push instruction is actually 5 bytes instead of 3, and the next instruction is actually the next push.
The bogus short jnc is a huge hint that this is not 16-bit code.
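Here is a toy Python decoder that reproduces the effect on the ten bytes of the two pushes. It is only a sketch: it knows just two opcodes (0x68 push imm and 0x73 jnc rel8) and prints anything else as db, whereas a real disassembler would decode those bytes as something too. The name toy_decode is made up.

```python
def toy_decode(code: bytes, mode_bits: int) -> list:
    """Decode only push-imm (0x68) and jnc-rel8 (0x73); everything else is 'db'."""
    imm_len = mode_bits // 8              # 2 bytes in 16-bit mode, 4 in 32-bit mode
    out = []
    pos = 0
    while pos < len(code):
        op = code[pos]
        if op == 0x68:
            imm = int.from_bytes(code[pos + 1:pos + 1 + imm_len], "little")
            out.append(f"push 0x{imm:x}")
            pos += 1 + imm_len
        elif op == 0x73 and pos + 1 < len(code):
            rel = code[pos + 1]
            if rel >= 0x80:               # signed 8-bit displacement
                rel -= 0x100
            out.append(f"jnc 0x{pos + 2 + rel:x}")
            pos += 2
        else:
            out.append(f"db 0x{op:02x}")
            pos += 1
    return out

pushes = bytes.fromhex("682f2f7368682f62696e")
print(toy_decode(pushes, 32))  # ['push 0x68732f2f', 'push 0x6e69622f']
print(toy_decode(pushes, 16))  # ['push 0x2f2f', 'jnc 0x6d', 'push 0x622f', 'db 0x69', 'db 0x6e']
```

In 16-bit mode the first push stops after two immediate bytes, and the leftover 73 68 (the characters "sh") gets decoded as the bogus jnc.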
Use ndisasm -b32 or -b64. Ndisasm can read input from stdin, so I used python2 -c 'print "... "' | ndisasm - -b32.
When using objdump, if you prefer Intel syntax, use objdump -d -Mintel. So you could use objdump -Mintel -bbinary -D -mi386 /tmp/shellcode for 32-bit (-mi386 selects x86 as the architecture (rather than ARM or MIPS or whatever), and implies -Mi386 32-bit mode as well).
Or for 64-bit, objdump -D -b binary -mi386 -Mx86-64 /tmp/shellcode works. (objdump won't read the binary from stdin :/) Check the objdump man page for more about -M options.
I use this alias in my ~/.bashrc: alias disas='objdump -drwC -Mintel', because I normally disassemble ELF executables / objects to see what a compiler did, not shellcode. You might want -D in your alias.
I'm pretty sure this is 32-bit code, because in 64-bit mode the two pushes would leave a gap. There is no push imm64, but push imm32 is a 64-bit push with the immediate sign-extended to 64 bits. In 64-bit mode, you might use
push 'abcd'
mov dword [rsp+4], 'efgh'
to end up with rsp pointing to "abcdefgh".
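The gap is easy to see by simulating what the two pushes in this shellcode (immediates 0x68732f2f and 0x6e69622f) leave in memory. This is only a sketch: stack_after_pushes is a hypothetical helper that models the stack as a byte string starting at the stack pointer.

```python
import struct

def stack_after_pushes(immediates, mode_bits: int) -> bytes:
    """Bytes starting at the stack pointer after pushing each imm32 in order."""
    memory = b""
    for imm in immediates:
        if mode_bits == 32:
            slot = struct.pack("<I", imm)             # 4-byte push
        else:
            # 64-bit mode: imm32 is sign-extended and pushed as 8 bytes
            signed = imm - (1 << 32) if imm & 0x80000000 else imm
            slot = struct.pack("<q", signed)
        memory = slot + memory                        # stack grows downward
    return memory

pushes = [0x68732F2F, 0x6E69622F]
print(stack_after_pushes(pushes, 32))  # b'/bin//sh'
print(stack_after_pushes(pushes, 64))  # b'/bin\x00\x00\x00\x00//sh\x00\x00\x00\x00'
```

In 32-bit mode the stack pointer ends up pointing at the contiguous string /bin//sh, while in 64-bit mode the zero padding cuts it off after /bin.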
Also, the use of int 0x80 with a stack address is a big clue this is not 64-bit code. int 0x80 works on Linux in 64-bit mode, but it truncates all inputs to 32-bit. See: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
The 32-bit disassembly from ndisasm is:
00000000 90 nop
00000001 90 nop
00000002 90 nop
00000003 90 nop
00000004 90 nop
00000005 90 nop
00000006 90 nop
00000007 90 nop
00000008 90 nop
00000009 31C0 xor eax,eax
0000000B 50 push eax
0000000C 682F2F7368 push dword 0x68732f2f
00000011 682F62696E push dword 0x6e69622f
00000016 89E3 mov ebx,esp
00000018 50 push eax
00000019 53 push ebx
0000001A 89E1 mov ecx,esp
0000001C B00B mov al,0xb
0000001E CD80 int 0x80
00000020 200A and [edx],cl
Which looks sane. It contains no branches, but to answer the question:
Is there a way to put proper labels on jumps?
Yes, Agner Fog's objconv disassembler can put labels on branch targets to help you figure out which branch goes where.
See How do I disassemble raw x86 code?