Linux Mach-O Disassembler
AFAIK, the native Darwin binary tools are part of the cctools package. They don't have the same command-line syntax or output as the GNU binutils. Later binutils releases (e.g., 2.22) do support the Mach-O format, however. You can get these prebuilt, with the 'g' prefix to the tool names, as mentioned here. Alternatively, you can compile binutils yourself, with something like:
> ./configure --prefix=$CROSSTOOLDIR --target=x86_64-apple-darwin \
--enable-64-bit-bfd --disable-nls --disable-werror
Installation will yield a bin/ directory where the utilities are prefixed with x86_64-apple-darwin. It should handle i386 Mach-O format (and FAT binaries) fine.
How to disassemble a binary executable in Linux to get the assembly code?
I don't think gcc has a flag for it, since it's primarily a compiler, but another of the GNU development tools does. objdump takes a -d/--disassemble flag:
$ objdump -d /path/to/binary
The disassembly looks like this:
080483b4 <main>:
80483b4: 8d 4c 24 04 lea 0x4(%esp),%ecx
80483b8: 83 e4 f0 and $0xfffffff0,%esp
80483bb: ff 71 fc pushl -0x4(%ecx)
80483be: 55 push %ebp
80483bf: 89 e5 mov %esp,%ebp
80483c1: 51 push %ecx
80483c2: b8 00 00 00 00 mov $0x0,%eax
80483c7: 59 pop %ecx
80483c8: 5d pop %ebp
80483c9: 8d 61 fc lea -0x4(%ecx),%esp
80483cc: c3 ret
80483cd: 90 nop
80483ce: 90 nop
80483cf: 90 nop
How To Understand Mach-O Symbol Table
Straight from <mach-o/nlist.h>:
struct nlist {
    union {
        uint32_t n_strx;   /* index into the string table */
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    int16_t n_desc;        /* see <mach-o/stab.h> */
    uint32_t n_value;      /* value of this symbol (or stab offset) */
};
struct nlist_64 {
    union {
        uint32_t n_strx;   /* index into the string table */
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see <mach-o/stab.h> */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};
So no, that shouldn't be 8 bytes, but rather 12 bytes for 32-bit and 16 bytes for 64-bit binaries.
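The sizes can be checked by adding up the field widths. Here is a minimal Python sketch, assuming a little-endian Mach-O file (as on x86 and ARM); the names NLIST_32, NLIST_64 and parse_nlist_64 are made up for this illustration and are not part of any Apple header or tool.

```python
import struct

# Packed little-endian layouts mirroring the fields listed above.
# "<" turns off alignment padding, so .size is the on-disk entry size.
# Field order: n_strx, n_type, n_sect, n_desc, n_value
NLIST_32 = struct.Struct("<IBBhI")  # 4 + 1 + 1 + 2 + 4 bytes
NLIST_64 = struct.Struct("<IBBHQ")  # 4 + 1 + 1 + 2 + 8 bytes


def parse_nlist_64(data, offset=0):
    """Decode one 64-bit symbol table entry starting at `offset` (hypothetical helper)."""
    n_strx, n_type, n_sect, n_desc, n_value = NLIST_64.unpack_from(data, offset)
    return {
        "n_strx": n_strx,
        "n_type": n_type,
        "n_sect": n_sect,
        "n_desc": n_desc,
        "n_value": n_value,
    }


print(NLIST_32.size, NLIST_64.size)  # 12 16
```

To walk a real symbol table you would read the entry count and file offset from the LC_SYMTAB load command and step through the table in increments of the entry size.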
Tools on Linux to read/dump Mach-O files?
I provided a previous answer describing how to use the binutils (objdump and friends) for the Mach-O format. Hope it helps, but since you mention iOS, you may have to change the --target parameter to an ARM target.
How to disassemble, modify and then reassemble a Linux executable?
I don't think there is any reliable way to do this. Machine code formats are very complicated, more complicated than assembly files. It isn't really possible to take a compiled binary (say, in ELF format) and produce a source assembly program which will assemble to the same (or similar-enough) binary. To gain an understanding of the differences, compare the output of GCC compiling directly to assembly (gcc -S) with the output of objdump on the executable (objdump -D).
There are two major complications I can think of. Firstly, the machine code itself is not a 1-to-1 correspondence with assembly code, because of things like pointer offsets.
For example, consider this C hello-world program:
#include <stdio.h>

int main()
{
    printf("hello");
    return 0;
}
This compiles to the x86 assembly code:
.LC0:
.string "hello"
.text
<snip>
movl $.LC0, %eax
movl %eax, (%esp)
call printf
Here .LC0 is a named constant, and printf is a symbol in a shared library's symbol table. Compare this to the output of objdump:
80483cd: b8 b0 84 04 08 mov $0x80484b0,%eax
80483d2: 89 04 24 mov %eax,(%esp)
80483d5: e8 1a ff ff ff call 80482f4 <printf@plt>
Firstly, the constant .LC0 is now just an absolute address (0x80484b0) somewhere in memory. It would be difficult to create an assembly source file which puts this constant in the correct place, since the assembler and linker are free to choose locations for these constants.
Secondly, and this depends on things like position-independent code, the reference to printf is not encoded as the function's real address in that code at all. The call goes to a stub in the procedure linkage table (printf@plt), which jumps through a table entry that the dynamic linker fills in with the actual address at runtime. Therefore, the disassembled code doesn't quite correspond to the source assembly code.
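To make the difference concrete: the call in the objdump listing above does not store the target address itself, but a 32-bit displacement relative to the address of the next instruction. A minimal Python sketch of that decoding follows; call_target is a hypothetical helper written for this illustration, and it only handles the 5-byte call rel32 form (opcode 0xE8) in 32-bit code.

```python
import struct


def call_target(address, encoding):
    """Return the absolute target of a 32-bit x86 `call rel32` instruction.

    `address` is where the instruction starts; `encoding` is its 5 raw bytes.
    """
    if len(encoding) != 5 or encoding[0] != 0xE8:
        raise ValueError("not a 5-byte call rel32 instruction")
    # The displacement is a signed little-endian 32-bit integer.
    (rel32,) = struct.unpack_from("<i", encoding, 1)
    # It is applied to the address of the *next* instruction, wrapping at 32 bits.
    return (address + len(encoding) + rel32) & 0xFFFFFFFF


# Bytes taken from the listing above: 80483d5: e8 1a ff ff ff
print(hex(call_target(0x80483D5, bytes.fromhex("e81affffff"))))  # 0x80482f4
```

The result is the address of the printf@plt stub, which is why moving or resizing any code between the call and its target silently invalidates the encoded bytes.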
In summary, source assembly has symbols while compiled machine code has addresses which are difficult to reverse.
The second major complication is that an assembly source file can't contain all of the information that was present in the original ELF file headers, like which libraries to dynamically link against, and other metadata placed there by the original compiler. It would be difficult to reconstruct this.
Like I said, it's possible that a special tool can manipulate all of this information, but it is unlikely that one can simply produce assembly code which can be reassembled back to the executable.
If you are interested in modifying just a small section of the executable, I recommend a much more subtle approach than recompiling the whole application. Use objdump to get the assembly code for the function(s) you are interested in. Convert it to "source assembly syntax" by hand (and here, I wish there was a tool that actually produced disassembly in the same syntax as the input), and modify it as you wish. When you are done, recompile just those function(s) and use objdump to figure out the machine code for your modified program. Then, use a hex editor to manually paste the new machine code over the top of the corresponding part of the original program, taking care that your new code is precisely the same number of bytes as the old code (or all the offsets would be wrong). If the new code is shorter, you can pad it out using NOP instructions. If it is longer, you may be in trouble, and might have to create new functions and call them instead.
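The hex-editor step can also be scripted. Below is a minimal Python sketch of the "same length, pad with NOPs" rule; patch_bytes is a hypothetical helper, the byte string stands in for a function body read from the executable, and the replacement instruction (xor %eax,%eax in place of mov $0x0,%eax) is only an assumed example. Note that xor, unlike mov, modifies the flags, so such a swap is only safe where the flags are not used afterwards.

```python
NOP = b"\x90"  # single-byte x86 NOP


def patch_bytes(image, offset, old_len, new_code):
    """Return a copy of `image` with `old_len` bytes at `offset` replaced.

    `new_code` is padded with NOPs so the patched region keeps its exact
    size and every later offset in the file stays valid.
    """
    if offset < 0 or offset + old_len > len(image):
        raise ValueError("patch region lies outside the image")
    if len(new_code) > old_len:
        raise ValueError("new code is longer than the code it replaces")
    padded = new_code + NOP * (old_len - len(new_code))
    return image[:offset] + padded + image[offset + old_len:]


# push %ebp; mov %esp,%ebp; mov $0x0,%eax; pop %ebp; ret
original = bytes.fromhex("5589e5b8000000005dc3")
# Replace the 5-byte mov at offset 3 with the 2-byte xor %eax,%eax (31 c0).
patched = patch_bytes(original, 3, 5, bytes.fromhex("31c0"))
print(patched.hex())  # 5589e531c09090905dc3
```

In practice you would read the executable with open(path, "rb"), find the file offset of the function (not its virtual address), and write the patched bytes to a new file.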
Disassemble raw x64 machine code
As usual, writing down the question already gives you some rather good ideas about what else to try.
Anyhow, the right machine architecture is i386:x86-64.
The full command is:
objdump -b binary -D -m i386:x86-64 <file>
If you want to disassemble code that expects to be loaded at a specific address, you can add the --adjust-vma <load-address> flag.
How do I disassemble raw 16-bit x86 machine code?
You can use objdump. According to this article the syntax is:
objdump -D -b binary -mi386 -Maddr16,data16 mbr
Disassemble, modify, and reassemble executable
There is no reliable way to do this with normal assembler syntax. See How to disassemble, modify and then reassemble a Linux executable?. Section info is typically not faithfully disassembled, so you'd need a special format designed for modifying, reassembling and relinking.
Also, instruction lengths are a problem when code only works if it is padded by using longer encodings (e.g. in a table of jump targets for a computed goto). See Where are GNU assembler instruction suffixes like ".s" in x86 "mov.s" documented?, but note that disassemblers don't support disassembling into that format.
ndisasm doesn't understand object file formats, so it disassembles headers as machine code!
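One way to avoid feeding a container file to a raw disassembler by mistake is to look at its first four bytes. The following Python sketch is only an illustration: identify_container and the MAGICS table are hypothetical names, and the table covers just the ELF and Mach-O magic numbers, not every object format.

```python
# Leading magic bytes of common container formats, as they appear on disk.
MAGICS = {
    b"\x7fELF": "ELF",
    b"\xfe\xed\xfa\xce": "Mach-O 32-bit (big-endian)",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit (little-endian)",
    b"\xfe\xed\xfa\xcf": "Mach-O 64-bit (big-endian)",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit (little-endian)",
    # Java class files start with the same four bytes as fat binaries.
    b"\xca\xfe\xba\xbe": "Mach-O fat binary (or Java class file)",
}


def identify_container(header):
    """Guess the container format from the first four bytes of a file."""
    return MAGICS.get(bytes(header[:4]), "unknown (possibly raw code)")


# First bytes of a typical x86-64 Mach-O file.
print(identify_container(bytes.fromhex("cffaedfe07000001")))
```

If a known format is reported, a format-aware tool such as objdump or objconv is the better choice; a raw disassembler only makes sense on the extracted code bytes.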
For this to have any hope of working, use a disassembler like Agner Fog's objconv, which will output asm source (NASM, MASM, or GAS AT&T) that does assemble. It might not actually work if any of the code depended on a specific longer-than-default encoding.
I'm not sure how faithful objconv is with respect to emitting section .bss, section .rodata and other directives like that to place data where it found it in the object file, but that's what you need.
Re: absolute relocations: make sure you put DEFAULT REL at the top of your file. I forget if objconv does this by default. x86-64 Mach-O only supports PC-relative relocations, so you have to create position-independent code (e.g. using RIP-relative addressing modes).
ndisasm doesn't read the symbol table, so all its operands use absolute addressing. objconv makes up label names for jump targets and static data that don't appear in the symbol table.