How to Get GCC to Output Raw Binary

Is there a way to get gcc to output raw binary?

Try this out:

$ gcc -c test.c     
$ objcopy -O binary -j .text test.o binfile

You can make sure it's correct with objdump:

$ objdump -d test.o 
test.o: file format pe-i386


Disassembly of section .text:

00000000 <_f>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 04                sub    $0x4,%esp
   6:   8b 45 08                mov    0x8(%ebp),%eax
   9:   0f af 45 08             imul   0x8(%ebp),%eax
   d:   89 45 fc                mov    %eax,-0x4(%ebp)
  10:   8b 45 fc                mov    -0x4(%ebp),%eax
  13:   83 c0 02                add    $0x2,%eax
  16:   c9                      leave
  17:   c3                      ret

And compare it with the binary file:

$ hexdump -C binfile 
00000000  55 89 e5 83 ec 04 8b 45  08 0f af 45 08 89 45 fc  |U......E...E..E.|
00000010  8b 45 fc 83 c0 02 c9 c3                           |.E......|
00000018

Compile code to raw binary

You can pass -j .text to objcopy, so that only the .text section is copied into the output file.

use gcc to directly compile to machine code without linking

I want to get gcc to compile C code for me into x86-32 Linux binary code, but without any libraries or the like around it.

That means you write freestanding C code. (When the standard library is available, you have a hosted environment; when not, a freestanding one.)

To compile e.g. foo.c to an executable, foo, make sure it has a _start() function, and use

gcc -march=i686 -mtune=generic -m32 -ffreestanding -nostdlib -nostartfiles foo.c -o foo

The GNU toolchain uses the address of the _start symbol to encode the start address of the executable in the ELF file.

This answer is an actual real-world example for x86-64. For x86-32 (or any other architecture), you'll need to adjust the SYSCALL_ macros.


In a comment, OP explains they want a binary blob, instead of an ELF executable.

In this case, it is best to tell the compiler to generate a position independent executable. For example, 'blob.c':

void do_something(int arg)
{
    /* Do something with arg, perhaps a syscall,
       or inline assembly? */
}

void loop_something(int from, int to)
{
    int arg;

    if (from <= to)
        /* Count upwards from 'from' to 'to', inclusive. */
        for (arg = from; arg <= to; arg++)
            do_something(arg);
    else
        /* Count downwards; the test must be >= or the loop never runs. */
        for (arg = from; arg >= to; arg--)
            do_something(arg);
}

void _start(void)
{
    loop_something(2, 5);
    do_something(6);
    loop_something(5, 2);
    do_something(1);
}

I do recommend declaring all functions except _start as static, to avoid any global offset table (GOT) or procedure linkage table (PLT) references (like <__x86.get_pc_thunk.bx> calls).

Compile this to a position-independent executable using e.g.

gcc -march=i686 -mtune=generic -m32 -O2 -fPIE -ffreestanding -nostdlib -nostartfiles blob.c -o blob

strip it,

strip --strip-all blob

and dump the contents of the binary:

objdump -fd blob

In this output, there are two important lines:

start address 0x08048120

which tells the address of the _start symbol, and

080480e0 <.text>:

which tells the address at which the code (the .text section) begins, in hexadecimal. Subtract the latter from the former (0x08048120 - 0x080480e0 = 0x40 = 64) to get the offset of the _start symbol within the code.

Finally, dump the code into a raw binary file 'blob.raw' using

objcopy -O binary -j .text blob blob.raw

How to link object files into a raw binary file with gcc/ld?

You need to force the linker to emit the ROM1 region by creating an output section with some content. The manual says:

Other link script directives that allocate space in an output section will also create the output section. So too will assignments to dot even if the assignment does not create space, except for ‘. = 0’, ‘. = . + 0’, ‘. = sym’, ‘. = . + sym’ and ‘. = ALIGN (. != 0, expr, 1)’ when ‘sym’ is an absolute symbol of value 0 defined in the script. This allows you to force output of an empty section with ‘. = .’.

So this should work:

MEMORY
{
    ROM1 (rx) : ORIGIN = 0x00, LENGTH = 16
    ROM2 (rx) : ORIGIN = 0x10, LENGTH = 16
}

SECTIONS
{
    .dummy :
    {
        . = ORIGIN(ROM1) + LENGTH(ROM1);
    } >ROM1

    .text :
    {
        *(.text)
        . = ORIGIN(ROM2) + LENGTH(ROM2);
    } >ROM2
}

OUTPUT_FORMAT(binary)

However, at least with my binutils version 2.33.1, it doesn't. Using ‘. = .’ doesn't work either. If you only need the region for padding, you can emit some data into it, e.g. with a BYTE(0) directive, and that works:

MEMORY
{
    ROM1 (rx) : ORIGIN = 0x00, LENGTH = 16
    ROM2 (rx) : ORIGIN = 0x10, LENGTH = 16
}

SECTIONS
{
    .dummy :
    {
        BYTE(0);
        . = ORIGIN(ROM1) + LENGTH(ROM1);
    } >ROM1

    .text :
    {
        *(.text)
        . = ORIGIN(ROM2) + LENGTH(ROM2);
    } >ROM2
}

OUTPUT_FORMAT(binary)

If you do have some content for ROM1, then of course just create an input section for it, but make sure it always exists; otherwise the linker will remove the output section. Strangely enough, even a zero-sized section works.

Generate raw binary from C code in Linux

A few hints first:

  • avoid naming your starting routine main. It is confusing, both for the reader and perhaps for the compiler: unless you pass -ffreestanding, gcc treats main specially. Use something else like start or begin_of_my_kernel ...

  • compile with gcc -v to understand what your particular compiler is doing.

  • you probably should ask your compiler for some optimizations and all warnings, so pass -O -Wall at least to gcc

  • you may want to look into the produced assembler code, so use gcc -S -O -Wall -fverbose-asm kernel.c to get the kernel.s assembler file and glance into it

  • as commented by Michael Petch you might want to pass -fno-exceptions

  • you probably need some linker script and/or some hand-written assembler for crt0

  • you should read something about linkers & loaders


 kernel.c:(.text+0xc): undefined reference to '_GLOBAL_OFFSET_TABLE_'

This smells like something related to position-independent code. My guess: try compiling with an explicit -fno-pic or -fno-pie

(on some Linux distributions, gcc might be configured with -fpic or -fpie enabled by default)

PS. Don't forget to add -m32 to gcc if you want 32-bit x86 binaries.

Setting start address to execute raw binary file

. = 0x0500 does not correspond to 0x0500:0. 0x0500:0 is physical address 0x5000, not 0x500.

Also, if you're trying to compile C code as 32-bit and run it in real mode (which is 16-bit), it won't work. You need to either compile code as 16-bit or switch the CPU into 32-bit protected mode. There aren't that many C compilers still compiling 16-bit code. Turbo C++ is one, Open Watcom is another. AFAIK, gcc can't generate true 16-bit code; newer versions accept -m16, but that emits 32-bit instructions with size-override prefixes, so the result needs a 386 or later.

Finally, I'm guessing you expect the entry point to be at 0x500:0 (0x5000 physical). You need to either tell this to the linker (I don't remember how, if at all possible) or deal with an arbitrary location of the entry point (i.e. extract it from the binary somehow).

How to compile an assembly file to a raw binary (like DOS .com) format with GNU assembler (as)?

ld --oformat binary

For quick and dirty tests you can do:

as -o a.o a.S
ld --oformat binary -o a.out a.o
hd a.out

Gives:

00000000  90 90                                             |..|
00000002

Unfortunately this gives a warning:

ld: warning: cannot find entry symbol _start; defaulting to 0000000000400000

which does not make much sense with binary. It could be silenced with:

.section .text
.globl start
start:
nop
nop

and:

ld -e start --oformat binary -o a.out a.o

or simply with:

ld -e 0 --oformat binary -o a.out a.o

which tells ld that the entry point is not _start but the code at address 0.

It is a shame that neither as nor ld can take input / output from stdin / stdout, so no piping.

Proper boot sector

If you are going to do something more serious, the best method is to generate a clean minimal linker script. linker.ld:

SECTIONS
{
    . = 0x7c00;
    .text :
    {
        *(.*)
        . = 0x1FE;
        SHORT(0xAA55)
    }
}

Here we also place the magic bytes with the linker script.

The linker script is important above all to control the output addresses after relocation. Learn more about relocation at: https://stackoverflow.com/a/30507725/895245

Use it as:

as -o a.o a.S
ld --oformat binary -o a.img -T linker.ld a.o

And then you can boot as:

qemu-system-i386 -hda a.img

Working examples on this repository: https://github.com/cirosantilli/x86-bare-metal-examples/blob/d217b180be4220a0b4a453f31275d38e697a99e0/Makefile

Tested on Binutils 2.24, Ubuntu 14.04.

how to get the bare bone compiled binary code of a C function?

You accidentally produced an ELF file instead of a simple BIN file. (You can verify this using the file utility if your system has it.)

To produce a small BIN file from your code, change your second command to:

arm-none-eabi-objcopy -j .text test.o -O binary test.bin

Note that there are likely to be tons of complications and security issues when you execute arbitrary machine code received over a serial line. I am not recommending that as a design, just trying to answer the question you asked.

Embedding resources in executable using GCC

There are a couple of possibilities:

  • use ld's capability to turn any file into an object (Embedding binary blobs using gcc mingw):

    ld -r -b binary -o binary.o foo.bar  # then link in binary.o
  • use a bin2c/bin2h utility to turn any file into an array of bytes (Embed image in code, without using resource section or external images)


Update: Here's a more complete example of how to use data bound into the executable using ld -r -b binary:

#include <stdio.h>

// a file named foo.bar with some example text is 'imported' into
// an object file using the following command:
//
//     ld -r -b binary -o foo.bar.o foo.bar
//
// That creates an object file named "foo.bar.o" with the following
// symbols:
//
//     _binary_foo_bar_start
//     _binary_foo_bar_end
//     _binary_foo_bar_size
//
// Note that the symbols are addresses (so for example, to get the
// size value, you have to get the address of the _binary_foo_bar_size
// symbol).
//
// In my example, foo.bar is a simple text file, and this program will
// dump the contents of that file which has been linked in by specifying
// foo.bar.o as an object file input to the linker when the program is built

extern char _binary_foo_bar_start[];
extern char _binary_foo_bar_end[];

int main(void)
{
    // %p expects a void pointer, so cast the array addresses explicitly
    printf("address of start: %p\n", (void *)_binary_foo_bar_start);
    printf("address of end: %p\n", (void *)_binary_foo_bar_end);

    for (char *p = _binary_foo_bar_start; p != _binary_foo_bar_end; ++p) {
        putchar(*p);
    }

    return 0;
}

Update 2 - Getting the resource size: I could not read _binary_foo_bar_size correctly. At runtime, gdb shows me the right size of the text resource by using display (unsigned int)&_binary_foo_bar_size. But assigning this to a variable always gave a wrong value. I solved the issue by computing the size from the start and end symbols instead (with the array declarations used above, no & is needed):

unsigned int iSize = (unsigned int)(_binary_foo_bar_end - _binary_foo_bar_start);

It is a workaround, but it works well and is not too ugly.


