Is there a way to get gcc to output raw binary?
Try this out:
$ gcc -c test.c
$ objcopy -O binary -j .text test.o binfile
You can make sure it's correct with objdump:
$ objdump -d test.o
test.o: file format pe-i386
Disassembly of section .text:
00000000 <_f>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 04 sub $0x4,%esp
6: 8b 45 08 mov 0x8(%ebp),%eax
9: 0f af 45 08 imul 0x8(%ebp),%eax
d: 89 45 fc mov %eax,-0x4(%ebp)
10: 8b 45 fc mov -0x4(%ebp),%eax
13: 83 c0 02 add $0x2,%eax
16: c9 leave
17: c3 ret
And compare it with the binary file:
$ hexdump -C binfile
00000000 55 89 e5 83 ec 04 8b 45 08 0f af 45 08 89 45 fc |U......E...E..E.|
00000010 8b 45 fc 83 c0 02 c9 c3 |.E......|
00000018
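The answer does not show test.c itself. As a minimal sketch, a source file like the following would plausibly produce the disassembly above when compiled unoptimized for 32-bit x86. Only the function name f comes from the _f symbol; the body is an assumption reconstructed from the instructions (multiply the argument by itself, store it in a local, add 2).

```c
/* test.c: hypothetical source, reconstructed from the disassembly.
   There is a single function and no main(), so .text holds only f's code. */
int f(int x)
{
    int y = x * x;   /* imul 0x8(%ebp),%eax ; mov %eax,-0x4(%ebp) */
    return y + 2;    /* mov -0x4(%ebp),%eax ; add $0x2,%eax */
}
```

Because the file is only compiled with -c and never linked, the missing main() is not a problem.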
Compile code to raw binary
You can pass -j .text to objcopy.
use gcc to directly compile to machine code without linking
I want to get gcc to compile c-code for me into x86-32 linux binary code, but without any librarys or so around it.
That means you write freestanding C code. (When the standard library is available, you have a hosted environment; when not, a freestanding one.)
To compile e.g. foo.c to an executable, foo, make sure it has a _start() function, and use
gcc -march=i686 -mtune=generic -m32 -ffreestanding -nostdlib -nostartfiles foo.c -o foo
The GNU toolchain uses the address of the _start symbol to encode the start address of the executable in the ELF file.
This answer is an actual real-world example for x86-64. For x86-32 (or any other architecture), you'll need to adjust the SYSCALL_ macros.
In a comment, OP explains they want a binary blob, instead of an ELF executable.
In this case, it is best to tell the compiler to generate a position independent executable. For example, 'blob.c':
void do_something(int arg)
{
    /* Do something with arg, perhaps a syscall,
       or inline assembly? */
}

void loop_something(int from, int to)
{
    int arg;
    if (from <= to)
        for (arg = from; arg <= to; arg++)
            do_something(arg);
    else
        /* Count down; the condition must be >=, or the loop never runs. */
        for (arg = from; arg >= to; arg--)
            do_something(arg);
}

void _start(void)
{
    loop_something(2, 5);
    do_something(6);
    loop_something(5, 2);
    do_something(1);
}
I do recommend declaring all functions except _start as static, to avoid any global offset table (GOT) or procedure linkage table (PLT) references (like <__x86.get_pc_thunk.bx> calls).
Compile this to a position-independent executable using e.g.
gcc -march=i686 -mtune=generic -m32 -O2 -fPIE -ffreestanding -nostdlib -nostartfiles blob.c -o blob
strip it,
strip --strip-all blob
and dump the contents of the binary:
objdump -fd blob
In this output, there are two important lines:
start address 0x08048120
which tells the address of the _start symbol, and
080480e0 <.text>:
which tells the address at which the code begins, in hexadecimal. Subtract the latter from the former (0x08048120 - 0x080480e0 = 0x40 = 64) to get the offset of the start symbol within the code.
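The subtraction above can be written as a small helper. This is only a sketch; the function name entry_offset is hypothetical, and the two addresses are the ones reported by objdump in this particular example.

```c
#include <stdint.h>

/* Offset of the entry point inside the raw blob: the start address
   reported by objdump minus the address of the first byte of .text. */
uint32_t entry_offset(uint32_t start_address, uint32_t text_address)
{
    return start_address - text_address;
}
```

With the values shown here, entry_offset(0x08048120, 0x080480e0) gives 0x40, so whoever loads the blob would jump 64 bytes past its beginning.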
Finally, dump the code into a raw binary file 'blob.raw' using
objcopy -O binary -j .text blob blob.raw
How to link object files into a raw binary file with gcc/ld?
You need to force the linker to emit the ROM1 region by creating an output section with some content. The manual says:
Other link script directives that allocate space in an output section will also create the output section. So too will assignments to dot even if the assignment does not create space, except for ‘. = 0’, ‘. = . + 0’, ‘. = sym’, ‘. = . + sym’ and ‘. = ALIGN (. != 0, expr, 1)’ when ‘sym’ is an absolute symbol of value 0 defined in the script. This allows you to force output of an empty section with ‘. = .’.
So this should work:
MEMORY
{
ROM1 (rx) : ORIGIN = 0x00, LENGTH = 16
ROM2 (rx) : ORIGIN = 0x10, LENGTH = 16
}
SECTIONS
{
.dummy :
{
. = ORIGIN(ROM1) + LENGTH(ROM1);
} >ROM1
.text :
{
*(.text)
. = ORIGIN(ROM2) + LENGTH(ROM2);
} >ROM2
}
OUTPUT_FORMAT(binary)
However, at least with my binutils version 2.33.1, it doesn't. The '. = .' assignment doesn't work either. If you only need the region for padding, you can emit some data into it, e.g. with a BYTE(0) directive, and that works:
MEMORY
{
ROM1 (rx) : ORIGIN = 0x00, LENGTH = 16
ROM2 (rx) : ORIGIN = 0x10, LENGTH = 16
}
SECTIONS
{
.dummy :
{
BYTE(0);
. = ORIGIN(ROM1) + LENGTH(ROM1);
} >ROM1
.text :
{
*(.text)
. = ORIGIN(ROM2) + LENGTH(ROM2);
} >ROM2
}
OUTPUT_FORMAT(binary)
If you do have some content for ROM1, then of course just create an input section for it, but make sure it always exists; otherwise the linker will remove it. Strangely enough, even a zero-sized section works.
Generate raw binary from C code in Linux
A few hints first:
Avoid naming your starting routine main. It is confusing (both for the reader and perhaps for the compiler; when you don't pass -ffreestanding to gcc it is handling main very specifically). Use something else like start or begin_of_my_kernel ...
Compile with gcc -v to understand what your particular compiler is doing.
You probably should ask your compiler for some optimizations and all warnings, so pass -O -Wall at least to gcc.
You may want to look into the produced assembler code, so use gcc -S -O -Wall -fverbose-asm kernel.c to get the kernel.s assembler file and glance into it.
As commented by Michael Petch, you might want to pass -fno-exceptions.
You probably need some linker script and/or some hand-written assembler for crt0.
You should read something about linkers & loaders.
kernel.c:(.text+0xc): undefined reference to '_GLOBAL_OFFSET_TABLE_'
This smells like something related to position-independent code. My guess: try compiling with an explicit -fno-pic or -fno-pie (on some Linux distributions, their gcc might be configured with some -fpic enabled by default).
PS. Don't forget to add -m32 to gcc if you want x86 32-bit binaries.
Setting start address to execute raw binary file
. = 0x0500 does not correspond to 0x0500:0. 0x0500:0 is physical address 0x5000, not 0x500.
Also, if you're trying to compile C code as 32-bit and run it in real mode (which is 16-bit), it won't work. You need to either compile code as 16-bit or switch the CPU into 32-bit protected mode. There aren't that many C compilers still compiling 16-bit code. Turbo C++ is one, Open Watcom is another. AFAIK, gcc can't generate true 16-bit code; newer versions (4.9 and later) do accept -m16, but that emits 32-bit code assembled to run in 16-bit mode, so it needs a 386 or later.
Finally, I'm guessing you expect the entry point to be at 0x500:0 (0x5000 physical). You need to either tell this to the linker (I don't remember how, if at all possible) or deal with an arbitrary location of the entry point (i.e. extract it from the binary somehow).
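The segment:offset arithmetic behind the first point can be sketched in C. The function name real_mode_physical is hypothetical; the formula (segment times 16 plus offset) is the standard real-mode address translation.

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
   The result is returned as 32 bits because it can exceed 0xFFFFF
   (for example FFFF:FFFF). */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

So real_mode_physical(0x0500, 0x0000) is 0x5000, while physical address 0x500 would be reached with, for example, 0x0000:0x0500 or 0x0050:0x0000.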
How to compile an assembly file to a raw binary (like DOS .com) format with GNU assembler (as)?
ld --oformat binary
For quick and dirty tests you can do:
as -o a.o a.S
ld --oformat binary -o a.out a.o
hd a.out
Gives:
00000000 90 90 |..|
00000002
Unfortunately this gives a warning:
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400000
which does not make much sense with binary. It could be silenced with:
.section .text
.globl start
start:
nop
nop
and:
ld -e start --oformat binary -o a.out a.o
or simply with:
ld -e 0 --oformat binary -o a.out a.o
which tells ld that the entry point is not _start but the code at address 0.
It is a shame that neither as nor ld can take input / output from stdin / stdout, so no piping.
Proper boot sector
If you are going to do something more serious, the best method is to generate a clean minimal linker script. linker.ld:
SECTIONS
{
. = 0x7c00;
.text :
{
*(.*)
. = 0x1FE;
SHORT(0xAA55)
}
}
Here we also place the magic bytes with the linker script.
The linker script is important above all to control the output addresses after relocation. Learn more about relocation at: https://stackoverflow.com/a/30507725/895245
Use it as:
as -o a.o a.S
ld --oformat binary -o a.img -T linker.ld a.o
And then you can boot as:
qemu-system-i386 -hda a.img
Working examples on this repository: https://github.com/cirosantilli/x86-bare-metal-examples/blob/d217b180be4220a0b4a453f31275d38e697a99e0/Makefile
Tested on Binutils 2.24, Ubuntu 14.04.
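As a quick sanity check on the generated image, the magic bytes placed by the linker script can be verified programmatically. The sketch below assumes the image has already been read into memory; has_boot_signature and SECTOR_SIZE are hypothetical names, and the byte order assumes SHORT(0xAA55) is emitted little-endian, as it would be for an x86 target.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Returns 1 if the first sector of the image ends with the boot
   signature, 0 otherwise. A little-endian 0xAA55 appears on disk as
   0x55 at offset 510 followed by 0xAA at offset 511. */
int has_boot_signature(const uint8_t *image, size_t length)
{
    if (image == NULL || length < SECTOR_SIZE)
        return 0;
    return image[SECTOR_SIZE - 2] == 0x55 && image[SECTOR_SIZE - 1] == 0xAA;
}
```

The same check can be done by eye with hd a.img, looking for 55 aa at the end of the line starting at offset 000001f0.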
how to get the bare bone compiled binary code of a C function?
You accidentally produced an ELF file instead of a simple BIN file. (You can verify this using the file utility if your system has it.)
To produce a small BIN file from your code, change your second command to:
arm-none-eabi-objcopy -j .text test.o -O binary test.bin
Note that there are likely to be tons of complications and security issues when you execute arbitrary machine code received over a serial line. I am not recommending that as a design, just trying to answer the question you asked.
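If the file utility is not available, the ELF-versus-raw distinction mentioned above can be checked by looking at the first four bytes. This is a minimal sketch with a hypothetical function name, looks_like_elf; it only tests for the ELF magic number and does not validate the rest of the header.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer starts with the ELF magic number
   (0x7F 'E' 'L' 'F'), 0 otherwise. A raw binary produced by
   objcopy -O binary carries no such header. */
int looks_like_elf(const unsigned char *data, size_t length)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };
    return length >= sizeof magic && memcmp(data, magic, sizeof magic) == 0;
}
```

A receiver on the other end of the serial line could use a check like this to reject an object file that was sent by mistake, though it is no substitute for real validation of the payload.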
Embedding resources in executable using GCC
There are a couple possibilities:
use ld's capability to turn any file into an object (Embedding binary blobs using gcc mingw):
ld -r -b binary -o binary.o foo.bar # then link in binary.o
use a bin2c/bin2h utility to turn any file into an array of bytes (Embed image in code, without using resource section or external images)
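To illustrate the second possibility, here is a rough sketch of the core of a bin2c-style tool. It is not the code of any particular bin2c utility; the function name emit_c_array and the generated <name>_len constant are assumptions made for this example.

```c
#include <stdio.h>
#include <stddef.h>

/* Writes `length` bytes of `data` to `out` as a C array definition
   named `name`, followed by a matching <name>_len constant.
   Returns 0 on success, -1 if any write failed.
   Note: a zero-length input yields an empty initializer, which is
   not valid C before C23. */
int emit_c_array(FILE *out, const char *name,
                 const unsigned char *data, size_t length)
{
    if (fprintf(out, "const unsigned char %s[] = {", name) < 0)
        return -1;
    for (size_t i = 0; i < length; i++) {
        /* Start a new line every 12 bytes to keep the output readable. */
        const char *sep = (i % 12 == 0) ? "\n    " : " ";
        if (fprintf(out, "%s0x%02x,", sep, data[i]) < 0)
            return -1;
    }
    if (fprintf(out, "\n};\nconst unsigned int %s_len = %zu;\n", name, length) < 0)
        return -1;
    return 0;
}
```

A complete tool would read the input file into a buffer, call this function with stdout or an output file, and the resulting .c or .h file would then be compiled into the program like any other source.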
Update: Here's a more complete example of how to use data bound into the executable using ld -r -b binary:
#include <stdio.h>

// a file named foo.bar with some example text is 'imported' into
// an object file using the following command:
//
//     ld -r -b binary -o foo.bar.o foo.bar
//
// That creates an object file named "foo.bar.o" with the following
// symbols:
//
//     _binary_foo_bar_start
//     _binary_foo_bar_end
//     _binary_foo_bar_size
//
// Note that the symbols are addresses (so for example, to get the
// size value, you have to get the address of the _binary_foo_bar_size
// symbol).
//
// In my example, foo.bar is a simple text file, and this program will
// dump the contents of that file which has been linked in by specifying
// foo.bar.o as an object file input to the linker when the program is built

extern char _binary_foo_bar_start[];
extern char _binary_foo_bar_end[];

int main(void)
{
    // %p expects a void pointer, so cast explicitly
    printf("address of start: %p\n", (void *)_binary_foo_bar_start);
    printf("address of end: %p\n", (void *)_binary_foo_bar_end);

    for (char* p = _binary_foo_bar_start; p != _binary_foo_bar_end; ++p) {
        putchar(*p);
    }
    return 0;
}
Update 2 - Getting the resource size: I could not read _binary_foo_bar_size correctly. At runtime, gdb shows me the right size of the text resource by using display (unsigned int)&_binary_foo_bar_size. But assigning this to a variable always gave a wrong value. I could solve this issue the following way:
unsigned int iSize = (unsigned int)(_binary_foo_bar_end - _binary_foo_bar_start);
It is a workaround, but it works well and is not too ugly.
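The workaround amounts to a plain pointer difference between the end and start symbols. A minimal sketch, wrapping it in a hypothetical helper named blob_size that takes the two addresses as arguments so it can be reused for any embedded file:

```c
#include <stddef.h>

/* Size in bytes of a blob delimited by its start and end symbols.
   With the declarations from the example program this would be called
   as blob_size(_binary_foo_bar_start, _binary_foo_bar_end). */
size_t blob_size(const char *start, const char *end)
{
    return (size_t)(end - start);
}
```

Passing the arrays themselves (rather than their addresses taken with &) matters here: they decay to char pointers, so the subtraction counts bytes.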