How to compile this asm code under linux with nasm and gcc?
If you want to continue using that old book to learn the basics (which is just fine, nothing wrong with learning the basics/old way before moving on to modern OS), you can run it in DOSBox, or a FreeDOS VM.
How to generate assembly code with gcc that can be compiled with nasm
The difficulty I think you hit with the entry point error was attempting to use ld on an object file containing the entry point named main while ld was looking for an entry point named _start.
There are a couple of considerations. First, if you are linking with the C library for the use of functions like printf, linking will expect main as the entry point, but if you are not linking with the C library, ld will expect _start. Your script is very close, but you will need some way to differentiate which entry point you need to fully automate the process for any source file.
For example, the following is a conversion using your approach of a source file including printf. It was converted to nasm using objconv as follows:
Generate the object file:
gcc -fno-asynchronous-unwind-tables -s -c struct_offsetof.c -o s3.obj
Convert with objconv to nasm format assembly file
objconv -fnasm s3.obj
(note: my version of objconv added DOS line endings -- probably an option missed, I just ran it through dos2unix)
Using a slightly modified version of your sed call, tweak the contents:
sed -i -e 's/align=1//g' -e 's/[a-z]*execute//g' -e \
's/: *function//g' -e '/default *rel/d' s3.asm
(note: if no standard library functions are used, and you are linking with ld, change main to _start by adding the following expressions to your sed call)
-e 's/^main/_start/' -e 's/[ ]main[ ]*.*$/ _start/'
(there are probably more elegant expressions for this, this was just for example)
Compile with nasm (replacing original object file):
nasm -felf64 -o s3.obj s3.asm
Using gcc for link:
gcc -o s3 s3.obj
Test
$ ./s3
sizeof test : 40
myint : 0 0
mychar : 4 4
myptr : 8 8
myarr : 16 16
myuint : 32 32
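The struct_offsetof.c source itself is not shown in this answer. As a rough idea of the kind of program that produces output of this shape, here is a hypothetical reconstruction: the field names come from the output above, but the field types, the array length, the MANUAL_OFFSET macro and print_offsets are all guesses chosen to fit the printed offsets on x86-64 Linux.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layout: field names are taken from the output above,
 * the types and the array length are assumptions that fit the offsets. */
struct test {
    int       myint;
    char      mychar;
    char     *myptr;
    char      myarr[16];
    unsigned  myuint;
};

/* Offset of a member computed by hand from the addresses of a real object. */
#define MANUAL_OFFSET(obj, member) \
    ((size_t)((const char *)&(obj).member - (const char *)&(obj)))

/* Print each offset twice: once via offsetof, once computed by hand. */
void print_offsets(void)
{
    struct test t;

    printf("sizeof test : %zu\n", sizeof t);
    printf("myint  : %zu %zu\n", offsetof(struct test, myint),  MANUAL_OFFSET(t, myint));
    printf("mychar : %zu %zu\n", offsetof(struct test, mychar), MANUAL_OFFSET(t, mychar));
    printf("myptr  : %zu %zu\n", offsetof(struct test, myptr),  MANUAL_OFFSET(t, myptr));
    printf("myarr  : %zu %zu\n", offsetof(struct test, myarr),  MANUAL_OFFSET(t, myarr));
    printf("myuint : %zu %zu\n", offsetof(struct test, myuint), MANUAL_OFFSET(t, myuint));
}
```

Calling print_offsets() from main is enough to exercise printf, which is what makes this a case where you link with gcc and keep main as the entry point.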
Commands to compile ASM file with C program
I resolved the problem by using @Jester's solution:
gcc -no-pie -o executable main.o hello.o
and thanks to Ped7g for the explanation.
How to generate a nasm compilable assembly code from c source code on Linux?
I find it's a better approach to disassemble the object files rather than use assembly code generated by gcc.
First, generate an object file from your source code:
gcc -fno-asynchronous-unwind-tables -O2 -s -c -o main.o main.c
- -fno-asynchronous-unwind-tables: do not generate unnecessary sections like .eh_frame
- -O2: optimizes so the asm isn't horrible. Optionally use -Os (size over speed) or -O3 (full optimization including auto-vectorization). You can also tune for a CPU and use extensions it supports with -march=native or -march=haswell or -march=znver1 (Zen)
- -s: make smaller executable (strip)
- -c -o main.o: compile but don't link, generate an object file called main.o
Use objconv to generate nasm code:
objconv -fnasm main.o
The result will be stored in main.asm.
The result will be very close to Nasm syntax. However you might need to make some minor tweaks to eliminate warnings/errors. Simply try to compile it with Nasm
nasm -f elf32 main.asm
and fix the errors/warnings by hand. For example:
- remove the align=N and execute/noexecute words from SECTION lines
- remove the text : function from global declarations
- remove the default rel line
- remove empty sections if you wish etc
Link the resulting main.o, which was generated by Nasm in the previous step, using gcc:
gcc main.o
You can also link it using ld but it's much harder.
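If you'd rather not do the cleanup by hand or with sed, the same tweaks can be applied one line at a time. The following is only a sketch in C: clean_objconv_line and its helpers are made-up names, and it assumes objconv writes SECTION, global and default rel at the start of a line, which may differ between versions.

```c
#include <string.h>

/* Delete every occurrence of `word` from `s` in place. */
static void strip_all(char *s, const char *word)
{
    size_t n = strlen(word);
    char *p;

    while ((p = strstr(s, word)) != NULL)
        memmove(p, p + n, strlen(p + n) + 1);
}

/* Delete every "align=<digits>" token from `s` in place. */
static void strip_align(char *s)
{
    char *p;

    while ((p = strstr(s, "align=")) != NULL) {
        char *end = p + strlen("align=");

        while (*end >= '0' && *end <= '9')
            end++;
        memmove(p, end, strlen(end) + 1);
    }
}

/* Clean one line of objconv output in place.
 * Returns 0 if the whole line should be dropped, 1 if it should be kept. */
int clean_objconv_line(char *line)
{
    if (strncmp(line, "default rel", 11) == 0)
        return 0;                          /* drop the "default rel" line */

    if (strncmp(line, "SECTION", 7) == 0) {
        strip_align(line);
        strip_all(line, "noexecute");      /* longer word first */
        strip_all(line, "execute");
    } else if (strncmp(line, "global", 6) == 0) {
        strip_all(line, ": function");
    }
    return 1;
}
```

Restricting each tweak to SECTION or global lines avoids mangling a label that happens to contain the word execute, which a blanket search and replace would do.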
How do I compile the asm generated by GCC?
Yes, you can use gcc to compile your asm code. Use -c for compilation like this:
gcc -c file.S -o file.o
This produces an object file named file.o.
To invoke the linker, run the following after the command above:
gcc file.o -o file
how to run this assembly code on nasm?
There is no dependency to get from nibbles.S to nibbles.o. Also, helpers.o and workaround.o don't have associated source file relations. Add those relationships and it should work.
Compile an asm bootloader with external c code
Compiling & Linking NASM and GCC Code
This question has a more complex answer than one might believe, although it is possible. Can the first stage of a bootloader (the original 512 bytes that get loaded at physical address 0x07c00) make a call into a C function? Yes, but it requires rethinking how you build your project.
For this to work you can no longer use -f bin with NASM. This also means you can't use the org 0x7c00 to tell the assembler what address the code expects to start from. You'll need to do this through a linker (either use LD directly or GCC for linking). Since the linker will lay things out in memory we can't rely on placing the boot sector signature 0xaa55 in our output file. We can get the linker to do that for us.
The first problem you will discover is that the default linker scripts used internally by GCC don't lay things out the way we want. We'll need to create our own. Such a linker script will have to set the origin point (Virtual Memory Address aka VMA) to 0x7c00, place the code from your assembly file before the data and place the boot signature at offset 510 in the file. I'm not going to write a tutorial on Linker scripts. The Binutils Documentation contains almost everything you need to know about linker scripts.
OUTPUT_FORMAT("elf32-i386");
/* We define an entry point to keep the linker quiet. This entry point
* has no meaning with a bootloader in the binary image we will eventually
* generate. Bootloader will start executing at whatever is at 0x07c00 */
ENTRY(start);
SECTIONS
{
. = 0x7C00;
.text : {
/* Place the code in hw.o before all other code */
hw.o(.text);
*(.text);
}
/* Place the data after the code */
.data : SUBALIGN(2) {
*(.data);
*(.rodata*);
}
/* Place the boot signature at LMA/VMA 0x7DFE */
.sig 0x7DFE : {
SHORT(0xaa55);
}
/* Place the uninitialised data in the area after our bootloader
* The BIOS only reads the 512 bytes before this into memory */
.bss : SUBALIGN(4) {
__bss_start = .;
*(COMMON);
*(.bss)
. = ALIGN(4);
__bss_end = .;
}
__bss_sizeb = SIZEOF(.bss);
/* Remove sections that won't be relevant to us */
/DISCARD/ : {
*(.eh_frame);
*(.comment);
}
}
This script should create an ELF executable that can be converted to a flat binary file with OBJCOPY. We could have output as a binary file directly but I separate the two processes out in the event I want to include debug information in the ELF version for debug purposes.
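The addresses in the script can be sanity checked with a bit of arithmetic. This sketch is in C purely for illustration: file_offset and fits_in_boot_sector are hypothetical helper names, and it assumes the flat binary starts at the 0x7C00 origin set in the script.

```c
#include <stdint.h>

#define LOAD_ADDRESS 0x7C00u   /* where the BIOS loads the boot sector */
#define SECTOR_SIZE  512u      /* the BIOS only reads this many bytes  */

/* Map an address used in the linker script to a byte offset in the
 * flat binary. Only meaningful for addresses >= LOAD_ADDRESS. */
uint32_t file_offset(uint32_t vma)
{
    return vma - LOAD_ADDRESS;
}

/* Return 1 if `size` bytes starting at `vma` lie inside the 512 bytes
 * that the BIOS actually loads, 0 otherwise. */
int fits_in_boot_sector(uint32_t vma, uint32_t size)
{
    return vma >= LOAD_ADDRESS && file_offset(vma) + size <= SECTOR_SIZE;
}
```

With these, file_offset(0x7DFE) gives 510, which is why the .sig section is placed at 0x7DFE: the two signature bytes then occupy the last two bytes of the sector.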
Now that we have a linker script we must remove the ORG 0x7c00 and the boot signature. For simplicity's sake we'll try to get the following code (hw.asm) to work:
extern print_str
global start
bits 16
section .text
start:
xor ax, ax ; AX = 0
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00
call print_str ; call function
; Halt the processor so we don't keep executing code beyond this point
cli
hlt
You can include all your other code, but this sample will still demonstrate the basics of calling into a C function.
Assuming the code above, you can now generate the ELF object hw.o from hw.asm using this command:
nasm -f elf32 hw.asm -o hw.o
You compile each C file with something like:
gcc -ffreestanding -c kmain.c -o kmain.o
I placed the C code you had into a file called kmain.c. The command above will generate kmain.o. I noticed you aren't using a cross compiler so you'll want to use -fno-PIE to ensure we don't generate position-independent code. -ffreestanding tells GCC the C standard library may not exist, and main may not be the program entry point. You'd compile each C file in the same way.
To link this code to a final executable and then produce a flat binary file that can be booted we do this:
ld -melf_i386 --build-id=none -T link.ld kmain.o hw.o -o kernel.elf
objcopy -O binary kernel.elf kernel.bin
You specify all the object files to link with the LD command. The LD command above will produce a 32-bit ELF executable called kernel.elf. This file can be useful in the future for debugging purposes. Here we use OBJCOPY to convert kernel.elf to a binary file called kernel.bin. kernel.bin can be used as a bootloader image.
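Before booting the image it can be worth checking that the signature really ended up in the right place. A minimal sketch, assuming the contents of kernel.bin have already been read into a buffer (has_boot_signature is a hypothetical helper, not part of any toolchain):

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if `image` looks like a bootable sector, 0 otherwise.
 * SHORT(0xaa55) is stored little-endian for this target, so the byte
 * at offset 510 is 0x55 and the byte at offset 511 is 0xAA. */
int has_boot_signature(const uint8_t *image, size_t len)
{
    return len >= 512 && image[510] == 0x55 && image[511] == 0xAA;
}
```

If this returns 0 for your kernel.bin, the usual suspects are a section that grew past offset 510 or a linker script that was not picked up by the link command.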
You should be able to run it with QEMU using this command:
qemu-system-i386 -fda kernel.bin
When run, you should see the letter A appear on the last line of the display. This is what we'd expect from the print_str code.
GCC Inline Assembly is Hard to Get Right
If we take your example code in the question:
__asm__ __volatile__("mov $'A' , %al\n");
__asm__ __volatile__("mov $0x0e, %ah\n");
__asm__ __volatile__("int $0x10\n");
The compiler is free to reorder these __asm__ statements if it wants to. The int $0x10 could appear before the MOV instructions. If you want these 3 lines to be output in this exact order you can combine them into one like this:
__asm__ __volatile__("mov $'A' , %al\n\t"
"mov $0x0e, %ah\n\t"
"int $0x10");
These are basic assembly statements. It's not required to specify __volatile__ on them as they are already implicitly volatile, so it has no effect. From the original poster's answer it is clear they want to eventually use variables in __asm__ blocks. This is doable with extended inline assembly, where the instruction string is followed by a colon (:) and then the constraints:
With extended asm you can read and write C variables from assembler and perform jumps from assembler code to C labels. Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:
asm [volatile] ( AssemblerTemplate
: OutputOperands
[ : InputOperands
[ : Clobbers ] ])
This answer isn't a tutorial on inline assembly. The general rule of thumb is that one should not use inline assembly unless you have to. Inline assembly done wrong can create hard to track bugs or have unusual side effects. Unfortunately doing 16-bit interrupts in C pretty much requires it, or you write the entire function in assembly (ie: NASM).
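To make the operand syntax a little more concrete, here is a small sketch that can be compiled and run on an ordinary Linux host rather than in a bootloader. The function name add_with_asm is made up for this illustration, and the code falls back to plain C when the target is not x86.

```c
/* Add two ints with a single x86 ADD using extended inline assembly.
 * On non-x86 targets this falls back to plain C so it still builds. */
int add_with_asm(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int sum;

    __asm__ ("addl %2, %0"   /* AT&T syntax: %0 = %0 + %2              */
             : "=r" (sum)    /* output 0: any general purpose register */
             : "0" (a),      /* input 1: same register as operand 0    */
               "r" (b)       /* input 2: any general purpose register  */
             : "cc");        /* clobber: ADD modifies the flags        */
    return sum;
#else
    return a + b;
#endif
}
```

No __volatile__ is needed here: the statement has an output and no other side effects, so the compiler is allowed to drop it if the result is unused, which is what you want for a pure computation.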
This is an example of a print_str function that takes a NUL-terminated string and prints each character out one by one using Int 10h/AH=0Eh:
#include <stdint.h>
__asm__(".code16gcc\n");
void print_str(char *str) {
while (*str) {
/* AH=0x0e, AL=char to print, BH=page, BL=fg color */
__asm__ __volatile__ ("int $0x10"
:
: "a" ((0x0e<<8) | *str++),
"b" (0x0000));
}
}
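One detail in the "a" operand is worth spelling out: AH and AL are packed into a single value for AX. The sketch below shows that packing as a stand-alone C function (teletype_ax is a hypothetical helper name). It also routes the character through uint8_t, because on compilers where plain char is signed, a character of 0x80 or above would otherwise sign-extend and overwrite the 0x0E in AH.

```c
#include <stdint.h>

/* Build the AX value for INT 10h teletype output:
 * AH = 0x0E (function number), AL = character to print. */
uint16_t teletype_ax(char c)
{
    /* The uint8_t cast keeps only the low 8 bits of the character. */
    return (uint16_t)((0x0Eu << 8) | (uint8_t)c);
}
```

For plain ASCII text the cast makes no difference, so the print_str above behaves the same for a message like the one used in this answer.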
hw.asm would be modified to look like this:
push welcome
call print_str ;call function
The idea is that when this is assembled/compiled (using the commands in the first section of this answer) and run, it prints out the welcome message. Unfortunately it will almost never work, and may even crash some emulators like QEMU.
code16 is Almost Useless and Should Not be Used
In the last section we learned that a simple function that takes a parameter ends up not working and may even crash an emulator like QEMU. The main problem is that the __asm__(".code16\n"); statement really doesn't work well with the code generated by GCC. The Binutils AS documentation says:
‘.code16gcc’ provides experimental support for generating 16-bit code from gcc, and differs from ‘.code16’ in that ‘call’, ‘ret’, ‘enter’, ‘leave’, ‘push’, ‘pop’, ‘pusha’, ‘popa’, ‘pushf’, and ‘popf’ instructions default to 32-bit size. This is so that the stack pointer is manipulated in the same way over function calls, allowing access to function parameters at the same stack offsets as in 32-bit mode. ‘.code16gcc’ also automatically adds address size prefixes where necessary to use the 32-bit addressing modes that gcc generates.
.code16gcc is what you really need to be using, not .code16. This forces the GNU assembler on the back end to emit address and operand prefixes on certain instructions so that the addresses and operands are treated as 4 bytes wide, and not 2 bytes.
The hand written code in NASM doesn't know it will be calling C code, nor does NASM have a directive like .code16gcc. You'll need to modify the assembly code to push 32-bit values on to the stack in real mode. You will also need to override the call instruction so that the return address is treated as a 32-bit value, not 16-bit. This code:
push welcome
call print_str ;call function
Should be:
jmp 0x0000:setcs
setcs:
cld
push dword welcome
call dword print_str ;call function
GCC has a requirement that the direction flag be cleared before calling any C function. I added the CLD instruction to the top of the assembly code to make sure this is the case. GCC code also needs to have CS set to 0x0000 to work properly. The FAR JMP does just that.
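To see why the 16-bit push and call break things, it can help to model the stack on the host. The following is only a toy simulation in C: the sim_stack type and its functions are invented for this illustration. It assumes, as described above, that the C code reads its first argument as a 32-bit value just above a 32-bit return address.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 0x40u          /* arbitrary size for the toy stack */

struct sim_stack {
    uint8_t  mem[STACK_BYTES];
    uint32_t sp;                   /* index of the most recent push */
};

void sim_init(struct sim_stack *s)
{
    memset(s->mem, 0xCC, sizeof s->mem);   /* stale bytes show up as 0xCC  */
    s->sp = STACK_BYTES - 8;               /* leave some older "contents"  */
}

/* 16-bit push: SP moves down by 2, value stored little-endian. */
void push16(struct sim_stack *s, uint16_t v)
{
    s->sp -= 2;
    s->mem[s->sp]     = (uint8_t)(v & 0xFF);
    s->mem[s->sp + 1] = (uint8_t)(v >> 8);
}

/* 32-bit push: SP moves down by 4, high half at the higher address. */
void push32(struct sim_stack *s, uint32_t v)
{
    push16(s, (uint16_t)(v >> 16));
    push16(s, (uint16_t)(v & 0xFFFF));
}

/* What the compiled C function does on entry: read the first argument
 * as a 32-bit value at [esp+4], just above a 32-bit return address. */
uint32_t callee_first_arg(const struct sim_stack *s)
{
    uint32_t a = s->sp + 4;

    return  (uint32_t)s->mem[a]
         | ((uint32_t)s->mem[a + 1] << 8)
         | ((uint32_t)s->mem[a + 2] << 16)
         | ((uint32_t)s->mem[a + 3] << 24);
}
```

Using made-up addresses, push16(&s, 0x7C40) followed by push16(&s, 0x7C1B) models push welcome and a near call: the callee then reads stale bytes instead of 0x7C40. Replacing both with push32 models push dword and call dword, and the callee sees 0x7C40 as expected.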
You can also drop the __asm__(".code16gcc\n"); on modern GCC that supports the -m16 option. -m16 automatically places a .code16gcc into the file that is being compiled.
Since GCC also uses the full 32-bit stack pointer it is a good idea to initialize ESP with 0x7c00, not just SP. Change mov sp, 0x7C00 to mov esp, 0x7C00. This ensures the full 32-bit stack pointer is 0x7c00.
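The difference between the two instructions is easy to model. In this sketch (mov_sp and mov_esp are hypothetical names for C models of the two MOVs), writing the 16-bit register leaves whatever was in the upper half of ESP untouched:

```c
#include <stdint.h>

/* Model of `mov sp, imm16`: only the low 16 bits of ESP change. */
uint32_t mov_sp(uint32_t esp, uint16_t value)
{
    return (esp & 0xFFFF0000u) | value;
}

/* Model of `mov esp, imm32`: the whole register is replaced. */
uint32_t mov_esp(uint32_t esp, uint32_t value)
{
    (void)esp;               /* the old contents do not matter */
    return value;
}
```

If the BIOS happened to leave, say, 0x12345678 in ESP, the first form would produce 0x12347C00 while the second gives 0x00007C00. Code that addresses the stack through the full ESP only works reliably in the second case.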
The modified kmain.c code should now look like:
#include <stdint.h>
void print_str(char *str) {
while (*str) {
/* AH=0x0e, AL=char to print, BH=page, BL=fg color */
__asm__ __volatile__ ("int $0x10"
:
: "a" ((0x0e<<8) | *str++),
"b" (0x0000));
}
}
and hw.asm:
extern print_str
global start
bits 16
section .text
start:
xor ax, ax ; AX = 0
mov ds, ax
mov es, ax
mov ss, ax
mov esp, 0x7C00
jmp 0x0000:setcs ; Set CS to 0
setcs:
cld ; GCC code requires direction flag to be cleared
push dword welcome
call dword print_str ; call function
cli
hlt
section .data
welcome db 'Developped by Marius Van Nieuwenhuyse', 0x0D, 0x0A, 0
Assemble hw.asm with NASM as before, then build the bootloader with these commands:
gcc -fno-PIC -ffreestanding -m16 -c kmain.c -o kmain.o
ld -melf_i386 --build-id=none -T link.ld kmain.o hw.o -o kernel.elf
objcopy -O binary kernel.elf kernel.bin
When run with qemu-system-i386 -fda kernel.bin it should print the welcome message defined in hw.asm.
In Most Cases GCC Produces Code that Requires 80386+
There are a number of disadvantages to GCC generated code using .code16gcc:
- ES=DS=CS=SS must be 0
- Code must fit in the first 64kb
- GCC code has no understanding of 20-bit segment:offset addressing.
- For anything but the most trivial C code, GCC doesn't generate code that can run on a 286/186/8086. It runs in real mode but it uses 32-bit operands and addressing not available on processors earlier than 80386.
- If you want to access memory locations above the first 64kb then you need to be in Unreal Mode(big) before calling into C code.
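For reference, the segment:offset arithmetic that GCC knows nothing about is simple to state. A minimal sketch in C (linear_address is a hypothetical helper; wrap-around at 1 MiB with the A20 line disabled is not modelled):

```c
#include <stdint.h>

/* Real-mode physical address: segment * 16 + offset. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Both 0x0000:0x7C00 and 0x07C0:0x0000 map to physical address 0x7C00. With every segment register held at zero, as GCC's code requires, the highest address reachable through a 16-bit offset is 0xFFFF, which is where the 64kb limit in the list above comes from.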
If you want to produce real 16-bit code from a more modern C compiler I recommend OpenWatcom C:
- The inline assembly is not as powerful as GCC's
- The inline assembly syntax is different but it is easier to use and less error prone than GCC's inline assembly.
- Can generate code that will run on antiquated 8086/8088 processors.
- Understands 20-bit segment:offset real mode addressing and supports the concept of far and huge pointers.
- wlink, the Watcom linker, can produce basic flat binary files usable as a bootloader.
Zero Fill the BSS Section
The BIOS boot sequence doesn't guarantee that memory is actually zero. This causes a potential problem for the zero initialized region BSS. Before calling into C code for the first time the region should be zero filled by our assembly code. The linker script I originally wrote defines a symbol __bss_start that is the offset of the BSS memory and __bss_sizeb is the size in bytes. Using this info you can use the STOSB instruction to easily zero fill it. At the top of hw.asm you can add:
extern __bss_sizeb
extern __bss_start
And after the CLD instruction and before calling any C code you can do the zero fill this way:
; Zero fill the BSS section
mov cx, __bss_sizeb ; Size of BSS computed in linker script
mov di, __bss_start ; Start of BSS defined in linker script
rep stosb ; AL still zero, Fill memory with zero
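If the REP STOSB idiom is unfamiliar, this is roughly what those three instructions do, written as a C model that runs on the host (rep_stosb is a made-up name; in the bootloader itself this has to stay in assembly because it must run before any C code). The assembly relies on AL still being zero from the earlier xor ax, ax, on ES being zero, and on the direction flag being clear.

```c
#include <stdint.h>

/* C model of `rep stosb` with the direction flag clear:
 * store AL at the destination CX times, moving upward in memory. */
void rep_stosb(uint8_t *di, uint8_t al, uint16_t cx)
{
    while (cx-- != 0)
        *di++ = al;
}
```

In terms of this model, the assembly above corresponds to rep_stosb(__bss_start, 0, __bss_sizeb), with the two linker-script symbols supplying the start and the byte count.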
Other Suggestions
To reduce the bloat of the code generated by the compiler it can be useful to use -fomit-frame-pointer. Compiling with -Os can optimize for space (rather than speed). We have limited space (512 bytes) for the initial code loaded by the BIOS so these optimizations can be beneficial. The command line for compiling could appear as:
gcc -fno-PIC -fomit-frame-pointer -ffreestanding -m16 -Os -c kmain.c -o kmain.o