What Is the Explanation of This X86 Hello World Using 32-Bit Int 0X80 Linux System Calls from _Start

What is the explanation of this x86 Hello World using 32-bit int 0x80 Linux system calls from _start?

In machine code, there are no functions. At least, the processor knows nothing about functions. The programmer can structure his code as he likes. _start is something called a symbol which is just a name for a location in your program. Symbols are used to refer to locations whose address you don't know yet. They are resolved during linking. The symbol _start is used as the entry point (cf. this answer) which is where the operating system jumps to start your program. Unless you specify the entry point by some other way, every program must contain _start. The other symbols your program uses are msg, which is resolved by the linker to the address where the string Hello, world! resides and len which is the length of msg.

The rest of the program does the following things:

  1. Set up the registers for the system call write(1, msg, len). write has system call number 4 which is stored in eax to let the operating system know you want system call 4. This system call writes data to a file. The file descriptor number supplied is 1 which stands for standard output.
  2. Perform a system call using int $0x80. This instruction interrupts your program, the operating system picks this up and performs the function whose number is stored in eax. It's like a function call that calls into the OS kernel. The calling convention is different from other functions, with args passed in registers.
  3. Set up the registers for the system call _exit(?). Its system call number is 1 which goes into eax. Sadly, the code forgets to set the argument for _exit, which should be 0 to indicate success. Instead, whatever was in ebx before is used instead, which seems to be 1.
  4. Perform a system call using int $0x80. Because _exit ends the program, it does not return. Your program ends here.

The directive db tells the assembler to place the following data into the program where we currently are. This places the string Hello, world! followed by a newline into the program so we can tell the write system call to write that string.

The line len equ $ - msg tells the assembler than len is the difference between $ (where we currently are) and msg. This is defined so we can pass to write how long the text we want to print is.

Everything after a semicolon (;) in the program is a comment ignored by the assembler.

What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?

TL:DR: int 0x80 works when used correctly, as long as any pointers fit in 32 bits (stack pointers don't fit). But beware that strace decodes it wrong unless you have a very recent strace + kernel.

int 0x80 zeros r8-r11 for reasons, and preserves everything else. Use it exactly like you would in 32-bit code, with the 32-bit call numbers. (Or better, don't use it!)

Not all systems even support int 0x80: The Windows Subsystem for Linux version 1 (WSL1) is strictly 64-bit only: int 0x80 doesn't work at all. It's also possible to build Linux kernels without IA-32 emulation either. (No support for 32-bit executables, no support for 32-bit system calls). See this re: making sure your WSL is actually WSL2 (which uses an actual Linux kernel in a VM.)



The details: what's saved/restored, which parts of which regs the kernel uses

int 0x80 uses eax (not the full rax) as the system-call number, dispatching to the same table of function-pointers that 32-bit user-space int 0x80 uses. (These pointers are to sys_whatever implementations or wrappers for the native 64-bit implementation inside the kernel. System calls are really function calls across the user/kernel boundary.)

Only the low 32 bits of arg registers are passed. The upper halves of rbx-rbp are preserved, but ignored by int 0x80 system calls. Note that passing a bad pointer to a system call doesn't result in SIGSEGV; instead the system call returns -EFAULT. If you don't check error return values (with a debugger or tracing tool), it will appear to silently fail.

All registers (except eax of course) are saved/restored (including RFLAGS, and the upper 32 of integer regs), except that r8-r11 are zeroed. r12-r15 are call-preserved in the x86-64 SysV ABI's function calling convention, so the registers that get zeroed by int 0x80 in 64-bit are the call-clobbered subset of the "new" registers that AMD64 added.

This behaviour has been preserved over some internal changes to how register-saving was implemented inside the kernel, and comments in the kernel mention that it's usable from 64-bit, so this ABI is probably stable. (I.e. you can count on r8-r11 being zeroed, and everything else being preserved.)

The return value is sign-extended to fill 64-bit rax. (Linux declares 32-bit sys_ functions as returning signed long.) This means that pointer return values (like from void *mmap()) need to be zero-extended before use in 64-bit addressing modes

Unlike sysenter, it preserves the original value of cs, so it returns to user-space in the same mode that it was called in. (Using sysenter results in the kernel setting cs to $__USER32_CS, which selects a descriptor for a 32-bit code segment.)


Older strace decodes int 0x80 incorrectly for 64-bit processes. It decodes as if the process had used syscall instead of int 0x80. This can be very confusing. e.g. strace prints write(0, NULL, 12 <unfinished ... exit status 1> for eax=1 / int $0x80, which is actually _exit(ebx), not write(rdi, rsi, rdx).

I don't know the exact version where the PTRACE_GET_SYSCALL_INFO feature was added, but Linux kernel 5.5 / strace 5.5 handle it. It misleadingly says the process "runs in 32-bit mode" but does decode correctly. (Example).


int 0x80 works as long as all arguments (including pointers) fit in the low 32 of a register. This is the case for static code and data in the default code model ("small") in the x86-64 SysV ABI. (Section 3.5.1
: all symbols are known to be located in the virtual addresses in the range 0x00000000 to 0x7effffff, so you can do stuff like mov edi, hello (AT&T mov $hello, %edi) to get a pointer into a register with a 5 byte instruction).

But this is not the case for position-independent executables, which many Linux distros now configure gcc to make by default (and they enable ASLR for executables). For example, I compiled a hello.c on Arch Linux, and set a breakpoint at the start of main. The string constant passed to puts was at 0x555555554724, so a 32-bit ABI write system call would not work. (GDB disables ASLR by default, so you always see the same address from run to run, if you run from within GDB.)

Linux puts the stack near the "gap" between the upper and lower ranges of canonical addresses, i.e. with the top of the stack at 2^48-1. (Or somewhere random, with ASLR enabled). So rsp on entry to _start in a typical statically-linked executable is something like 0x7fffffffe550, depending on size of env vars and args. Truncating this pointer to esp does not point to any valid memory, so system calls with pointer inputs will typically return -EFAULT if you try to pass a truncated stack pointer. (And your program will crash if you truncate rsp to esp and then do anything with the stack, e.g. if you built 32-bit asm source as a 64-bit executable.)



How it works in the kernel:

In the Linux source code, arch/x86/entry/entry_64_compat.S defines
ENTRY(entry_INT80_compat). Both 32 and 64-bit processes use the same entry point when they execute int 0x80.

entry_64.S is defines native entry points for a 64-bit kernel, which includes interrupt / fault handlers and syscall native system calls from long mode (aka 64-bit mode) processes.

entry_64_compat.S defines system-call entry-points from compat mode into a 64-bit kernel, plus the special case of int 0x80 in a 64-bit process. (sysenter in a 64-bit process may go to that entry point as well, but it pushes $__USER32_CS, so it will always return in 32-bit mode.) There's a 32-bit version of the syscall instruction, supported on AMD CPUs, and Linux supports it too for fast 32-bit system calls from 32-bit processes.

I guess a possible use-case for int 0x80 in 64-bit mode is if you wanted to use a custom code-segment descriptor that you installed with modify_ldt. int 0x80 pushes segment registers itself for use with iret, and Linux always returns from int 0x80 system calls via iret. The 64-bit syscall entry point sets pt_regs->cs and ->ss to constants, __USER_CS and __USER_DS. (It's normal that SS and DS use the same segment descriptors. Permission differences are done with paging, not segmentation.)

entry_32.S defines entry points into a 32-bit kernel, and is not involved at all.

The int 0x80 entry point in Linux 4.12's entry_64_compat.S:

/*
* 32-bit legacy system call entry.
*
* 32-bit x86 Linux system calls traditionally used the INT $0x80
* instruction. INT $0x80 lands here.
*
* This entry point can be used by 32-bit and 64-bit programs to perform
* 32-bit system calls. Instances of INT $0x80 can be found inline in
* various programs and libraries. It is also used by the vDSO's
* __kernel_vsyscall fallback for hardware that doesn't support a faster
* entry method. Restarted 32-bit system calls also fall back to INT
* $0x80 regardless of what instruction was originally used to do the
* system call.
*
* This is considered a slow path. It is not used by most libc
* implementations on modern hardware except during process startup.
...
*/
ENTRY(entry_INT80_compat)
... (see the github URL for the full source)

The code zero-extends eax into rax, then pushes all the registers onto the kernel stack to form a struct pt_regs. This is where it will restore from when the system call returns. It's in a standard layout for saved user-space registers (for any entry point), so ptrace from other process (like gdb or strace) will read and/or write that memory if they use ptrace while this process is inside a system call. (ptrace modification of registers is one thing that makes return paths complicated for the other entry points. See comments.)

But it pushes $0 instead of r8/r9/r10/r11. (sysenter and AMD syscall32 entry points store zeros for r8-r15.)

I think this zeroing of r8-r11 is to match historical behaviour. Before the Set up full pt_regs for all compat syscalls commit, the entry point only saved the C call-clobbered registers. It dispatched directly from asm with call *ia32_sys_call_table(, %rax, 8), and those functions follow the calling convention, so they preserve rbx, rbp, rsp, and r12-r15. Zeroing r8-r11 instead of leaving them undefined was to avoid info leaks from a 64-bit kernel to 32-bit user-space (which could far jmp to a 64-bit code segment to read anything the kernel left there).

The current implementation (Linux 4.12) dispatches 32-bit-ABI system calls from C, reloading the saved ebx, ecx, etc. from pt_regs. (64-bit native system calls dispatch directly from asm, with only a mov %r10, %rcx needed to account for the small difference in calling convention between functions and syscall. Unfortunately it can't always use sysret, because CPU bugs make it unsafe with non-canonical addresses. It does try to, so the fast-path is pretty damn fast, although syscall itself still takes tens of cycles.)

Anyway, in current Linux, 32-bit syscalls (including int 0x80 from 64-bit) eventually end up indo_syscall_32_irqs_on(struct pt_regs *regs). It dispatches to a function pointer ia32_sys_call_table, with 6 zero-extended args. This maybe avoids needing a wrapper around the 64-bit native syscall function in more cases to preserve that behaviour, so more of the ia32 table entries can be the native system call implementation directly.

Linux 4.12 arch/x86/entry/common.c

if (likely(nr < IA32_NR_syscalls)) {
/*
* It's possible that a 32-bit syscall implementation
* takes a 64-bit parameter but nonetheless assumes that
* the high bits are zero. Make sure we zero-extend all
* of the args.
*/
regs->ax = ia32_sys_call_table[nr](
(unsigned int)regs->bx, (unsigned int)regs->cx,
(unsigned int)regs->dx, (unsigned int)regs->si,
(unsigned int)regs->di, (unsigned int)regs->bp);
}

syscall_return_slowpath(regs);

In older versions of Linux that dispatch 32-bit system calls from asm (like 64-bit still did until 4.151), the int80 entry point itself puts args in the right registers with mov and xchg instructions, using 32-bit registers. It even uses mov %edx,%edx to zero-extend EDX into RDX (because arg3 happen to use the same register in both conventions). code here. This code is duplicated in the sysenter and syscall32 entry points.

Footnote 1: Linux 4.15 (I think) introduced Spectre / Meltdown mitigations, and a major revamp of the entry points that made them them a trampoline for the meltdown case. It also sanitized the incoming registers to avoid user-space values other than actual args being in registers during the call (when some Spectre gadget might run), by storing them, zeroing everything, then calling to a C wrapper that reloads just the right widths of args from the struct saved on entry.

I'm planning to leave this answer describing the much simpler mechanism because the conceptually useful part here is that the kernel side of a syscall involves using EAX or RAX as an index into a table of function pointers, with other incoming register values copied going to the places where the calling convention wants args to go. i.e. syscall is just a way to make a call into the kernel, to its dispatch code.



Simple example / test program:

I wrote a simple Hello World (in NASM syntax) which sets all registers to have non-zero upper halves, then makes two write() system calls with int 0x80, one with a pointer to a string in .rodata (succeeds), the second with a pointer to the stack (fails with -EFAULT).

Then it uses the native 64-bit syscall ABI to write() the chars from the stack (64-bit pointer), and again to exit.

So all of these examples are using the ABIs correctly, except for the 2nd int 0x80 which tries to pass a 64-bit pointer and has it truncated.

If you built it as a position-independent executable, the first one would fail too. (You'd have to use a RIP-relative lea instead of mov to get the address of hello: into a register.)

I used gdb, but use whatever debugger you prefer. Use one that highlights changed registers since the last single-step. gdbgui works well for debugging asm source, but is not great for disassembly. Still, it does have a register pane that works well for integer regs at least, and it worked great on this example.

See the inline ;;; comments describing how register are changed by system calls

global _start
_start:
mov rax, 0x123456789abcdef
mov rbx, rax
mov rcx, rax
mov rdx, rax
mov rsi, rax
mov rdi, rax
mov rbp, rax
mov r8, rax
mov r9, rax
mov r10, rax
mov r11, rax
mov r12, rax
mov r13, rax
mov r14, rax
mov r15, rax

;; 32-bit ABI
mov rax, 0xffffffff00000004 ; high garbage + __NR_write (unistd_32.h)
mov rbx, 0xffffffff00000001 ; high garbage + fd=1
mov rcx, 0xffffffff00000000 + .hello
mov rdx, 0xffffffff00000000 + .hellolen
;std
after_setup: ; set a breakpoint here
int 0x80 ; write(1, hello, hellolen); 32-bit ABI
;; succeeds, writing to stdout
;;; changes to registers: r8-r11 = 0. rax=14 = return value

; ebx still = 1 = STDOUT_FILENO
push 'bye' + (0xa<<(3*8))
mov rcx, rsp ; rcx = 64-bit pointer that won't work if truncated
mov edx, 4
mov eax, 4 ; __NR_write (unistd_32.h)
int 0x80 ; write(ebx=1, ecx=truncated pointer, edx=4); 32-bit
;; fails, nothing printed
;;; changes to registers: rax=-14 = -EFAULT (from /usr/include/asm-generic/errno-base.h)

mov r10, rax ; save return value as exit status
mov r8, r15
mov r9, r15
mov r11, r15 ; make these regs non-zero again

;; 64-bit ABI
mov eax, 1 ; __NR_write (unistd_64.h)
mov edi, 1
mov rsi, rsp
mov edx, 4
syscall ; write(edi=1, rsi='bye\n' on the stack, rdx=4); 64-bit
;; succeeds: writes to stdout and returns 4 in rax
;;; changes to registers: rax=4 = length return value
;;; rcx = 0x400112 = RIP. r11 = 0x302 = eflags with an extra bit set.
;;; (This is not a coincidence, it's how sysret works. But don't depend on it, since iret could leave something else)

mov edi, r10d
;xor edi,edi
mov eax, 60 ; __NR_exit (unistd_64.h)
syscall ; _exit(edi = first int 0x80 result); 64-bit
;; succeeds, exit status = low byte of first int 0x80 result = 14

section .rodata
_start.hello: db "Hello World!", 0xa, 0
_start.hellolen equ $ - _start.hello

Build it into a 64-bit static binary with

yasm -felf64 -Worphan-labels -gdwarf2 abi32-from-64.asm
ld -o abi32-from-64 abi32-from-64.o

Run gdb ./abi32-from-64. In gdb, run set disassembly-flavor intel and layout reg if you don't have that in your ~/.gdbinit already. (GAS .intel_syntax is like MASM, not NASM, but they're close enough that it's easy to read if you like NASM syntax.)

(gdb)  set disassembly-flavor intel
(gdb) layout reg
(gdb) b after_setup
(gdb) r
(gdb) si # step instruction
press return to repeat the last command, keep stepping

Press control-L when gdb's TUI mode gets messed up. This happens easily, even when programs don't print to stdout themselves.

What does int 0x80 mean in assembly code?

It passes control to interrupt vector 0x80

See http://en.wikipedia.org/wiki/Interrupt_vector

On Linux, have a look at this: it was used to handle system_call. Of course on another OS this could mean something totally different.

Hello, world in assembly language with Linux system calls?

How does $ work in NASM, exactly? explains how $ - msg gets NASM to calculate the string length as an assemble-time constant for you, instead of hard-coding it.


I originally wrote the rest of this for SO Docs (topic ID: 1164, example ID: 19078), rewriting a basic less-well-commented example by @runner. This looks like a better place to put it than as part of my answer to another question where I had previously moved it after the SO docs experiment ended.


Making a system call is done by putting arguments into registers, then running int 0x80 (32-bit mode) or syscall (64-bit mode). What are the calling conventions for UNIX & Linux system calls on i386 and x86-64 and The Definitive Guide to Linux System Calls.

Think of int 0x80 as a way to "call" into the kernel, across the user/kernel privilege boundary. The kernel does stuff according to the values that were in registers when int 0x80 executed, then eventually returns. The return value is in EAX.

When execution reaches the kernel's entry point, it looks at EAX and dispatches to the right system call based on the call number in EAX. Values from other registers are passed as function args to the kernel's handler for that system call. (e.g. eax=4 / int 0x80 will get the kernel to call its sys_write kernel function, implementing the POSIX write system call.)

And see also What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - that answer includes a look at the asm in the kernel entry point that is "called" by int 0x80. (Also applies to 32-bit user-space, not just 64-bit where you shouldn't use int 0x80).


If you don't already know low-level Unix systems programming, you might want to just write functions in asm that take args and return a value (or update arrays via a pointer arg) and call them from C or C++ programs. Then you can just worry about learning how to handle registers and memory, without also learning the POSIX system-call API and the ABI for using it. That also makes it very easy to compare your code with compiler output for a C implementation. Compilers usually do a pretty good job at making efficient code, but are rarely perfect.

libc provides wrapper functions for system calls, so compiler-generated code would call write rather than invoking it directly with int 0x80 (or if you care about performance, sysenter). (In x86-64 code, use syscall for the 64-bit ABI.) See also syscalls(2).

System calls are documented in section 2 manual pages, like write(2). See the NOTES section for differences between the libc wrapper function and the underlying Linux system call. Note that the wrapper for sys_exit is _exit(2), not the exit(3) ISO C function that flushes stdio buffers and other cleanup first. There's also an exit_group system call that ends all threads. exit(3) actually uses that, because there's no downside in a single-threaded process.

This code makes 2 system calls:

  • sys_write(1, "Hello, World!\n", sizeof(...));
  • sys_exit(0);

I commented it heavily (to the point where it it's starting to obscure the actual code without color syntax highlighting). This is an attempt to point things out to total beginners, not how you should comment your code normally.

section .text             ; Executable code goes in the .text section
global _start ; The linker looks for this symbol to set the process entry point, so execution start here
;;;a name followed by a colon defines a symbol. The global _start directive modifies it so it's a global symbol, not just one that we can CALL or JMP to from inside the asm.
;;; note that _start isn't really a "function". You can't return from it, and the kernel passes argc, argv, and env differently than main() would expect.
_start:
;;; write(1, msg, len);
; Start by moving the arguments into registers, where the kernel will look for them
mov edx,len ; 3rd arg goes in edx: buffer length
mov ecx,msg ; 2nd arg goes in ecx: pointer to the buffer
;Set output to stdout (goes to your terminal, or wherever you redirect or pipe)
mov ebx,1 ; 1st arg goes in ebx: Unix file descriptor. 1 = stdout, which is normally connected to the terminal.

mov eax,4 ; system call number (from SYS_write / __NR_write from unistd_32.h).
int 0x80 ; generate an interrupt, activating the kernel's system-call handling code. 64-bit code uses a different instruction, different registers, and different call numbers.
;; eax = return value, all other registers unchanged.

;;;Second, exit the process. There's nothing to return to, so we can't use a ret instruction (like we could if this was main() or any function with a caller)
;;; If we don't exit, execution continues into whatever bytes are next in the memory page,
;;; typically leading to a segmentation fault because the padding 00 00 decodes to add [eax],al.

;;; _exit(0);
xor ebx,ebx ; first arg = exit status = 0. (will be truncated to 8 bits). Zeroing registers is a special case on x86, and mov ebx,0 would be less efficient.
;; leaving out the zeroing of ebx would mean we exit(1), i.e. with an error status, since ebx still holds 1 from earlier.
mov eax,1 ; put __NR_exit into eax
int 0x80 ;Execute the Linux function

section .rodata ; Section for read-only constants

;; msg is a label, and in this context doesn't need to be msg:. It could be on a separate line.
;; db = Data Bytes: assemble some literal bytes into the output file.
msg db 'Hello, world!',0xa ; ASCII string constant plus a newline (0x10)

;; No terminating zero byte is needed, because we're using write(), which takes a buffer + length instead of an implicit-length string.
;; To make this a C string that we could pass to puts or strlen, we'd need a terminating 0 byte. (e.g. "...", 0x10, 0)

len equ $ - msg ; Define an assemble-time constant (not stored by itself in the output file, but will appear as an immediate operand in insns that use it)
; Calculate len = string length. subtract the address of the start
; of the string from the current position ($)
;; equivalently, we could have put a str_end: label after the string and done len equ str_end - str

Notice that we don't store the string length in data memory anywhere. It's an assemble-time constant, so it's more efficient to have it as an immediate operand than a load. We could also have pushed the string data onto the stack with three push imm32 instructions, but bloating the code-size too much isn't a good thing.


On Linux, you can save this file as Hello.asm and build a 32-bit executable from it with these commands:

nasm -felf32 Hello.asm                  # assemble as 32-bit code.  Add -Worphan-labels -g -Fdwarf  for debug symbols and warnings
gcc -static -nostdlib -m32 Hello.o -o Hello # link without CRT startup code or libc, making a static binary

See this answer for more details on building assembly into 32 or 64-bit static or dynamically linked Linux executables, for NASM/YASM syntax or GNU AT&T syntax with GNU as directives. (Key point: make sure to use -m32 or equivalent when building 32-bit code on a 64-bit host, or you will have confusing problems at run-time.)


You can trace its execution with strace to see the system calls it makes:

$ strace ./Hello 
execve("./Hello", ["./Hello"], [/* 72 vars */]) = 0
[ Process PID=4019 runs in 32 bit mode. ]
write(1, "Hello, world!\n", 14Hello, world!
) = 14
_exit(0) = ?
+++ exited with 0 +++

Compare this with the trace for a dynamically linked process (like gcc makes from hello.c, or from running strace /bin/ls) to get an idea just how much stuff happens under the hood for dynamic linking and C library startup.

The trace on stderr and the regular output on stdout are both going to the terminal here, so they interfere in the line with the write system call. Redirect or trace to a file if you care. Notice how this lets us easily see the syscall return values without having to add code to print them, and is actually even easier than using a regular debugger (like gdb) to single-step and look at eax for this. See the bottom of the x86 tag wiki for gdb asm tips. (The rest of the tag wiki is full of links to good resources.)

The x86-64 version of this program would be extremely similar, passing the same args to the same system calls, just in different registers and with syscall instead of int 0x80. See the bottom of What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? for a working example of writing a string and exiting in 64-bit code.


related: A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. The smallest binary file you can run that just makes an exit() system call. That is about minimizing the binary size, not the source size or even just the number of instructions that actually run.

step by step hello world in assembly for mac OS

  1. No, that just exports the symbol.
  2. No, that tells the assembler which section to put the following stuff into. .text is a default section for code.
  3. No, that's a label. Function entry points are usually denoted by labels, but not all labels are functions.
  4. On MacOS the value 0x2000004 is the code that specifies you want a write system call. The OS will look in rax to determine what the caller wants. All system services have a code. You can imagine the OS doing something like if (rax == 0x2000004) do_write(rdi, rsi, rdx);
  5. rdi is a register. You know the registers, right? Similarly to point #4 above, the OS once it determined you wanted a write will check rdi for the destination file descriptor.
  6. str.len is just a label syntax. The value is defined at the bottom. This should be loaded into rdx not rdi though.
  7. It transfers control to the OS. Which then look at the contents of the registers and performs the action requested. The OS is just code, albeit privileged.


Related Topics



Leave a reply



Submit