What Are the Return Values of System Calls in Assembly

What are the return values of system calls in Assembly?

See also this excellent LWN article about system calls, which assumes C knowledge.

Also: The Definitive Guide to Linux System Calls (on x86), and related: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?


C is the language of Unix systems programming, so all the documentation is in terms of C. And then there's documentation for the minor differences between the C interface and the asm on any given platform, usually in the Notes section of man pages.

sys_read means the raw system call (as opposed to the libc wrapper function). The kernel implementation of the read system call is a kernel function called sys_read(). You can't call it with a call instruction, because it's in the kernel, not a library. But people still talk about "calling sys_read" to distinguish it from the libc function call. However, it's ok to say read even when you mean the raw system call (especially when the libc wrapper doesn't do anything special), like I do in this answer.

Also note that <sys/syscall.h> defines constants like SYS_read with the actual system call number (the value you put in EAX before an int 0x80 or syscall instruction), and <asm/unistd.h> defines the Linux __NR_read names for the same constants.
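A minimal sketch, assuming glibc and the Linux UAPI headers are installed, that checks at compile time that both spellings name the same numbers and prints a few of them. `print_syscall_numbers` is a hypothetical helper name used only for illustration.

```c
#include <stdio.h>
#include <sys/syscall.h> /* SYS_read, SYS_write, ... (glibc names) */
#include <asm/unistd.h>  /* __NR_read, __NR_write, ... (Linux kernel names) */

/* Both names expand to the number that goes in EAX/RAX before the
 * int 0x80 / syscall instruction, so they must agree. */
_Static_assert(SYS_read == __NR_read, "SYS_read != __NR_read");
_Static_assert(SYS_write == __NR_write, "SYS_write != __NR_write");

/* Hypothetical helper: print the numbers for the ABI being compiled for. */
void print_syscall_numbers(void) {
    printf("read  = %ld\n", (long)SYS_read);
    printf("write = %ld\n", (long)SYS_write);
    printf("exit  = %ld\n", (long)SYS_exit);
}
```

Compiled for x86-64 this should print 0, 1 and 60; built with -m32 you get the i386 numbers (3, 4 and 1) instead, which is why 32-bit and 64-bit asm put different values in EAX for the same call.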


Linux system call return values (in EAX/RAX on x86) are either a "normal" success value or a -errno code for an error, e.g. -EFAULT if you pass an invalid pointer. This behaviour is documented in the syscalls(2) man page.

-1 to -4095 means error, anything else means success. See AOSP non-obvious syscall() implementation for more details on this -4095UL .. -1UL range, which is portable across architectures on Linux, and applies to every system call. (In the future, a different architecture could use a different value for MAX_ERRNO, but the value for existing arches like x86-64 is guaranteed to stay the same as part of Linus's don't-break-userspace policy of keeping kernel ABIs stable.)

For example, glibc's generic syscall(2) wrapper function uses this sequence: cmp rax, -4095 / jae SYSCALL_ERROR_LABEL, which is guaranteed to be future-proof for all Linux system calls.
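The same unsigned comparison can be sketched in C, assuming the raw return value is available as an unsigned long. `raw_is_error` is a hypothetical name; it is not glibc's code, just the equivalent test.

```c
#include <stdbool.h>

/* Hypothetical helper mirroring glibc's `cmp rax, -4095` / `jae error`.
 * -4095UL wraps to 2^64 - 4095 (2^32 - 4095 where long is 32-bit), so as
 * unsigned numbers -4095 .. -1 are exactly the values at or above it. */
bool raw_is_error(unsigned long ret) {
    return ret >= -4095UL;
}
```

Note that `ret >= -4095UL` is the same test as `ret > -4096UL`, so a raw value of -4096 is not treated as an error.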

You can use glibc's syscall() wrapper function to make any system call, like syscall(__NR_mmap, ...). (Or use an inline-asm wrapper header like https://github.com/linux-on-ibm-z/linux-syscall-support/blob/master/linux_syscall_support.h that has safe inline asm for multiple ISAs, avoiding problems like the missing "memory" clobbers that some other inline-asm wrappers have.)
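A short sketch of calling the syscall() wrapper by number, assuming Linux and glibc. The wrapper already does the -4095..-1 check, so an error comes back as -1 with errno set rather than as a raw -errno. `write_raw` and `demo_write_raw` are hypothetical helper names.

```c
#define _GNU_SOURCE  /* exposes syscall() in <unistd.h> even with a strict -std= option */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: make the write system call by number. */
long write_raw(int fd, const void *buf, size_t len) {
    return syscall(SYS_write, fd, buf, len);
}

/* Shows both outcomes of the same system call. */
void demo_write_raw(void) {
    long n = write_raw(1, "hi\n", 3);
    printf("success: returned %ld\n", n);

    n = write_raw(-1, "x", 1);  /* bad fd: the kernel returns -EBADF */
    printf("failure: returned %ld, errno = %d (%s)\n", n, errno, strerror(errno));
}
```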


Interesting cases include getpriority where the kernel ABI maps the -20..19 return-value range to 1..40, and libc decodes it. More details in a related answer about decoding syscall error return values.
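A sketch of that encoding, assuming Linux and glibc: making the system call through syscall() skips libc's decoding, so it returns the raw 1..40 value (20 - nice), while getpriority() returns the nice value itself. Because the raw value is never negative, the kernel can still report errors as -errno. Both helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/resource.h>  /* getpriority(), PRIO_PROCESS */
#include <sys/syscall.h>   /* SYS_getpriority */
#include <unistd.h>        /* syscall() */

/* Raw kernel value for the calling process: 20 - nice, i.e. 1..40. */
long raw_getpriority_self(void) {
    return syscall(SYS_getpriority, PRIO_PROCESS, 0);
}

/* What libc does with a non-error raw value: maps 40..1 back to nice -20..19. */
int decode_getpriority(long raw) {
    return 20 - (int)raw;
}
```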

For mmap, you could also detect an error just by checking that the return value isn't page-aligned (i.e. it has non-zero bits in the low 12, for a 4k page size), if that would be more efficient than checking p > -4096ULL.
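A sketch of that alternative check in C, assuming 4 KiB pages; the helper names are hypothetical. It works on the raw kernel value and also on glibc's MAP_FAILED, which is (void *)-1.

```c
#define _GNU_SOURCE  /* for MAP_ANONYMOUS under strict -std= options */
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumes 4 KiB pages: every successful mmap result is page-aligned,
 * while every error value -4095 .. -1 has some of its low 12 bits set. */
bool mmap_ret_is_error(unsigned long ret) {
    return (ret & 0xFFF) != 0;
}

/* Map one anonymous page; returns MAP_FAILED ((void *)-1) on failure,
 * which the same alignment check also catches. */
void *map_one_page(void) {
    return mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```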


To find the actual numeric values of constants for a specific platform, you need to find the C header file where they're #defined, e.g. asm-generic/errno-base.h and asm-generic/errno.h. See my answer on a question about that for details.


The meaning of each system call's return value is documented in its section 2 man page, like read(2). (sys_read is the raw system call that the glibc read() function is a very thin wrapper for.) Most man pages have a whole section for the return value, e.g.:

RETURN VALUE

On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. See also NOTES.

On error, -1 is returned, and errno is set appropriately. In this case, it is left unspecified whether the file position (if any) changes.

Note that the last paragraph describes how the glibc wrapper decodes the value and sets errno to -EAX if the raw system call's return value is negative, so errno=EFAULT and return -1 if the raw system call returned -EFAULT.

And there's a whole section listing all the possible error codes that read() is allowed to return, and what they mean specifically for read(). (POSIX standardizes most of this behaviour.)
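A minimal sketch of the decoding step the glibc wrapper performs, written as a hypothetical helper `decode_syscall_ret` (glibc's real code is macro-generated asm, not this function): take the raw value the kernel left in RAX and turn a -errno into a -1 return plus errno.

```c
#include <errno.h>

/* Hypothetical libc-style decoder for a raw Linux system-call result. */
long decode_syscall_ret(long raw) {
    if ((unsigned long)raw >= -4095UL) {  /* -4095 .. -1: an error */
        errno = (int)-raw;                /* e.g. -EFAULT -> errno = EFAULT */
        return -1;
    }
    return raw;                           /* success: passed through, errno untouched */
}
```

So a raw read that returns -EFAULT comes back from such a wrapper as -1 with errno == EFAULT, as the man page describes.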

Error handling for system calls in x86 assembly, under Linux

The legitimate return values from system calls are always either non-negative (signed) integers or addresses. When they are integers, the negative values are free to be used as error codes, so any negative value is an error.

So the only tricky case is when the return value is an address. It turns out that the addresses corresponding to integers in the range -4096..-1 are all in a kernel reserved page that will never be returned by the kernel -- so any bit pattern in that range will only ever be returned as an error code, and not as a valid address.

In addition, ALL addresses that correspond to negative integers in x86_64 are reserved for the kernel or invalid: user addresses will always be in the range 0..2^47-1 (0..2^56-1 with 5-level paging). So for x86_64 you need only check the sign bit (top bit) of %rax; if it is set, there was an error.

test %rax, %rax
js error

For 32-bit x86 code, this is not the case: some valid addresses are negative numbers. So in that case, you need to explicitly check for the error range, which is actually easiest to do with an unsigned comparison:

cmpl  $0xfffff000, %eax   # unsigned 2^32 - 4096, aka signed -4096
ja    error               # -4095 .. -1 is an error, anything else is non-error
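The two checks can be compared in C, assuming the raw register value is held in a fixed-width integer. The helper names are hypothetical; `is_error32` mirrors `cmpl $0xfffff000, %eax` / `ja error`, and `is_error32_sign_wrong` shows why a sign test misfires on i386, where a valid user-space address such as 0x90000000 (above 2 GiB) looks negative.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86-64: user-space addresses never have the top bit set, so
 * `test %rax, %rax` / `js error` is enough. */
bool is_error64_sign(uint64_t rax) {
    return (int64_t)rax < 0;
}

/* i386: unsigned "above" 0xfffff000 means EAX is one of
 * 0xfffff001 .. 0xffffffff, i.e. -4095 .. -1. */
bool is_error32(uint32_t eax) {
    return eax > 0xfffff000u;
}

/* The sign test that would be wrong on i386: valid addresses at or
 * above 2 GiB look negative when treated as int32_t. */
bool is_error32_sign_wrong(uint32_t eax) {
    return (int32_t)eax < 0;
}
```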

Return values in main vs _start

TL;DR: function return values and system-call arguments use separate registers because they're completely unrelated.


When you compile with gcc, it links CRT startup code that defines a _start. That _start (indirectly) calls main, and passes main's return value (which main leaves in EAX) to the exit() library function. (Which eventually makes an exit system call, after doing any necessary libc cleanup like flushing stdio buffers.)

See also Return vs Exit from main function in C; this is exactly analogous to what you're doing, except that you're using _exit(), which bypasses libc cleanup, instead of exit(). Related: Syscall implementation of exit().
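The practical difference between exit() and _exit() is easy to demonstrate. This is a sketch assuming Linux/POSIX, with a hypothetical helper name: the child leaves "hello" sitting in its stdio buffer, and only exit() flushes it before making the exit system call.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a child that printf()s "hello" (no newline, so it stays in the stdio
 * buffer) and then calls exit() or _exit(). Returns how many bytes reached
 * the pipe that the child's stdout points at, or -1 on error. */
int bytes_written_by_child(int use_libc_exit) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    fflush(stdout);                 /* don't let the child inherit our pending output */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                 /* child */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        printf("hello");
        if (use_libc_exit)
            exit(0);                /* flushes stdio buffers, then makes the system call */
        _exit(0);                   /* straight to the system call: the buffer is lost */
    }
    close(fds[1]);                  /* parent keeps only the read end */
    char buf[64];
    int total = 0;
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += (int)n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

bytes_written_by_child(1) should return 5 and bytes_written_by_child(0) should return 0.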

An int $0x80 system call takes its argument in EBX, as per the 32-bit system-call ABI (which you shouldn't be using in 64-bit code). It's not a return value from a function, it's the process exit status. See Hello, world in assembly language with Linux system calls? for more about system calls.

Note that _start is not a function; it can't return in that sense because there's no return address on the stack. You're taking a casual description like "return to the OS" and conflating that with a function's "return value". You can call exit from main if you want, but you can't ret from _start.

EAX is the return-value register for int-sized values in the function-calling convention. (The high 32 bits of RAX are ignored because main returns int. But also, $? exit status can only get the low 8 bits of the value passed to exit().)
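To see the low-8-bits truncation concretely, here is a small sketch assuming a POSIX system such as Linux: a child passes various values to _exit(), and the parent reads the status back with waitpid(). `observed_exit_status` is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `code` and returns the status the parent
 * sees (what a shell would put in $?), or -1 on error. */
int observed_exit_status(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);              /* passed to the exit system call in EDI (x86-64) / EBX (i386) */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);   /* only the low 8 bits survive */
}
```

With this, observed_exit_status(257) gives 1 and observed_exit_status(-1) gives 255.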

Related:

  • Why am I allowed to exit main using ret?
  • What happens with the return value of main()?
  • where goes the ret instruction of the main
  • What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? explains why you should use syscall, and shows some of the kernel side of what happens inside the kernel after a system call.

Print return value from a syscall in assembly

I wrote a 200+ line assembly program to do this. It prints the timestamp first and then the formatted date and time. All its functions follow the System V x86-64 calling convention.

global _start

section .rodata

strings: ; '\n', ' ', '/', ':'
db 0x1, 0xa, 0x1, 0x20
db 0x1, 0x2f, 0x1, 0x3a
weekdays: ; weekday strings
db 0x3, 0x54, 0x68, 0x75
db 0x3, 0x46, 0x72, 0x69
db 0x3, 0x53, 0x61, 0x74
db 0x3, 0x53, 0x75, 0x6e
db 0x3, 0x4d, 0x6f, 0x6e
db 0x3, 0x54, 0x75, 0x65
db 0x3, 0x57, 0x65, 0x64
months: ; length of months
db 0x1f, 0x1c, 0x1f, 0x1e
db 0x1f, 0x1e, 0x1f, 0x1f
db 0x1e, 0x1f, 0x1e, 0x1f

section .text

_start:
; RSP is already 16-byte aligned at process entry, so no push is needed
; before the calls below. _start has no caller, so RBX needn't be saved.

mov rax, 201 ; sys_time
xor rdi, rdi ; time(NULL): the result comes back in RAX
syscall

mov rbx, rax ; RBX is call-preserved, so it survives the calls below

; you may uncomment the following line and put an arbitrary timestamp to test it out
; mov rbx, 0

mov rdi, rbx
call print_num ; print unix timestamp

mov rdi, strings
call sys_print ; new line

mov rdi, rbx
call print_time ; print formatted date

mov rax, 60 ; sys_exit
xor rdi, rdi
syscall

leap_year: ; rsi + (year in rdi is leap)
mov rax, rdi
mov rcx, 4
xor rdx, rdx
div rcx
test rdx, rdx ; return 0 if year % 4
jnz func_leap_year_ret_0
mov rax, rdi
mov rcx, 100
xor rdx, rdx
div rcx
test rdx, rdx ; return 1 if year % 100
jnz func_leap_year_ret_1
mov rax, rdi
mov rcx, 400
xor rdx, rdx
div rcx
test rdx, rdx ; return 0 if year % 400
jnz func_leap_year_ret_0
func_leap_year_ret_1:
lea rax, [rsi + 1]
ret
func_leap_year_ret_0:
mov rax, rsi
ret

year_length: ; length of year in rdi
mov rsi, 365
jmp leap_year

month_length: ; length of month (year in rdi, month in rsi)
push r15
push r14
push r13

mov r14, rsi ; back up month in r14, will be used as index
cmp rsi, 1
setz r15b
movzx r13, r15b
xor rsi, rsi
call leap_year
and r13, rax
movzx rax, byte [r14 + months]
add rax, r13

pop r13
pop r14
pop r15
ret

print_time: ; print time_t in rdi
push r15
push r14
push r13
push r12
mov r14, 1970 ; 1970-01-01T00:00:00Z
xor r15, r15

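mov rcx, 60
mov rax, rdi
xor rdx, rdx
div rcx
push rdx ; seconds (ends up at [rsp + 16])
xor rdx, rdx
div rcx
push rdx ; minutes (ends up at [rsp + 8])
mov rcx, 24
xor rdx, rdx
div rcx
push rdx ; hours (at [rsp])
mov r12, rax ; total days since the epoch, kept for the weekday
mov r13, rax ; days left to distribute over years and months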
func_print_time_loop_1_start:
mov rdi, r14
call year_length
cmp r13, rax
jb func_print_time_loop_2_start
sub r13, rax
inc r14
jmp func_print_time_loop_1_start
func_print_time_loop_2_start:
mov rdi, r14
mov rsi, r15
call month_length
cmp r13, rax
jb func_print_time_loop_end
sub r13, rax
inc r15
jmp func_print_time_loop_2_start
func_print_time_loop_end:
; print time
mov rdi, [rsp]
call print_num
mov rdi, strings + 6
call sys_print
mov rdi, [rsp + 8]
call print_num
mov rdi, strings + 6
call sys_print
mov rdi, [rsp + 16]
call print_num

; print " "
mov rdi, strings + 2
call sys_print

; print weekday
mov rax, r12
mov rcx, 7
xor rdx, rdx
div rcx
lea rdi, [rdx * 4 + weekdays]
call sys_print

; print " "
mov rdi, strings + 2
call sys_print

; print date
mov rdi, r15
inc rdi
call print_num
mov rdi, strings + 4
call sys_print
mov rdi, r13
inc rdi
call print_num
mov rdi, strings + 4
call sys_print
mov rdi, r14
call print_num

; print new line
mov rdi, strings
call sys_print

add rsp, 24
pop r12
pop r13
pop r14
pop r15
ret

print_num: ; print number in rdi
mov r8, rsp
sub rsp, 24 ; 21 bytes for local storage, with extra 3 bytes to keep stack aligned
xor r9, r9
mov rax, rdi
mov rcx, 10
func_print_num_loop_start:
dec r8
xor rdx, rdx
div rcx
add dl, 48
mov [r8], dl
inc r9b
test rax, rax
jnz func_print_num_loop_start
func_print_num_loop_end:
dec r8
mov [r8], r9b
mov rdi, r8
call sys_print
add rsp, 24 ; deallocate local storage, restore rsp
ret

sys_print: ; print a string pointed by rdi
movzx rdx, byte [rdi]
lea rsi, [rdi + 1]
mov rdi, 1 ; stdout
mov rax, 1 ; write
syscall
ret

The print_num function prints the (unsigned) number in register rdi. If you want to know how I print a number, you can look at that function.
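The same divide-by-10, fill-backwards idea as print_num, sketched in C. `u64_to_dec` and `print_u64` are hypothetical names, and this version stores a NUL terminator instead of print_num's length-prefix byte.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Converts v to decimal by repeated division by 10, storing each remainder
 * + '0' from the end of buf backwards (like print_num). buf needs room for
 * 20 digits plus a NUL. Returns a pointer to the first digit. */
char *u64_to_dec(unsigned long long v, char buf[21]) {
    char *p = buf + 20;
    *p = '\0';
    do {
        *--p = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);   /* do/while, like the asm's test/jnz, so 0 prints as "0" */
    return p;
}

/* Prints v on stdout with one write() call, like sys_print.
 * Returns write()'s result: the byte count, or -1 with errno set. */
ssize_t print_u64(unsigned long long v) {
    char buf[21];
    const char *s = u64_to_dec(v, buf);
    return write(STDOUT_FILENO, s, strlen(s));
}
```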

print_time is where the date and time are calculated and printed.

Here is the output, along with the output from a C program that prints the formatted date & time using asctime(gmtime(&t)):

$ ./time && ./ct
1608515228
1:47:8 Mon 12/21/2020
Unix time: 1608515228
C library returns: Mon Dec 21 01:47:08 2020

(The last two lines are from the C program)

You can also uncomment the mov rbx, 0 line in _start (right after the sys_time call) and put any timestamp there to test it out.

My solution is very naive:

  • Figure out the total number of days first; it can be found using time/60/60/24 (and you get your hour/min/sec in this step).
  • Then figure out the year. I did this by subtracting the number of days in each year, year by year, starting from 1970. I wrote a helper function to figure out the number of days in any year.
  • Find the month of the year and the day of the month. It's almost the same as step 2. I wrote a helper function to figure out the number of days in any month of any year.
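The same three naive steps can be sketched in C for comparison. The struct and function names are hypothetical, and unlike the asm's weekday table (which starts at Thursday) this sketch numbers weekdays like struct tm, with 0 = Sunday.

```c
#include <time.h>

/* Hypothetical result type for the sketch. */
struct civil_time {
    long year;
    int month;  /* 1..12 */
    int day;    /* 1..31 */
    int hour, min, sec;
    int wday;   /* 0 = Sunday, like struct tm's tm_wday */
};

static int is_leap(long y) {
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

static int days_in_month(long y, int m /* 0-based, like the asm */) {
    static const unsigned char len[12] = {31, 28, 31, 30, 31, 30,
                                          31, 31, 30, 31, 30, 31};
    return len[m] + (m == 1 && is_leap(y));
}

/* Split off h/m/s, then peel off whole years, then whole months.
 * Assumes t >= 0 (the asm uses unsigned division too). */
void unix_to_civil(time_t t, struct civil_time *out) {
    unsigned long long s = (unsigned long long)t;
    out->sec = (int)(s % 60); s /= 60;
    out->min = (int)(s % 60); s /= 60;
    out->hour = (int)(s % 24); s /= 24;

    unsigned long long days = s;        /* whole days since 1970-01-01 */
    out->wday = (int)((days + 4) % 7);  /* 1970-01-01 was a Thursday */

    long year = 1970;
    while (days >= (unsigned long long)(365 + is_leap(year))) {
        days -= 365 + is_leap(year);
        year++;
    }
    int month = 0;
    while (days >= (unsigned long long)days_in_month(year, month)) {
        days -= days_in_month(year, month);
        month++;
    }
    out->year = year;
    out->month = month + 1;
    out->day = (int)days + 1;
}
```

For 1608515228 this should give 2020-12-21 01:47:08, wday 1 (Monday), matching the asm and asctime output above.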

Edit:

Pasted the whole program into this answer, as several people asked.

For printing the integer part, I used the implementation from @PeterCordes here:

https://stackoverflow.com/a/46301894

When does Linux x86-64 syscall clobber %r8, %r9 and %r10?

Only 32-bit system calls (e.g. via int 0x80) in 64-bit mode step on those registers, along with R11. (What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?).

syscall properly saves/restores all regs including R8, R9, and R10, so user-space using it can assume they keep their values, except the RAX return value. (The kernel's syscall entry point even saves RCX and R11, but at that point they've already been overwritten by the syscall instruction itself with the original RIP and before-masking RFLAGS value.)
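A quick way to check this from user space, sketched for x86-64 Linux with GCC/Clang inline asm (the function name is hypothetical, and the #if guard just makes it return -1 on other targets): put known values in R8-R10, run a getpid system call, and see whether they survive. RCX and R11 are declared as clobbers because the syscall instruction itself overwrites them.

```c
#include <sys/syscall.h>  /* SYS_getpid */

/* Returns 1 if R8, R9 and R10 kept their values across `syscall`,
 * 0 if not, and -1 on targets other than x86-64 Linux. */
int syscall_preserves_r8_r10(void) {
#if defined(__x86_64__) && defined(__linux__)
    register unsigned long r8 __asm__("r8") = 0x1111;
    register unsigned long r9 __asm__("r9") = 0x2222;
    register unsigned long r10 __asm__("r10") = 0x3333;
    unsigned long rax = SYS_getpid;   /* call number in, PID out */
    __asm__ volatile("syscall"
                     : "+a"(rax), "+r"(r8), "+r"(r9), "+r"(r10)
                     :
                     : "rcx", "r11", "memory");
    return r8 == 0x1111 && r9 == 0x2222 && r10 == 0x3333;
#else
    return -1;
#endif
}
```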


R8, R9 and R10, along with R11, are the non-legacy registers that are call-clobbered in the function-calling convention. The other new registers, R12-R15, are call-preserved, so compiler-generated code for C functions inside the kernel naturally preserves them even if an asm entry point didn't save them.

Currently the 64-bit int 0x80 entry point just pushes 0 for the call-clobbered R8-R11 registers in the process-state struct that it will restore from before returning to user space, instead of the original register values.

Historically, the int 0x80 entry point from 32-bit user-space didn't save/restore those registers at all. So their values were whatever compiler-generated kernel code left sitting around. This was thought to be innocent because 32-bit mode can't read those registers, until it was realized that user-space can far-jump to 64-bit mode, using the same CS value that the kernel uses for normal 64-bit user-space processes, selecting that system-wide GDT entry. So there was an actual info leak of kernel data, which was fixed by zeroing those registers.

IDK whether there used to be or still is a separate entry point from 64-bit user-space vs. 32-bit, or how they differ in struct pt_regs layout. The historical situation where int 0x80 leaked r8..r11 wouldn't have made sense for 64-bit user-space; that leak would have been obvious. So if they're unified now, they must not have been in the past.


