What Are the Return Values of System Calls in Assembly

What are the return values of system calls in Assembly?

See also this excellent LWN article about system calls which assumes C knowledge.

Also: The Definitive Guide to Linux System Calls (on x86), and related: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?

C is the language of Unix systems programming, so all the documentation is in terms of C. And then there's documentation for the minor differences between the C interface and the asm on any given platform, usually in the Notes section of man pages.

sys_read means the raw system call (as opposed to the libc wrapper function). The kernel implementation of the read system call is a kernel function called sys_read(). You can't call it with a call instruction, because it's in the kernel, not a library. But people still talk about "calling sys_read" to distinguish it from the libc function call. However, it's ok to say read even when you mean the raw system call (especially when the libc wrapper doesn't do anything special), like I do in this answer.

Also note that syscall.h defines constants like SYS_read with the actual system call number, or asm/unistd.h for the Linux __NR_read names for the same constants. (The value you put in EAX before an int 0x80 or syscall instruction).
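As a quick sanity check of those two naming schemes, here is a minimal C sketch. It assumes a Linux system whose sys/syscall.h also exposes the __NR_* names (glibc's does); the helper function names are made up for illustration.

```c
#include <sys/syscall.h>   /* SYS_read, SYS_write; glibc also brings in __NR_* here */

/* Hypothetical helpers: report the call numbers this platform uses. */
static long read_call_number(void)  { return SYS_read; }
static long write_call_number(void) { return SYS_write; }

/* Nonzero when both spellings name the same number. */
static int names_agree(void)
{
    return SYS_read == __NR_read && SYS_write == __NR_write;
}
```

The numbers themselves differ between architectures, which is why code should use the names rather than hard-coded values.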

Linux system call return values (in EAX/RAX on x86) are either "normal" success, or a -errno code for error. e.g. -EFAULT if you pass an invalid pointer. This behaviour is documented in the syscalls(2) man page.

-1 to -4095 means error, anything else means success. See AOSP non-obvious syscall() implementation for more details on this -4095UL .. -1UL range, which is portable across architectures on Linux, and applies to every system call. (In the future, a different architecture could use a different value for MAX_ERRNO, but the value for existing arches like x86-64 is guaranteed to stay the same as part of Linus's don't-break-userspace policy of keeping kernel ABIs stable.)

For example, glibc's generic syscall(2) wrapper function uses this sequence: cmp rax, -4095 / jae SYSCALL_ERROR_LABEL, which is guaranteed to be future-proof for all Linux system calls.
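The same unsigned comparison can be written in C. This is only a sketch of the decoding step, not glibc's actual source; the macro and function names are hypothetical, and 4095 is the MAX_ERRNO value described above for existing architectures.

```c
#include <errno.h>

/* MAX_ERRNO on existing Linux architectures; the macro name is local to this sketch. */
#define SKETCH_MAX_ERRNO 4095UL

/* Nonzero if a raw return value lies in -4095UL .. -1UL (an error code). */
static int raw_is_error(unsigned long ret)
{
    return ret >= -SKETCH_MAX_ERRNO;   /* same unsigned test as cmp / jae */
}

/* Decode the way a libc wrapper does: errno = -ret and return -1 on error. */
static long raw_decode(unsigned long ret)
{
    if (raw_is_error(ret)) {
        errno = (int)(0UL - ret);
        return -1;
    }
    return (long)ret;
}
```

For instance, raw_decode((unsigned long)-EFAULT) returns -1 with errno set to EFAULT, while raw_decode(42) just returns 42.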

You can use that wrapper function to make any system call, like syscall( __NR_mmap, ... ). (Or use an inline-asm wrapper header like https://github.com/linux-on-ibm-z/linux-syscall-support/blob/master/linux_syscall_support.h that has safe inline-asm for multiple ISAs, avoiding problems like missing "memory" clobbers that some other inline-asm wrappers have.)
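A minimal sketch of using that generic wrapper from C, assuming Linux with glibc or a compatible libc. The helper names are hypothetical. Note that syscall() has already done the decoding: it returns -1 and sets errno rather than handing back the raw -errno value.

```c
#define _GNU_SOURCE          /* for the syscall() declaration under strict -std modes */
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Make a raw read on an invalid descriptor through the generic wrapper.
 * Returns the errno value the wrapper decoded, or 0 if the call succeeded. */
static int read_bad_fd_errno(void)
{
    char byte;
    errno = 0;
    long ret = syscall(SYS_read, (long)-1, &byte, (size_t)1);
    return ret == -1 ? errno : 0;
}

/* A call with no arguments: the raw getpid should agree with the libc one. */
static int getpid_matches(void)
{
    return syscall(SYS_getpid) == (long)getpid();
}
```

Here read_bad_fd_errno() should report EBADF, because the kernel returned -EBADF and the wrapper turned it into errno.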

Interesting cases include getpriority, where the kernel ABI maps the -20..19 nice range to 40..1 (it returns 20 - nice, so a successful result is never negative), and libc decodes it. More details in a related answer about decoding syscall error return values.
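To see that translation in action, the following sketch compares the raw value with what the libc wrapper reports. It assumes Linux with glibc or a compatible libc and uses the 20 - raw formula from the getpriority(2) notes; the helper names are made up.

```c
#define _GNU_SOURCE          /* for the syscall() declaration under strict -std modes */
#include <errno.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Raw kernel value for the calling process: 40..1, never negative on success. */
static long raw_getpriority_self(void)
{
    return syscall(SYS_getpriority, (long)PRIO_PROCESS, 0L);
}

/* The translation libc applies: user nice = 20 - kernel value. */
static int nice_from_raw(long raw)
{
    return (int)(20 - raw);
}

/* The libc wrapper can legitimately return -1, so errno must be cleared first. */
static int libc_getpriority_self(int *ok)
{
    errno = 0;
    int nice_value = getpriority(PRIO_PROCESS, 0);
    *ok = !(nice_value == -1 && errno != 0);
    return nice_value;
}
```

On a successful call, nice_from_raw(raw_getpriority_self()) should equal the value from libc_getpriority_self().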

For mmap, if you wanted you could also detect error just by checking that the return value isn't page-aligned (e.g. any non-zero bits in the low 12, for a 4k page size), if that would be more efficient than checking p > -4096ULL.
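Both mmap checks can be sketched in C as follows. This assumes a 4 KiB page size (so a successful result has its low 12 bits clear); the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumes 4 KiB pages: a successful mmap result has its low 12 bits clear. */
#define SKETCH_PAGE_MASK ((uintptr_t)4095)

/* Alignment test: every -errno value in -4095..-1 has some low bit set. */
static int mmap_ret_is_error_by_alignment(uintptr_t ret)
{
    return (ret & SKETCH_PAGE_MASK) != 0;
}

/* Range test: the generic check that works for every system call. */
static int mmap_ret_is_error_by_range(uintptr_t ret)
{
    return ret > (uintptr_t)-4096;
}
```

The two tests agree for every value mmap can actually return, but only the range test is valid for system calls whose successful results are not page-aligned.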

To find the actual numeric values of constants for a specific platform, you need to find the C header file where they're #defined. See my answer on a question about that for details. e.g. in asm-generic/errno-base.h / asm-generic/errno.h.

The meanings of return values for each sys call are documented in the section 2 man pages, like read(2). (sys_read is the raw system call that the glibc read() function is a very thin wrapper for.) Most man pages have a whole section for the return value. e.g.

RETURN VALUE
On success, the number of bytes read is returned (zero indicates
end of file), and the file position is advanced by this number. It
is not an error if this number is smaller than the number of bytes
requested; this may happen for example because fewer bytes are
actually available right now (maybe because we were close to end-of-
file, or because we are reading from a pipe, or from a terminal), or
because read() was interrupted by a signal. See also NOTES.

On error, -1 is returned, and errno is set appropriately. In this
case, it is left unspecified whether the file position (if any)
changes.

Note that the last paragraph describes how the glibc wrapper decodes the value: if the raw system call's return value is in the -4095..-1 error range, the wrapper sets errno to -EAX and returns -1. So a raw return value of -EFAULT becomes errno=EFAULT with a return value of -1.

And there's a whole section listing all the possible error codes that read() is allowed to return, and what they mean specifically for read(). (POSIX standardizes most of this behaviour.)

Error handling for system calls in x86 assembly, under Linux

The legitimate return values from system calls are always either non-negative (signed) integers or addresses. When they are integers, the negative values can be used as error codes, so any negative value is an error.

So the only tricky case is when the return value is an address. It turns out that the addresses corresponding to integers in the range -4096..-1 are all in a kernel reserved page that will never be returned by the kernel -- so any bit pattern in that range will only ever be returned as an error code, and not as a valid address.

In addition, ALL addresses that correspond to negative integers in x86_64 are reserved for the kernel or invalid -- user addresses will always be in the range 0..2⁴⁷-1. So for x86_64 you need only check the sign bit (top bit) of %rax -- if it is set, there was an error.

test %rax, %rax
js   error

For 32-bit x86 code, this is not the case -- some valid addresses are negative numbers. So in that case, you need to explicitly check for the error range, which is actually easiest to do with an unsigned comparison:

cmpl  $0xfffff000, %eax  # unsigned 2^32 - 4096, aka signed -4096
ja    error              # -4095 .. -1 is an error, anything else is non-error
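For comparison, the two checks look like this in C. It is only a sketch of the comparisons themselves, with hypothetical function names, taking the register value as a plain integer.

```c
#include <stdint.h>

/* x86-64: user addresses never have the top bit set, so the sign suffices. */
static int is_error_x86_64(uint64_t rax)
{
    return (int64_t)rax < 0;
}

/* 32-bit x86: valid addresses can be "negative", so test the range instead. */
static int is_error_i386(uint32_t eax)
{
    return eax > 0xfffff000u;   /* unsigned compare, like cmpl / ja */
}
```

A 32-bit address such as 0xb7700000 is negative when viewed as a signed value, yet is_error_i386() correctly treats it as a success.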

Return values in main vs _start

TL:DR: function return values and system-call arguments use separate registers because they're completely unrelated.

When you compile with gcc, it links CRT startup code that defines a _start. That _start (indirectly) calls main, and passes main's return value (which main leaves in EAX) to the exit() library function. (Which eventually makes an exit system call, after doing any necessary libc cleanup like flushing stdio buffers.)

See also Return vs Exit from main function in C - this is exactly analogous to what you're doing, except you're using _exit() which bypasses libc cleanup, instead of exit(). See also: Syscall implementation of exit()

An int $0x80 system call takes its argument in EBX, as per the 32-bit system-call ABI (which you shouldn't be using in 64-bit code). It's not a return value from a function, it's the process exit status. See Hello, world in assembly language with Linux system calls? for more about system calls.

Note that _start is not a function; it can't return in that sense because there's no return address on the stack. You're taking a casual description like "return to the OS" and conflating that with a function's "return value". You can call exit from main if you want, but you can't ret from _start.

EAX is the return-value register for int-sized values in the function-calling convention. (The high 32 bits of RAX are ignored because main returns int. But also, $? exit status can only get the low 8 bits of the value passed to exit().)
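The 8-bit truncation of the exit status is easy to observe from C. The sketch below assumes a POSIX system; observed_exit_status is a made-up helper that forks a child, has it call _exit() directly, and reports what the parent sees.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `value`; return the status the parent sees,
 * or -1 if fork/waitpid failed or the child did not exit normally. */
static int observed_exit_status(int value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(value);              /* raw exit, no libc cleanup */

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);    /* only the low 8 bits survive */
}
```

Passing 256 + 7 yields 7 in the parent, the same value a shell would show in $?.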

Why am I allowed to exit main using ret?
What happens with the return value of main()?
where goes the ret instruction of the main
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? explains why you should use syscall, and shows some of what happens inside the kernel after a system call.

Print return value from a syscall in assembly

Wrote a 200+ line assembly program to do this. It prints the time stamp first and then prints the formatted date and time. All its functions follow System-V x64 calling convention.

global _start

section .rodata

strings:            ; '\n', ' ', '/', ':'
    db 0x1, 0xa, 0x1, 0x20
    db 0x1, 0x2f, 0x1, 0x3a
weekdays:           ; weekday strings
    db 0x3, 0x54, 0x68, 0x75
    db 0x3, 0x46, 0x72, 0x69
    db 0x3, 0x53, 0x61, 0x74
    db 0x3, 0x53, 0x75, 0x6e
    db 0x3, 0x4d, 0x6f, 0x6e
    db 0x3, 0x54, 0x75, 0x65
    db 0x3, 0x57, 0x65, 0x64
months:             ; length of months
    db 0x1f, 0x1c, 0x1f, 0x1e
    db 0x1f, 0x1e, 0x1f, 0x1f
    db 0x1e, 0x1f, 0x1e, 0x1f

section .text

_start:
    push rbx        ; align stack

    mov rax, 201    ; sys_time
    xor rdi, rdi
    syscall

    mov rbx, rax

    ; you may uncomment the following line and put an arbitrary timestamp to test it out
    ; mov rbx, 0

    mov rdi, rbx
    call print_num  ; print unix timestamp

    mov rdi, strings
    call sys_print  ; new line

    mov rdi, rbx
    call print_time ; print formatted date

    pop rbx         ; since we are exiting, we don't need this pop actually

    mov rax, 60     ; sys_exit
    xor rdi, rdi
    syscall

leap_year:          ; rsi + (year in rdi is leap)
    mov rax, rdi
    mov rcx, 4
    xor rdx, rdx
    div rcx
    test rdx, rdx   ; return 0 if year % 4
    jnz func_leap_year_ret_0
    mov rax, rdi
    mov rcx, 100
    xor rdx, rdx
    div rcx
    test rdx, rdx   ; return 1 if year % 100
    jnz func_leap_year_ret_1
    mov rax, rdi
    mov rcx, 400
    xor rdx, rdx
    div rcx
    test rdx, rdx   ; return 0 if year % 400
    jnz func_leap_year_ret_0
func_leap_year_ret_1:
    lea rax, [rsi + 1]
    ret
func_leap_year_ret_0:
    mov rax, rsi
    ret

year_length:        ; length of year in rdi
    mov rsi, 365
    jmp leap_year

month_length:       ; length of month (year in rdi, month in rsi)
    push r15
    push r14
    push r13

    mov r14, rsi    ; back up month in r14, will be used as index
    cmp rsi, 1
    setz r15b
    movzx r13, r15b
    xor rsi, rsi
    call leap_year
    and r13, rax
    movzx rax, byte [r14 + months]
    add rax, r13

    pop r13
    pop r14
    pop r15
    ret

print_time:         ; print time_t in rdi
    push r15
    push r14
    push r13
    push r12
    mov r14, 1970   ; 1970-01-01T00:00:00Z
    xor r15, r15

    mov rcx, 60
    mov rax, rdi
    xor rdx, rdx
    div rcx
    push rdx        ; push #5
    xor rdx, rdx
    div rcx
    push rdx        ; push #6
    mov rcx, 24
    xor rdx, rdx
    div rcx
    push rdx        ; push #7, the last one
    mov r12, rax
    mov r13, rax
func_print_time_loop_1_start:
    mov rdi, r14
    call year_length
    cmp r13, rax
    jb func_print_time_loop_2_start
    sub r13, rax
    inc r14
    jmp func_print_time_loop_1_start
func_print_time_loop_2_start:
    mov rdi, r14
    mov rsi, r15
    call month_length
    cmp r13, rax
    jb func_print_time_loop_end
    sub r13, rax
    inc r15
    jmp func_print_time_loop_2_start
func_print_time_loop_end:
    ; print time
    mov rdi, [rsp]
    call print_num
    mov rdi, strings + 6
    call sys_print
    mov rdi, [rsp + 8]
    call print_num
    mov rdi, strings + 6
    call sys_print
    mov rdi, [rsp + 16]
    call print_num

    ; print " "
    mov rdi, strings + 2
    call sys_print

    ; print weekday
    mov rax, r12
    mov rcx, 7
    xor rdx, rdx
    div rcx
    lea rdi, [rdx * 4 + weekdays]
    call sys_print

    ; print " "
    mov rdi, strings + 2
    call sys_print

    ; print date
    mov rdi, r15
    inc rdi
    call print_num
    mov rdi, strings + 4
    call sys_print
    mov rdi, r13
    inc rdi
    call print_num
    mov rdi, strings + 4
    call sys_print
    mov rdi, r14
    call print_num

    ; print new line
    mov rdi, strings
    call sys_print

    add rsp, 24
    pop r12
    pop r13
    pop r14
    pop r15
    ret

print_num:          ; print number in rdi
    mov r8, rsp
    sub rsp, 24     ; 21 bytes for local storage, with extra 3 bytes to keep stack aligned
    xor r9, r9
    mov rax, rdi
    mov rcx, 10
func_print_num_loop_start:
    dec r8
    xor rdx, rdx
    div rcx
    add dl, 48
    mov [r8], dl
    inc r9b
    test rax, rax
    jnz func_print_num_loop_start
func_print_num_loop_end:
    dec r8
    mov [r8], r9b
    mov rdi, r8
    call sys_print
    add rsp, 24     ; deallocate local storage, restore rsp
    ret

sys_print:          ; print a string pointed by rdi
    movzx rdx, byte [rdi]
    lea rsi, [rdi + 1]
    mov rdi, 1      ; stdout
    mov rax, 1      ; write
    syscall
    ret

print_num function prints any number in register rdi. If you want to know how I print a number you can look at that function.
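For readers who prefer C, the digit loop in print_num corresponds roughly to the sketch below. It is not a translation of the whole function (it leaves out the length byte and the write system call), and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `buf` from the end with the decimal digits of `n`, the way print_num
 * walks downward from the top of its stack buffer. Returns a pointer to the
 * first digit and stores the digit count in *len. buf must hold 20 bytes. */
static char *u64_to_decimal(uint64_t n, char buf[20], size_t *len)
{
    char *p = buf + 20;
    do {
        *--p = (char)('0' + n % 10);   /* like: div rcx / add dl, 48 */
        n /= 10;
    } while (n != 0);                  /* do-while so that 0 prints as "0" */
    *len = (size_t)(buf + 20 - p);
    return p;
}
```

Twenty bytes are enough because the largest 64-bit unsigned value has 20 decimal digits.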

print_time is where the date and time are calculated and printed.

Here is the output, along with output from a C program that prints the formatted date & time using asctime(gmtime(&t)) for a time_t t:

$ ./time && ./ct
1608515228
1:47:8 Mon 12/21/2020
Unix time: 1608515228
C library returns: Mon Dec 21 01:47:08 2020

(The last two lines are from the C program)

You can also test any timestamp by uncommenting the mov rbx, 0 line in _start and changing the value.

My solution is very naive:

1. Figure out the total number of days first: it is time/60/60/24, and the remainders from these divisions give you the hour/min/sec.
2. Then figure out the year. I did this by subtracting the number of days in each year, year by year. I wrote a helper function that returns the number of days in any year.
3. Find the month of the year and the day of the month. This is almost the same as step 2. I wrote a helper function that returns the number of days in any month of any year.
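The same naive approach, sketched in C for comparison. This is not the code the assembly was derived from; the struct and function names are made up, and it only handles non-negative timestamps (UTC, no leap seconds).

```c
#include <stdint.h>

struct civil_time { int year, month, day, hour, min, sec, weekday; };

static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

static int days_in_month(int year, int month0)   /* month0: 0 = January */
{
    static const int len[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return len[month0] + (month0 == 1 && is_leap(year));
}

/* Naive conversion of a non-negative Unix timestamp to UTC fields. */
static struct civil_time civil_from_unix(uint64_t t)
{
    struct civil_time c;
    c.sec  = (int)(t % 60); t /= 60;
    c.min  = (int)(t % 60); t /= 60;
    c.hour = (int)(t % 24); t /= 24;     /* t is now whole days since 1970-01-01 */
    c.weekday = (int)(t % 7);            /* 0 = Thursday, matching the weekdays table */

    c.year = 1970;                       /* subtract whole years */
    while (t >= (uint64_t)(365 + is_leap(c.year))) {
        t -= (uint64_t)(365 + is_leap(c.year));
        c.year++;
    }
    int m = 0;                           /* then subtract whole months */
    while (t >= (uint64_t)days_in_month(c.year, m)) {
        t -= (uint64_t)days_in_month(c.year, m);
        m++;
    }
    c.month = m + 1;
    c.day = (int)t + 1;
    return c;
}
```

Calling civil_from_unix(1608515228) gives 2020-12-21 01:47:08 with weekday index 4 (Monday), matching the sample output above.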

Edit:

Pasted the whole program to this answer, as several people asked.

For printing the integer, I used the implementation from @PeterCordes here:

https://stackoverflow.com/a/46301894

When does Linux x86-64 syscall clobber %r8, %r9 and %r10?

Only 32-bit system calls (e.g. via int 0x80) in 64-bit mode step on those registers, along with R11. (What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?).

syscall properly saves/restores all regs including R8, R9, and R10, so user-space using it can assume they keep their values, except the RAX return value. (The kernel's syscall entry point even saves RCX and R11, but at that point they've already been overwritten by the syscall instruction itself with the original RIP and before-masking RFLAGS value.)

Those, with R11, are the non-legacy registers that are call-clobbered in the function-calling convention, so compiler-generated code for C functions inside the kernel naturally preserves R12-R15, even if an asm entry point didn't save them.

Currently the 64-bit int 0x80 entry point just pushes 0 for the call-clobbered R8-R11 registers in the process-state struct that it will restore from before returning to user space, instead of the original register values.

Historically, the int 0x80 entry point from 32-bit user-space didn't save/restore those registers at all. So their values were whatever compiler-generated kernel code left sitting around. This was thought to be innocent because 32-bit mode can't read those registers, until it was realized that user-space can far-jump to 64-bit mode, using the same CS value that the kernel uses for normal 64-bit user-space processes, selecting that system-wide GDT entry. So there was an actual info leak of kernel data, which was fixed by zeroing those registers.

IDK whether there used to be or still is a separate entry point from 64-bit user-space vs. 32-bit, or how they differ in struct pt_regs layout. The historical situation where int 0x80 leaked r8..r11 wouldn't have made sense for 64-bit user-space; that leak would have been obvious. So if they're unified now, they must not have been in the past.
