When does Linux x86-64 syscall clobber %r8, %r9 and %r10?
Only 32-bit system calls (e.g. via int 0x80) in 64-bit mode step on those registers, along with R11. (See What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?)
syscall properly saves/restores all regs including R8, R9, and R10, so user-space using it can assume they keep their values, except the RAX return value. (The kernel's syscall entry point even saves RCX and R11, but at that point they've already been overwritten by the syscall instruction itself with the original RIP and before-masking RFLAGS value.)
Those, with R11, are the non-legacy registers that are call-clobbered in the function-calling convention, so compiler-generated code for C functions inside the kernel naturally preserves R12-R15, even if an asm entry point didn't save them.
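As a sanity check of the claim that syscall leaves R8, R9 and R10 alone, here is a minimal C sketch using GNU inline asm. It assumes GCC or clang on x86-64 Linux; the function name and the sentinel constants are hypothetical, and getpid is used only because it takes no arguments and has no side effects.

```c
#include <stdint.h>
#include <sys/syscall.h>

/* Hypothetical sentinel values, chosen only so they are easy to recognize. */
#define R8_SENTINEL  0x1111111111111111ULL
#define R9_SENTINEL  0x2222222222222222ULL
#define R10_SENTINEL 0x3333333333333333ULL

/* Returns 1 if R8, R9 and R10 still hold their sentinels after a getpid
 * system call made with the syscall instruction, 0 otherwise. */
static int syscall_preserves_r8_r9_r10(void)
{
    /* Pin each variable to a specific register for the asm statement. */
    register uint64_t r8  __asm__("r8")  = R8_SENTINEL;
    register uint64_t r9  __asm__("r9")  = R9_SENTINEL;
    register uint64_t r10 __asm__("r10") = R10_SENTINEL;
    long rax = SYS_getpid;              /* call number in, return value out */

    __asm__ volatile ("syscall"
                      : "+a"(rax), "+r"(r8), "+r"(r9), "+r"(r10)
                      :
                      : "rcx", "r11", "memory"); /* clobbered by syscall itself */

    return rax > 0
        && r8 == R8_SENTINEL
        && r9 == R9_SENTINEL
        && r10 == R10_SENTINEL;
}
```

On a normal kernel, syscall_preserves_r8_r9_r10() should return 1. RCX and R11 are listed as clobbers because the syscall instruction overwrites them before the kernel ever runs.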
Currently the 64-bit int 0x80 entry point just pushes 0 for the call-clobbered R8-R11 registers in the process-state struct that it will restore from before returning to user space, instead of the original register values.
Historically, the int 0x80 entry point from 32-bit user-space didn't save/restore those registers at all. So their values were whatever compiler-generated kernel code left sitting around. This was thought to be innocent because 32-bit mode can't read those registers, until it was realized that user-space can far-jump to 64-bit mode, using the same CS value that the kernel uses for normal 64-bit user-space processes, selecting that system-wide GDT entry. So there was an actual info leak of kernel data, which was fixed by zeroing those registers.
IDK whether there used to be or still is a separate entry point from 64-bit user-space vs. 32-bit, or how they differ in struct pt_regs layout. The historical situation where int 0x80 leaked r8..r11 wouldn't have made sense for 64-bit user-space; that leak would have been obvious. So if they're unified now, they must not have been in the past.
Why is RCX not used for passing parameters to system calls, being replaced with R10?
X86-64 system calls use the syscall instruction. This instruction saves the return address to rcx, and after that it loads rip from the IA32_LSTAR MSR. I.e. rcx is immediately destroyed by syscall. This is the reason why rcx had to be replaced in the system call ABI.
This same syscall instruction also saves rflags into r11, and then masks rflags using the IA32_FMASK MSR. This is why r11 isn't saved by the kernel.
So, these changes reflect how the syscall mechanism works. This is why the kernel is forced to declare rcx and r11 as not saved and can't even use them for parameter passing.
Reference: Intel's Instruction Set Reference, look for SYSCALL.
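The effect on rcx and r11 can be observed from user space. The following is a rough sketch, assuming GCC or clang on x86-64 Linux running on a native kernel; the struct and function names are invented for this example. It records RCX and R11 immediately after syscall and, for comparison, the address of the instruction that follows syscall.

```c
#include <stdint.h>
#include <sys/syscall.h>

/* Hypothetical container for what user space sees right after syscall. */
struct syscall_regs {
    uint64_t rcx;        /* RCX after the system call returns */
    uint64_t r11;        /* R11 after the system call returns */
    uint64_t next_insn;  /* address of the instruction following syscall */
};

static struct syscall_regs observe_rcx_r11(void)
{
    struct syscall_regs out;
    register uint64_t r11 __asm__("r11");   /* pin this output to R11 */
    uint64_t rcx, next_insn;
    long rax = SYS_getpid;                   /* harmless call, no arguments */

    __asm__ volatile (
        "syscall\n\t"
        "1:\n\t"                             /* label right after syscall */
        "lea 1b(%%rip), %[next]"             /* take that label's address */
        : "+a"(rax), "=c"(rcx), "=r"(r11), [next] "=&r"(next_insn)
        :
        : "memory");

    out.rcx = rcx;
    out.r11 = r11;
    out.next_insn = next_insn;
    return out;
}
```

With this sketch, rcx should equal next_insn, and r11 should look like an RFLAGS image (bit 1 of RFLAGS is always set, so it is never zero). Emulators or sandboxes that intercept system calls might behave differently.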
Why do x86-64 Linux system calls modify RCX, and what does the value mean?
The system call return value is in rax, as always. See What are the calling conventions for UNIX & Linux system calls on i386 and x86-64.
Note that sys_brk has a slightly different interface than the brk / sbrk POSIX functions; see the C library/kernel differences section of the Linux brk(2) man page. Specifically, Linux sys_brk sets the program break; the arg and return value are both pointers. See Assembly x86 brk() call use. That answer needs upvotes because it's the only good one on that question.
The other interesting part of your question is:
I do not quite understand the value in the rcx register in this case
You're seeing the mechanics of how the syscall / sysret instructions are designed to allow the kernel to resume user-space execution but still be fast.
syscall doesn't do any loads or stores, it only modifies registers. Instead of using special registers to save a return address, it simply uses regular integer registers.
It's not a coincidence that RCX=RIP and R11=RFLAGS after the kernel returns to your user-space code. The only way for this not to be the case is if a ptrace system call modified the process's saved rcx or r11 value while it was inside the kernel. (ptrace is the system call gdb uses.) In that case, Linux would use iret instead of sysret to return to user space, because the slower general-case iret can do that. (See What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? for some walk-through of Linux's system-call entry points. Mostly the entry points from 32-bit processes, not from syscall in a 64-bit process, though.)
Instead of pushing a return address onto the kernel stack (like int 0x80 does), syscall:
- sets RCX=RIP, R11=RFLAGS (so it's impossible for the kernel to even see the original values of those regs before you executed syscall).
- masks RFLAGS with a pre-configured mask from a config register (the IA32_FMASK MSR). This lets the kernel disable interrupts (IF) until it's done swapgs and setting rsp to point to the kernel stack. Even with cli as the first instruction at the entry point, there'd be a window of vulnerability. You also get cld for free by masking off DF so rep movs / stos go upward even if user-space had used std. Fun fact: AMD's first proposed syscall / swapgs design didn't mask RFLAGS, but they changed it after feedback from kernel developers on the amd64 mailing list (in ~2000, a couple years before the first silicon).
- jumps to the configured syscall entry point (setting RIP from IA32_LSTAR, with CS coming from IA32_STAR). The old CS value isn't saved anywhere, I think.
- It doesn't do anything else. The kernel has to use swapgs to get access to an info block where it saved the kernel stack pointer, because rsp still has its value from user-space.
So the design of syscall requires a system-call ABI that clobbers registers, and that's why the values are what they are.
Acceptability of regular usage of r10 and r11
The x86-64 System V ABI doesn't call its calling convention "cdecl". It's just the x86-64 SysV calling convention. The string "cdecl" doesn't appear in the ABI doc.
r11 is a temporary, aka call-clobbered register.
r10 is also a call-clobbered register. The ABI says "used for passing a function’s static chain pointer", but C doesn't use this and code generated by gcc and clang does freely use r10 without saving/restoring it. The ABI's table of register usage lists r10 as not preserved across function calls so a leaf function can always clobber it. (Which registers to use as temporaries when writing AMD64 SysV assembly?)
gcc does use r10 as part of its trampoline for function pointers to GNU C nested functions, for a pointer to the stack frame of the outer scope. The trampoline of machine code on the stack is a hack, but this is indeed a static chain pointer; languages with proper support for nested functions would probably have the caller aware of it (like a lambda / closure) and passing a value in r10 when using a pointer to a nested function.
Non-leaf functions do not need to pass on their incoming r10 to their children unless they're "nested functions" in a language that supports that sort of thing (not C or C++). Therefore r10 is also a pure temporary in normal circumstances.
r10 and r11 are not arg-passing registers, unlike the other call-clobbered registers, so "wrapper" functions can use them (especially r11) without saving/restoring anything.
In a normal function, RBX, RBP, and RSP are call-preserved, along with R12..R15. All others can be clobbered without saving/restoring. (That includes xmm/ymm0..15 and zmm0..31, and the x87 stack, and the condition codes in RFLAGS).
Note that r8..15 need a REX prefix, even with 32-bit operand-size (like xor r10d, r10d). If you have some 64-bit non-pointer integers, then sure keep them in r8..r11 because you always need a REX prefix for 64-bit operand-size any time you use those values anyway.
Smaller code-size is usually not worse, and sometimes helps with decode and uop-cache density, and L1i cache density. RAX, RCX, RDX, RSI, RDI should be your first choices for scratch regs. (And use 32-bit operand-size unless you need 64-bit, e.g. xor eax,eax is the correct way to zero RAX. Silvermont doesn't recognize xor r10,r10 as a zeroing idiom, so use xor r10d,r10d even though it doesn't save code size.)
If you do run out of low registers, ideally use r10 / r11 for things that will normally be used with 64-bit operand-size (or VEX prefixes) anyway, e.g. pointers to 64-bit data or pointers to pointers. mov eax, [r10] needs a REX prefix while mov eax, [rdi] doesn't. But mov rax, [rdi] and mov r8, [r10] are the same size.
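To make the size comparison concrete, here is a small C table of the machine-code bytes for the instructions mentioned above. It is only an illustration: the struct and helper names are made up, and the bytes are the encodings an assembler such as NASM would typically choose (some instructions have alternative encodings of the same length).

```c
#include <stddef.h>
#include <string.h>

struct encoding {
    const char *insn;        /* NASM-syntax mnemonic */
    unsigned char bytes[4];  /* machine code in 64-bit mode */
    size_t len;              /* number of bytes actually used */
};

static const struct encoding encodings[] = {
    { "mov eax, [rdi]", { 0x8B, 0x07 },       2 }, /* no prefix needed */
    { "mov eax, [r10]", { 0x41, 0x8B, 0x02 }, 3 }, /* REX.B to reach r10 */
    { "mov rax, [rdi]", { 0x48, 0x8B, 0x07 }, 3 }, /* REX.W for 64-bit */
    { "mov r8, [r10]",  { 0x4D, 0x8B, 0x02 }, 3 }, /* REX.W+R+B, same size */
    { "xor eax, eax",   { 0x31, 0xC0 },       2 },
    { "xor r10d, r10d", { 0x45, 0x31, 0xD2 }, 3 }, /* REX.R+B */
    { "xor r10, r10",   { 0x4D, 0x31, 0xD2 }, 3 }, /* REX.W+R+B */
};

/* Look up an instruction by its mnemonic; returns NULL if it isn't listed. */
static const struct encoding *find_encoding(const char *insn)
{
    for (size_t i = 0; i < sizeof encodings / sizeof encodings[0]; i++)
        if (strcmp(encodings[i].insn, insn) == 0)
            return &encodings[i];
    return NULL;
}

/* Encoded length in bytes, or 0 if the instruction isn't in the table. */
static size_t encoded_len(const char *insn)
{
    const struct encoding *e = find_encoding(insn);
    return e ? e->len : 0;
}

/* 1 if the first byte is a REX prefix (0x40..0x4F in 64-bit mode). */
static int has_rex_prefix(const char *insn)
{
    const struct encoding *e = find_encoding(insn);
    return e != NULL && (e->bytes[0] & 0xF0) == 0x40;
}
```

The table shows the pattern: a high register costs one byte only when the instruction wouldn't otherwise need a REX prefix.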
It's hard to gain much because you often need to use different values together in different combinations, like eventually using cmp eax, r10d or whatever, but if you want to go all-out on optimizing, then think about code-size. Maybe also think about where the instruction boundaries are and how it will fit into the uop cache.
See the x86 tag wiki, and especially http://agner.org/optimize/ for tips on writing efficient code.
On x64 Linux, what is the difference between syscall, int 0x80 and ret to exit a program?
If you use printf or other libc functions, it's best to ret from main or call exit. (Which are equivalent; main's caller will call the libc exit function.)
If not, i.e. if you were only making other raw system calls like write with syscall, it's also appropriate and consistent to exit that way, but ret or call exit are also 100% fine in main.
If you want to work without libc at all, e.g. put your code under _start: instead of main: and link with ld or gcc -static -nostdlib, then you can't use ret. Use mov eax, 231 (__NR_exit_group) / syscall, with the exit status in EDI.
main is a real & normal function like any other (called with a valid return address), but _start (the process entry point) isn't. On entry to _start, the stack holds argc and argv, so trying to ret would set RIP=argc, and then code-fetch would segfault on that unmapped address. See Nasm segmentation fault on RET in _start.
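The raw exit_group call can be tried out safely from C by making it in a forked child and reading the exit status in the parent. This is a sketch under assumptions: GCC or clang on x86-64 Linux, with function names that are invented for the example. The inline asm is the equivalent of mov eax, 231 / mov edi, status / syscall.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Terminate the whole process with a raw exit_group system call,
 * bypassing libc entirely. Never returns. */
static void raw_exit_group(int status)
{
    __asm__ volatile ("syscall"
                      :
                      : "a"((long)SYS_exit_group), "D"((long)status)
                      : "rcx", "r11", "memory");
    __builtin_unreachable();
}

/* Fork a child that exits via raw_exit_group(status) and return the exit
 * status the parent observes, or -1 on any failure. */
static int child_exit_status_via_exit_group(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        raw_exit_group(status);          /* child: does not return */

    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

For example, child_exit_status_via_exit_group(42) should give 42 in the parent. Note that no stack state matters to the child at the point of the syscall, which is why this approach works in _start where ret doesn't.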
System call vs. ret-from-main
Exiting via a system call is like calling _exit() in C: it skips atexit() and libc cleanup, notably not flushing any buffered stdout output (line buffered on a terminal, full-buffered otherwise).
This leads to symptoms such as Using printf in assembly leads to empty output when piping, but works on the terminal (or if your output doesn't end with \n, even on a terminal).
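The lost-output symptom can be reproduced in C without any asm. The sketch below is an illustration, not code from the linked question: the helper name is made up, and it writes through a fresh stdio stream on a pipe (fully buffered by default in glibc) instead of stdout itself, so the result doesn't depend on where the program is run. The child prints one line and then exits either through libc's exit() or through a raw exit_group system call.

```c
#define _GNU_SOURCE             /* for the syscall() declaration */
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns how many bytes the parent receives from a child that writes
 * "hello\n" to a buffered stream and then exits. Returns -1 on failure. */
static long bytes_seen_from_child(int use_raw_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    fflush(NULL);               /* don't let the child inherit pending output */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        close(fds[0]);
        FILE *out = fdopen(fds[1], "w");  /* pipe => fully buffered */
        if (out == NULL)
            _exit(1);
        fprintf(out, "hello\n");          /* sits in the stdio buffer */
        if (use_raw_exit)
            syscall(SYS_exit_group, 0);   /* no libc cleanup, no flush */
        exit(0);                          /* libc cleanup flushes the buffer */
    }

    close(fds[1]);
    char buf[64];
    long total = 0;
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

With exit() the parent should receive all 6 bytes; with the raw system call it should receive none, because the text was still sitting in the child's user-space buffer when the process ended.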
main is a function, called (indirectly) from CRT startup code. (Assuming you link your program normally, like you would a C program.) Your hand-written main works exactly like a compiler-generated C main function would. Its caller (__libc_start_main) really does do something like int result = main(argc, argv); exit(result);, e.g. call rax (pointer passed by _start) / mov edi, eax / call exit.
So returning from main is exactly (see footnote 1) like calling exit.
- Syscall implementation of exit() for a comparison of the relevant C functions, exit vs. _exit vs. exit_group and the underlying asm system calls.
- C question: What is the difference between exit and return? is primarily about exit() vs. return, although there is mention of calling _exit() directly, i.e. just making a system call. It's applicable because C main compiles to an asm main just like you'd write by hand.
Footnote 1: You can invent a hypothetical intentionally weird case where it's different, e.g. you used stack space in main as your stdio buffer with sub rsp, 1024 / mov rsi, rsp / ... / call setvbuf. Then returning from main would involve putting RSP above that buffer, and __libc_start_main's call to exit could overwrite some of that buffer with return addresses and locals before execution reached the fflush cleanup. This mistake is more obvious in asm than C because you need leave or mov rsp, rbp or add rsp, 1024 or something to point RSP at your return address.
In C++, return from main runs destructors for its locals (before global/static exit stuff), exit doesn't. But that just means the compiler makes asm that does more stuff before actually running the ret, so it's all manual in asm, like in C.
The other difference is of course the asm / calling-convention details: exit status in EAX (return value) or EDI (first arg), and of course to ret you have to have RSP pointing at your return address, like it was on function entry. With call exit you don't, and you can even do a conditional tailcall of exit like jne exit. Since it's a noreturn function, you don't really need RSP pointing at a valid return address. (RSP should be aligned by 16 before a call, though, or RSP%16 = 8 before a tailcall, matching the alignment after call pushes a return address. It's unlikely that exit / fflush cleanup will do any alignment-required stores/loads to the stack, but it's a good habit to get this right.)
(This whole footnote is about ret vs. call exit, not syscall, so it's a bit of a tangent from the rest of the answer. You can also run syscall without caring where the stack-pointer points.)
SYS_exit vs. SYS_exit_group raw system calls
The raw SYS_exit system call is for exiting the current thread, like pthread_exit(). (eax=60 / syscall, or eax=1 / int 0x80.)
SYS_exit_group is for exiting the whole program, like _exit. (eax=231 / syscall, or eax=252 / int 0x80.)
In a single-threaded program you can use either, but conceptually exit_group makes more sense to me if you're going to use raw system calls. glibc's _exit() wrapper function actually uses the exit_group system call (since glibc 2.3). See Syscall implementation of exit() for more details.
However, nearly all the hand-written asm you'll ever see uses SYS_exit (see note 1). It's not "wrong", and SYS_exit is perfectly acceptable for a program that didn't start more threads. Especially if you're trying to save code size with xor eax,eax / inc eax (3 bytes in 32-bit mode) or push 60 / pop rax (3 bytes in 64-bit mode), while push 231 / pop rax would be even larger than mov eax,231 because it doesn't fit in a signed imm8.
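The byte counts follow from simple arithmetic on the instruction encodings, sketched here in C. The helper names are hypothetical; the sizes used are push imm8 = 2 bytes (6A ib), push imm32 = 5 bytes (68 id), pop rax = 1 byte (58), and mov eax, imm32 = 5 bytes (B8 id).

```c
/* 1 if the value can be encoded as a sign-extended 8-bit immediate. */
static int fits_signed_imm8(long value)
{
    return value >= -128 && value <= 127;
}

/* Size in bytes of "push value / pop rax" in 64-bit mode:
 * push uses the 2-byte imm8 form when possible, else the 5-byte imm32 form,
 * and pop rax is always a single byte. */
static int push_pop_rax_size(long value)
{
    return (fits_signed_imm8(value) ? 2 : 5) + 1;
}

/* Size in bytes of "mov eax, imm32": opcode byte plus a 4-byte immediate. */
static int mov_eax_imm32_size(void)
{
    return 5;
}
```

So push_pop_rax_size(60) is 3, which beats the 5-byte mov, while push_pop_rax_size(231) is 6, which loses to it, because 231 is above 127 and needs the imm32 form of push.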
Note 1: (Usually actually hard-coding the number, not using __NR_... constants from asm/unistd.h or their SYS_... names from sys/syscall.h.)
And historically, it's all there was. Note that in unistd_32.h, __NR_exit has call number 1, but __NR_exit_group = 252 wasn't added until years later when the kernel gained support for tasks that share virtual address space with their parent, aka threads started by clone(2). This is when SYS_exit conceptually became "exit current thread". (But one could easily and convincingly argue that in a single-threaded program, SYS_exit does still mean exit the whole program, because it only differs from exit_group if there are multiple threads.)
To be honest, I've never used eax=252 / int 0x80 in anything, only ever eax=1. It's only in 64-bit code where I often use mov eax,231 instead of mov eax,60 because neither number is "simple" or memorable the way 1 is, so might as well be a cool guy and use the "modern" exit_group way in my single-threaded toy program / experiment / microbenchmark / SO answer. :P (If I didn't enjoy tilting at windmills, I wouldn't spend so much time on assembly, especially on SO.)
And BTW, I usually use NASM for one-off experiments so it's inconvenient to use pre-defined symbolic constants for call numbers; with GCC to preprocess a .S before running GAS you can make your code self-documenting with #include <sys/syscall.h> so you can use mov $SYS_exit_group, %eax (or $__NR_exit_group), or mov eax, __NR_exit_group with .intel_syntax noprefix.
Don't use the 32-bit int 0x80 ABI in 64-bit code:
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? explains what happens if you use the COMPAT_IA32_EMULATION int 0x80 ABI in 64-bit code.
It's totally fine for just exiting, as long as your kernel has that support compiled in; otherwise it will segfault just like any other random int number like int 0x7f. (e.g. on WSL1, or for people that built custom kernels and disabled that support.)
But the only reason you'd do it that way in asm would be so you could build the same source file with nasm -felf32 or nasm -felf64. (You can't use syscall in 32-bit code, except on some AMD CPUs which have a 32-bit version of syscall. And the 32-bit ABI uses different call numbers anyway so this wouldn't let the same source be useful for both modes.)
Related:
- Why am I allowed to exit main using ret? (CRT startup code calls main, you're not returning directly to the kernel.)
- Nasm segmentation fault on RET in _start - you can't ret from _start
- Using printf in assembly leads to empty output when piping, but works on the terminal - stdout buffer (not) flushing with raw system call exit
- Syscall implementation of exit() - call exit vs. mov eax,60 / syscall (_exit) vs. mov eax,231 / syscall (exit_group).
- Can't call C standard library function on 64-bit Linux from assembly (yasm) code - modern Linux distros config GCC in a way that call exit or call puts won't link with nasm -felf64 foo.asm && gcc foo.o.
- Is main() really start of a C++ program? - Ciro's answer is a deep dive into how glibc + its CRT startup code actually call main (including x86-64 asm disassembly in GDB), and shows the glibc source code for __libc_start_main.
- Linux x86 Program Start Up or - How the heck do we get to main()? - 32-bit asm, and more detail than you'll probably want until you're a lot more comfortable with asm, but if you've ever wondered why CRT runs so much code before getting to main, that covers what's happening at a level that's a couple steps up from using GDB with starti (stop at the process entry point, e.g. in the dynamic linker's _start) and stepi until you get to your own _start or main.
- https://stackoverflow.com/tags/x86/info - lots of good links about this and everything else.