What Registers Are Preserved Through a Linux X86-64 Function Call

What registers are preserved through a linux x86-64 function call

Here's the complete table of registers and their use from the documentation [PDF Link]:

table from docs

r12, r13, r14, r15, rbx, rsp, rbp are the callee-saved registers - they have a "Yes" in the "Preserved across function calls" column.

What does it mean that registers are preserved across function calls?

Correct, your hand-written asm get_array_size doesn't follow the ABI because it's clobbering call-preserved registers. That means its callers need to treat it specially, not follow the usual ABI guarantees.

The ABI doc is a standard that compiler-generated functions follow, and so should most hand-written functions unless you want to make up your own custom calling convention. See What are callee and caller saved registers? for more details about what call-preserved vs. call-clobbered means for the caller, and for the implementation of the function itself if it wants to follow the ABI.

Small private "helper" functions with custom calling conventions are fine as long as you comment them (and never try to call them from C). Especially when you're optimizing, e.g. for code size (see codegolf x86-64 tips)

There is no magic in asm, each instruction only has its documented effect on the architectural state. (Contents of registers and memory).

As you can see from Intel's docs for call and ret, the only integer register they modify is RSP. Normal assemblers like NASM and GAS don't magically add instructions to your function. (MASM can be different, but if you look at disassembly you can still see the real code.)

What registers must be preserved by an x86 function?

Using Microsoft's 32 bit ABI (cdecl or stdcall or other calling conventions), EAX, EDX and ECX are scratch registers (call clobbered). The other general-purpose integer registers are call-preserved.

The condition codes in EFLAGS are call-clobbered. DF=0 is required on call/return so you can use rep movsb without a cld first. The x87 stack must be empty on call, or on return from a function that doesn't return an FP value. (FP return values go in st0, with the x87 stack empty other than that.) XMM6 and 7 are call-preserved, the rest are call-clobbered scratch registers.

Outside of Windows, most 32-bit calling conventions (including i386 System V on Linux) agree with this choice of EAX, EDX and ECX as call-clobbered, but all the xmm registers are call-clobbered.

For x64 under Windows, you only need to restore RBX, RBP, RDI, RSI, R12, R13, R14, and R15. XMM6..15 are call-preserved. (And you have to reserve 32 bytes of shadow space for use by the callee, whether or not there are any args that don't fit in registers.) xmm6..15 are call-preserved.

See https://en.wikipedia.org/wiki/X86_calling_conventions#Microsoft_x64_calling_convention for more details.

Other OSes use the x86-64 System V ABI (see figure 3.4), where the call-preserved integer registers are RBP, RBX, RSP, R12, R13, R14, and R15. All the XMM/YMM/ZMM registers are call-clobbered.

EFLAGS and the x87 stack are the same as in 32-bit conventions: DF=0, condition flags are clobbered, and x87 stack is empty. (x86-64 conventions return FP values in XMM0, so the x87 stack registers always need to be empty on call/return.)

For links to official calling convention docs, see https://stackoverflow.com/tags/x86/info

x64 argument and return value calling convention

I'm afraid you are mistaken:

the function argument is passed in rdi, as per the x86-64 System V calling convention.
register rbx must not be modified by a function; GCC saves/restores it as required, so it can keep a copy of x there across the call to bar.
the function return value is in rax. (Actually eax; a 32-bit int only uses the low half)

You can verify the basics by compiling a function like int foo(int x){return x;} - you'll see just a mov eax, edi.

Here is a commented version:

foo:                                    # @foo
        push    rbx           # save register rbx
        mov     ebx, edi      # save argument `x` in ebx
        call    bar           # a = bar()  (in eax)
        or      eax, ebx      # compute `x | a`, setting FLAGS
        mov     ecx, 456      # prepare 456 for conditional move
        mov     eax, 123      # eax = 123
        cmove   eax, ecx      # if `(x | a) == 0` set eax to 456
        pop     rbx           # restore register rbx
        ret                   # return value is in eax

The compiler optimizes x || b as (x | b) != 0 which allows for branchless code generation.

Note that mov doesn't modify the FLAGS, unlike most integer ALU instructions.

Are rdi and rsi caller saved or callee saved registers?

Yes, in all function-calling conventions I'm aware of, the arg-passing registers are call-clobbered. (Except for system-call calling conventions, where normally all regs are preserved except a return value, including arg-passing. Except that x86-64 syscall destroys RCX and R11...)

Specifically in x86-64 System V, all registers other than RBX, RBP, RSP, and R12-R15 are call-clobbered. (That includes xmm0-15, x87/mmx registers, and AVX512 zmm0-31 and k0-k7 mask regs.)

What registers are preserved through a linux x86-64 function call shows the table from the ABI doc.

The calling convention / ABI defines the status of registers as call-preserved or call-clobbered. Different conventions can make different choices.

And yes, Microsoft Windows chose a different calling convention from everyone else: Why does Windows64 use a different calling convention from all other OSes on x86-64? In Windows x64, RDI is call-preserved, like in most 32-bit calling conventions.

But in x86-64 System V, the designers chose registers from scratch, and (as my answer on that linked question shows) found that using RDI and RSI for the first 2 args saved instructions (when building SPECint with an early x86-64 port of gcc). Probably because gcc at the time liked to inline memset or memcpy using rep stosd, or the library implementation used that.

(It makes no sense to say that RDI is intrinsically call-clobbered, the x86-64 ISA doesn't define that. It's up to each platform to choose that.)

Terminology:

I hate the "caller saved" vs. "callee saved" terminology: It's confusing to think from 2 different perspectives (caller and callee), and wrongly implies that every register does get saved somewhere on every call. Also, the names only differ by 1 letter, so aren't very visually distinct when reading.

"preserved" or "clobbered" are great; they work from either perspective. (What a callee will do to your regs, or what you're allowed to do to the caller's regs.) Moreover, they're self-explanatory.

What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64

Further reading for any of the topics here: The Definitive Guide to Linux System Calls

I verified these using GNU Assembler (gas) on Linux.

Kernel Interface

x86-32 aka i386 Linux System Call convention:

In x86-32 parameters for Linux system call are passed using registers. %eax for syscall_number. %ebx, %ecx, %edx, %esi, %edi, %ebp are used for passing 6 parameters to system calls.

The return value is in %eax. All other registers (including EFLAGS) are preserved across the int $0x80.

I took following snippet from the Linux Assembly Tutorial but I'm doubtful about this. If any one can show an example, it would be great.

If there are more than six arguments,
%ebx must contain the memory
location where the list of arguments
is stored - but don't worry about this
because it's unlikely that you'll use
a syscall with more than six
arguments.

For an example and a little more reading, refer to http://www.int80h.org/bsdasm/#alternate-calling-convention. Another example of a Hello World for i386 Linux using int 0x80: Hello, world in assembly language with Linux system calls?

There is a faster way to make 32-bit system calls: using sysenter. The kernel maps a page of memory into every process (the vDSO), with the user-space side of the sysenter dance, which has to cooperate with the kernel for it to be able to find the return address. Arg to register mapping is the same as for int $0x80. You should normally call into the vDSO instead of using sysenter directly. (See The Definitive Guide to Linux System Calls for info on linking and calling into the vDSO, and for more info on sysenter, and everything else to do with system calls.)

x86-32 [Free|Open|Net|DragonFly]BSD UNIX System Call convention:

Parameters are passed on the stack. Push the parameters (last parameter pushed first) on to the stack. Then push an additional 32-bit of dummy data (Its not actually dummy data. refer to following link for more info) and then give a system call instruction int $0x80

http://www.int80h.org/bsdasm/#default-calling-convention

x86-64 Linux System Call convention:

(Note: x86-64 Mac OS X is similar but different from Linux. TODO: check what *BSD does)

Refer to section: "A.2 AMD64 Linux Kernel Conventions" of System V Application Binary Interface AMD64 Architecture Processor Supplement. The latest versions of the i386 and x86-64 System V psABIs can be found linked from this page in the ABI maintainer's repo. (See also the x86 tag wiki for up-to-date ABI links and lots of other good stuff about x86 asm.)

Here is the snippet from this section:

User-level applications use as integer registers for passing the
sequence %rdi, %rsi, %rdx, %rcx,
%r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.
A system-call is done via the syscall instruction. This clobbers %rcx and %r11 as well as the %rax return value, but other registers are preserved.
The number of the syscall has to be passed in register %rax.
System-calls are limited to six arguments, no argument is passed
directly on the stack.
Returning from the syscall, register %rax contains the result of
the system-call. A value in the range between -4095 and -1 indicates
an error, it is -errno.
Only values of class INTEGER or class MEMORY are passed to the kernel.

Remember this is from the Linux-specific appendix to the ABI, and even for Linux it's informative not normative. (But it is in fact accurate.)

This 32-bit int $0x80 ABI is usable in 64-bit code (but highly not recommended). What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? It still truncates its inputs to 32-bit, so it's unsuitable for pointers, and it zeros r8-r11.

User Interface: function calling

x86-32 Function Calling convention:

In x86-32 parameters were passed on stack. Last parameter was pushed first on to the stack until all parameters are done and then call instruction was executed. This is used for calling C library (libc) functions on Linux from assembly.

Modern versions of the i386 System V ABI (used on Linux) require 16-byte alignment of %esp before a call, like the x86-64 System V ABI has always required. Callees are allowed to assume that and use SSE 16-byte loads/stores that fault on unaligned. But historically, Linux only required 4-byte stack alignment, so it took extra work to reserve naturally-aligned space even for an 8-byte double or something.

Some other modern 32-bit systems still don't require more than 4 byte stack alignment.

x86-64 System V user-space Function Calling convention:

x86-64 System V passes args in registers, which is more efficient than i386 System V's stack args convention. It avoids the latency and extra instructions of storing args to memory (cache) and then loading them back again in the callee. This works well because there are more registers available, and is better for modern high-performance CPUs where latency and out-of-order execution matter. (The i386 ABI is very old).

In this new mechanism: First the parameters are divided into classes. The class of each parameter determines the manner in which it is passed to the called function.

For complete information refer to : "3.2 Function Calling Sequence" of System V Application Binary Interface AMD64 Architecture Processor Supplement which reads, in part:

Once arguments are classified, the registers get assigned (in
left-to-right order) for passing as follows:
If the class is MEMORY, pass the argument on the stack.
If the class is INTEGER, the next available register of the
sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used

So %rdi, %rsi, %rdx, %rcx, %r8 and %r9 are the registers in order used to pass integer/pointer (i.e. INTEGER class) parameters to any libc function from assembly. %rdi is used for the first INTEGER parameter. %rsi for 2nd, %rdx for 3rd and so on. Then call instruction should be given. The stack (%rsp) must be 16B-aligned when call executes.

If there are more than 6 INTEGER parameters, the 7th INTEGER parameter and later are passed on the stack. (Caller pops, same as x86-32.)

The first 8 floating point args are passed in %xmm0-7, later on the stack. There are no call-preserved vector registers. (A function with a mix of FP and integer arguments can have more than 8 total register arguments.)

Variadic functions (like printf) always need %al = the number of FP register args.

There are rules for when to pack structs into registers (rdx:rax on return) vs. in memory. See the ABI for details, and check compiler output to make sure your code agrees with compilers about how something should be passed/returned.

Note that the Windows x64 function calling convention has multiple significant differences from x86-64 System V, like shadow space that must be reserved by the caller (instead of a red-zone), and call-preserved xmm6-xmm15. And very different rules for which arg goes in which register.

Win64 and Linux-x86_64 Calling Convention Unused registers modified or not

A registers's status as call-preserved or call-clobbered never depends on the number of args actually passed by the caller and/or expected by the callee, in any calling convention for any ISA I've looked at, and certainly not any of the standard ones on x86.

But yes the calling conventions for raw system calls are different from those for functions, even for presumably thin wrapper functions.

All standard user-space function calling conventions have all the arg-passing registers (and stack slots) as call-clobbered. So if your asm uses call, that's what you need to expect.

The system-calling conventions on mainstream OSes preserves all registers (except the return value). (But on x86-64, only after syscall itself overwrites RCX and R11, because that happens before the kernel gets control.) If you directly use syscall or int 0x80 or whatever, that's what you should expect.

Note that Windows does not have a stable system-call ABI across kernel versions and doesn't document the raw system calls, so in normal Windows code you're always making DLL function calls, never raw system calls. People have reverse-engineered the system calls for different Windows versions, though.

MacOS also doesn't officially have a stable/documented syscall ABI, but in practice Darwin basically does, at least for the normal POSIX open/read/write/close/exit calls that toy programs use.

What registers are preserved through a linux x86-64 function call
What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
https://packagecloud.io/blog/the-definitive-guide-to-linux-system-calls/
Where is the x86-64 System V ABI documented?
Windows system calls

What Registers Are Preserved Through a Linux X86-64 Function Call