C Calling Conventions and Passed Arguments

Although the caller (in some calling conventions) is the one that cleans up the arguments, all it's really doing is deallocating the space previously allocated on the stack to hold the argument values. The callee is free to modify the values during execution of the function, because the caller isn't going to look at their values later.
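A minimal C sketch of that point, using hypothetical function names that are not from the original question: the callee overwrites its own copy of the argument, and the caller's variable is unaffected because the caller never reads the argument slot again.

```c
/* Hypothetical callee: it counts its own parameter down to zero.
   n is the callee's private copy of the argument, so modifying it is fine. */
int countdown_sum(int n)
{
    int sum = 0;
    while (n > 0) {
        sum += n;
        n--;            /* overwrites the passed-in value */
    }
    return sum;
}

/* Hypothetical caller: its own variable is untouched by the call. */
int caller_value_after_call(void)
{
    int value = 4;
    (void)countdown_sum(value);   /* the callee modifies only its copy */
    return value;                 /* still 4 */
}
```

Whether the copy lives in a stack slot or a register depends on the calling convention and the optimization level; the C-level behaviour is the same either way.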

In the example you posted, GCC has emitted the popl %eax instruction to deallocate the space taken by the parameter on the stack. All it really needs to do is add 4 to %esp (the stack on x86 grows downwards in memory), and executing the popl %eax instruction is the shortest and fastest way to do this. If the compiler needed to deallocate 20 values, it would probably modify %esp directly instead of emitting 20 popl instructions.

You will probably find that the new value of %eax is not used in the following code.

x86 calling convention: should arguments passed by stack be read-only?

Actually, I just compiled this function using GCC:

int foo(int x)
{
    goo(&x);
    return x;
}

And it generated this code:

_foo:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        leal    8(%ebp), %eax
        movl    %eax, (%esp)
        call    _goo
        movl    8(%ebp), %eax
        leave
        ret

This is using GCC 4.9.2 (on 32-bit cygwin if it matters), no optimizations. So in fact, GCC did exactly what you thought it should do and used the argument directly from where the caller pushed it on the stack.
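To see what that means at the C level, here is a sketch with an assumed definition of goo (the original does not show one). Since goo writes through the pointer, foo returns whatever goo stored in the parameter's storage, which in the listing above is the slot at 8(%ebp).

```c
/* Hypothetical definition of goo; the original only shows the call.
   It writes through the pointer it receives. */
void goo(int *p)
{
    *p += 1;
}

int foo(int x)
{
    goo(&x);    /* &x is the address of foo's own parameter object */
    return x;   /* reads back whatever goo stored there */
}
```

With this goo, foo(41) returns 42, and the variable the caller passed in is not affected.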

x64 argument and return value calling convention

I'm afraid you are mistaken:

  • the function argument is passed in rdi, as per the x86-64 System V calling convention.
  • register rbx must not be modified by a function; GCC saves/restores it as required, so it can keep a copy of x there across the call to bar.
  • the function return value is in rax. (Actually eax; a 32-bit int only uses the low half)

You can verify the basics by compiling a function like int foo(int x){return x;} - you'll see just a mov eax, edi.

Here is a commented version:

foo:                                    # @foo
        push    rbx             # save register rbx
        mov     ebx, edi        # save argument `x` in ebx
        call    bar             # a = bar() (in eax)
        or      eax, ebx        # compute `x | a`, setting FLAGS
        mov     ecx, 456        # prepare 456 for conditional move
        mov     eax, 123        # eax = 123
        cmove   eax, ecx        # if `(x | a) == 0` set eax to 456
        pop     rbx             # restore register rbx
        ret                     # return value is in eax

The compiler optimizes x || a as (x | a) != 0, which allows for branchless code generation.

Note that mov doesn't modify the FLAGS, unlike most integer ALU instructions.
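The equivalence the compiler relies on can be written out in C. This is only a sketch: the original source of foo is not shown here, so both function names and the exact shape of the expression are assumptions reconstructed from the commented assembly (a stands for the value returned by bar).

```c
/* Assumed source-level form: logical OR of the two values. */
int select_logical(int x, int a)
{
    return (x || a) ? 123 : 456;
}

/* Form matching the assembly: OR the bits together, then test for zero.
   x | a is zero exactly when both x and a are zero. */
int select_bitwise(int x, int a)
{
    return ((x | a) != 0) ? 123 : 456;
}
```

Both functions return the same result for every pair of int inputs, which is why the single or instruction followed by cmove is a valid translation.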

C calling convention: who cleans the stack in variadic functions vs normal functions?

as far as I am concerned, C does use cdecl

Its name notwithstanding, the cdecl convention is not universal for C code, not even on the x86 architecture. It has the advantage of being simple to define and implement, but it makes no use of CPU registers for argument passing, which is more efficient. That makes a difference even on register-starved x86, but it makes a lot more difference on architectures with more available registers, such as x86_64.

Talking about the cleanup, here is my question. I do not understand:
are there three different things?

  1. stack clean
  2. moving the pointer back to the penultimate stack frame
  3. stack restoration

Or how should I see them?

I would be inclined to interpret (1) and (3) as different ways of saying the same thing, but it is conceivable that someone would draw distinctions between them. (3) and related wording is what I encounter most frequently. (2) is not necessarily the same thing, because there may be two relevant stack parameters to be restored: the base of the stack frame (see below), and the top of the stack. The stack frame base is important in the event that the stack frame contains more information than argument and local variable values, such as the base of the previous stack frame.

Also, the target of this question is basically how could variadic
function works in calling conventions like Pascal or stdcall where the
callee should clear / clean / restore (I don't know which operation)
the stack - but he doesn't know how many parameters it will receive.

The stack is not necessarily the whole picture.

The callee cannot restore the stack if it does not know how to find the top of its caller's stack, and, if necessary, the base of its caller's stack frame. But in practice, this is usually hardware assisted.

Taking x86 (for which cdecl was designed) as an example, the CPU has registers for both the stack (frame) base and the current stack pointer. The caller's stack base is stored on the stack at a known offset (0) from the callee's stack base. Regardless of the number of arguments, the callee restores the stack by moving the top of the stack to its own stack base, and popping the value there to obtain the caller's stack base.

It is conceivable, however, that there is a calling convention in use somewhere that affords no way to restore the stack to a chosen previous state other than to pop elements one at a time, that does not explicitly convey the number of arguments to the called function, and that requires the callee to restore the caller's stack. Such a calling convention would not support variadic functions.

Why is it so important the order in which parameters are pushed on to the stack?

The order is not important in any general sense, but it is essential for caller and callee, which may be compiled separately, to agree about it. Otherwise, the callee cannot match the passed values with the parameters they are intended for. Thus, to whatever extent a calling convention relies on the stack, it must specify precisely which arguments are passed there, and in which order.

Regarding stack frames: this is more material that is not specified by C and that varies, at least to some extent. Conceptually, though, the stack frame of a function call is the portion of the stack that provides execution context for that call. It typically supplies storage for local variables, and it may contain additional information, such as a return address and / or the value of the caller's stack frame pointer. It might also contain other per-function-call information appropriate for the execution environment. Details are part of the calling convention in use.

Why does the x86-64 System V calling convention pass args in registers instead of just the stack?

instead of put the first 6 arguments in registers just to move them onto the stack in the function prologue?

I was looking at some code that gcc generated and that's what it always did.

Then you forgot to enable optimization. gcc -O0 spills every variable to memory so that you can modify them with a debugger while single-stepping. That's obviously horrible for performance, so compilers don't do that unless you force them to by compiling with -O0.

x86-64 System V allows int add(int x, int y) { return x+y; } to compile to lea eax, [rdi + rsi] / ret, which is what compilers actually do, as you can see on the Godbolt compiler explorer.

Stack-args calling conventions are slow and obsolete. RISC machines have been using register-args calling conventions since before x86-64 existed, and on OSes that still care about 32-bit x86 (i.e. Windows), there are better calling conventions like __vectorcall that pass the first 2 integer args in registers.

i386 System V hasn't been replaced because people mostly don't care as much about 32-bit performance on other OSes; we just use 64-bit code with the nicely-designed x86-64 System V calling convention.

For more about the tradeoff between register args and call-preserved vs. call-clobbered registers in calling convention design, see Why not store function parameters in XMM vector registers?, and also Why does Windows64 use a different calling convention from all other OSes on x86-64?.

Why can't stdcall handle varying amounts of arguments?

Couldn't stdcall functions also get a parameter of how many variables are there and do the same?

If the caller has to pass a separate arg with the number of bytes to be popped, that's more work than just doing add esp, 16 or whatever after the call (cdecl style caller-pops). It would totally defeat the purpose of stdcall, which is to save a few bytes of space at each call site, especially for naive code-gen that wouldn't defer popping args across a couple calls, or reuse the space allocated by a push with mov stores. (There are often multiple call-sites for each function, so the extra 2 bytes for ret imm16 vs. ret is amortized over that.)

Even worse, the callee can't use a variable number efficiently on x86 / x86-64. ret imm16 only works with an immediate (a constant embedded in the machine code), so to pop a variable number of bytes above the return address, a function would have to copy the return address higher up in the stack and do a plain ret from there. (Or defeat return-address branch prediction by popping the return address into a register and jumping to it.)

See also:

  • Stack cleanup in stdcall (callee-pops) for variable arguments (x86 asm)
  • What calling convention does printf() in C use? (why stdcall is unusable)


How do cdecl functions know how many arguments they've received?

They don't.

C is designed around the assumption that variadic functions don't know how many args they received, so functions need something like a format string or sentinel to know how many to iterate. For example, the POSIX execl(3) (wrapper for the execve(2) system call) takes a NULL-terminated list of char* args.
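As a sketch of the sentinel approach, here is a hypothetical variadic helper (total_length is not a standard function) that walks its arguments until it sees a null pointer, the same way execl finds the end of its list:

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sums the lengths of a NULL-terminated list of strings.
   The callee has no argument count; it stops only when it reads the sentinel. */
size_t total_length(const char *first, ...)
{
    va_list ap;
    size_t total = 0;

    va_start(ap, first);
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        total += strlen(s);
    va_end(ap);

    return total;
}
```

A call looks like total_length("ab", "cde", (const char *)NULL). The cast matters: a bare NULL in a variadic call may be passed as an int rather than as a pointer.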

Thus calling conventions in general don't waste code-size and cycles on providing a count as a side-channel; whatever info the function needs will be part of the real C-level args.
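The other common pattern is to make the count an ordinary argument. A minimal sketch, with a hypothetical sum_ints function: the count travels as a normal C-level parameter, not as anything the calling convention provides.

```c
#include <stdarg.h>

/* Hypothetical helper: the caller states how many int arguments follow.
   If the caller passes a wrong count, the behaviour is undefined;
   nothing in the calling convention checks it. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}
```

For example, sum_ints(3, 10, 20, 30) adds the three values that follow the count.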

Fun fact: printf("%d", 1, 2, 3) is well-defined behaviour in C, and is required to safely ignore args beyond the ones referenced by the format string.
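A small sketch of that behaviour, using snprintf so the result can be inspected. The wrapper name format_ints is hypothetical; the format string is taken as a parameter so that it alone decides how many of the three arguments get read.

```c
#include <stdio.h>

/* Hypothetical wrapper: always passes three ints, but the format string
   decides how many are consumed. Arguments beyond those the format
   references are evaluated and then ignored. */
int format_ints(char *buf, size_t size, const char *fmt)
{
    return snprintf(buf, size, fmt, 1, 2, 3);
}
```

With the format "%d" the buffer ends up holding "1"; with "%d %d" it holds "1 2". In both cases the caller pushed or loaded all three values, which is why a callee-pops convention could not work out the cleanup size from the format string.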

So using stdcall and calculating based on the format-string can't work. You're right, if you wanted to make a callee-pops convention that worked for variadic functions, you would need to pass a size somewhere, e.g. in a register. But like I said earlier, the caller knows the right number, so it would be vastly easier to let the caller manage the stack, instead of making the callee dig up this extra arg later. That's why no real-world calling conventions work this way, AFAIK.


