Technically, How Do Variadic Functions Work? How Does Printf Work

Neither the C nor the C++ standard imposes any requirement on how this has to work. A conforming compiler may well decide to emit chained lists, std::stack<boost::any> or even magical pony dust (as per @Xeo's comment) under the hood.

However, it is usually implemented as follows, even though optimizations such as inlining or passing arguments in CPU registers may leave nothing of the code discussed here.

Please also note that the visuals below assume a downward-growing stack, and that this answer is a simplification meant only to demonstrate the scheme (please see https://en.wikipedia.org/wiki/Stack_frame).

How can a function be called with a non-fixed number of arguments?

This is possible because the underlying machine architecture has a so-called "stack" for every thread. The stack is used to pass arguments to functions. For example, when you have:

foobar("%d%d%d", 3,2,1);

Then this compiles to assembler code like this (schematic only; actual code might look different). Note that the arguments are pushed from right to left:

push 1
push 2
push 3
push "%d%d%d"
call foobar

Those push-operations fill up the stack:

              []   // empty stack
-------------------------------
push 1:       [1]
-------------------------------
push 2:       [1]
              [2]
-------------------------------
push 3:       [1]
              [2]
              [3]  // there is now 1, 2, 3 in the stack
-------------------------------
push "%d%d%d":[1]
              [2]
              [3]
              ["%d%d%d"]
-------------------------------
call foobar   ...  // foobar uses the same stack!

Because the stack in this picture grows downward, the bottom-most element is the one called the "Top of Stack", often abbreviated "TOS".

The foobar function would now access the stack, beginning at the TOS, i.e. the format string, which as you remember was pushed last. Imagine stack is your stack pointer, stack[0] is the value at the TOS, stack[1] is one above the TOS, and so forth:

format_string <- stack[0]

... and then parses the format string. While parsing, it recognizes the %d tokens, and for each one loads one more value from the stack:

format_string <- stack[0]
offset <- 1
while (parsing):
    token <- tokenize_one_more(format_string)
    if (needs_integer(token)):
        value <- stack[offset]
        offset <- offset + 1
    ...

This is of course very incomplete pseudo-code, but it demonstrates how the function has to rely on the arguments passed to find out how much it has to load and remove from the stack.
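The same idea can be sketched in real C. This is only a rough illustration, assuming a hypothetical helper named mini_format (not a library function) that understands nothing but %d, %s and %%; it reads its arguments through the standard va_arg macro rather than touching the stack directly, and returns how many variadic arguments the format string made it consume:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: append text to out without overflowing cap bytes. */
static size_t append(char *out, size_t cap, size_t used, const char *text)
{
    while (*text != '\0' && used + 1 < cap)
        out[used++] = *text++;
    return used;
}

/* Hypothetical mini formatter: handles only %d, %s and %%.
 * Returns the number of variadic arguments it consumed, or -1 if cap is 0. */
int mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;
    int consumed = 0;

    if (cap == 0)
        return -1;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; ++p) {
        if (*p != '%') {
            char one[2] = { *p, '\0' };   /* ordinary character */
            used = append(out, cap, used, one);
        } else if (p[1] == 'd') {
            char digits[32];
            snprintf(digits, sizeof digits, "%d", va_arg(ap, int));
            used = append(out, cap, used, digits);
            ++consumed;
            ++p;
        } else if (p[1] == 's') {
            used = append(out, cap, used, va_arg(ap, const char *));
            ++consumed;
            ++p;
        } else if (p[1] == '%') {
            used = append(out, cap, used, "%");
            ++p;
        } else {
            break;   /* unknown conversion: we cannot know what to fetch */
        }
    }
    va_end(ap);

    out[used] = '\0';
    return consumed;
}
```

Calling mini_format(buf, sizeof buf, "%d-%d-%s", 3, 2, "one") fills buf with "3-2-one". Note that nothing but the format string tells the function that three arguments follow.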

Security

This reliance on user-provided arguments is also one of the biggest security issues present (see https://cwe.mitre.org/top25/). Users may easily use a variadic function wrongly, either because they did not read the documentation, or forgot to adjust the format string or argument list, or because they are plain evil, or whatever. See also Format String Attack.

C Implementation

In C and C++, variadic functions are used together with the va_list interface. While the pushing onto the stack is intrinsic to those languages (in K&R C you could even forward-declare a function without stating its arguments, but still call it with any number and kind of arguments), reading from such an unknown argument list is done through the va_... macros and the va_list type, which basically abstract the low-level stack-frame access.
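As a minimal sketch of that interface, assuming a hypothetical function sum_ints whose first (named) argument says how many int values follow:

```c
#include <stdarg.h>

/* Hypothetical example: adds up `count` int arguments. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);              /* begin after the last named parameter */
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);     /* the caller must really have passed ints */
    va_end(ap);

    return total;
}
```

Here the count plays the role that the format string plays for printf: sum_ints(3, 3, 2, 1) returns 6, but the function has no way to check that three int values were actually passed.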

Why do my variadic functions in my printf function not work?

If I do the following: ft_printf("%c%c", 'a', 'b');

it will print aa, instead of ab.

If I do the following: ft_printf("%c%d", 't', 29);

it will not print t29 like it's supposed to. Instead, it will print t116: it does detect that I would like to print an int, but it doesn't use the right argument (it converts 't' into its ASCII value, 116).

Evidently you never advance and always use the first argument. This is because you pass the va_list by value, so the callee works on a copy and the caller's list does not move on. Pass it by pointer instead.

In ft_printf:

ft_parser(&argptr, (char *)str, &box);

and:

static void ft_craft1(va_list *argptr, t_list *box)
{
    if (box->type == 'c')
        ft_c_craft(va_arg(*argptr, int), box);
    else if (box->type == 's')
        ft_s_craft(va_arg(*argptr, char *), box);
    else if (box->type == 'd' || box->type == 'i')
        ft_di_craft(va_arg(*argptr, int), box);
}

etc
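For a self-contained illustration of the same fix, here is a small sketch with made-up names (take_char and two_chars are not part of the asker's code). The helper receives a pointer to the caller's va_list, so every call moves the one shared list forward:

```c
#include <stdarg.h>

/* Hypothetical helper: consumes one argument from the caller's list. */
static char take_char(va_list *ap)
{
    return (char)va_arg(*ap, int);   /* char arguments arrive promoted to int */
}

/* Hypothetical example: stores two char arguments plus a terminator in out,
 * which must have room for at least 3 bytes. */
void two_chars(char *out, ...)
{
    va_list ap;

    va_start(ap, out);
    out[0] = take_char(&ap);         /* advances ap itself, not a copy */
    out[1] = take_char(&ap);
    out[2] = '\0';
    va_end(ap);
}
```

With this shape, two_chars(buf, 'a', 'b') leaves "ab" in buf, because the second call continues where the first one stopped.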

How is the compiler able to warn if I pass more arguments to the printf function?

The compiler used is smart enough(*) to recognize printf() as a special function, in this case a function of the standard library. This function receives a format string. If the compiler can read this format string (for instance, because it is a string literal), it interprets the format codes the way printf() will, and so it knows the number and types of the arguments that should follow.

You can use this feature for your own printf-like function, for example (shamelessly copied from GCC's manual):

extern int
my_printf (void *my_object, const char *my_format, ...)
      __attribute__ ((format (printf, 2, 3)));

(*) "Smart enough" means that printf() is either declared in "stdio.h" with this attribute or known to the compiler as a built-in function, and the compiler knows how to process this attribute.
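A complete wrapper using this attribute might look like the following sketch. The names PRINTF_LIKE and format_into are made up for illustration, and the attribute is only applied on compilers that define __GNUC__ (GCC and Clang); elsewhere the macro expands to nothing:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical convenience macro around the GCC/Clang format attribute. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* The format string is parameter 3, the variadic arguments start at 4. */
int format_into(char *buf, size_t cap, const char *fmt, ...) PRINTF_LIKE(3, 4);

/* Hypothetical wrapper: formats into buf exactly like snprintf would. */
int format_into(char *buf, size_t cap, const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);
    written = vsnprintf(buf, cap, fmt, ap);   /* the "v" variant takes a va_list */
    va_end(ap);

    return written;
}
```

With format warnings enabled, a call such as format_into(buf, sizeof buf, "%d", "text") should then be diagnosed in the same way as the equivalent printf call.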

How does printf("%d",x) work/get interpreted?

How is printf("%d",x) interpreted?

OP's explanation is one potential partial description of code.

Yet C and C++ are typically compiled, not interpreted. Compilers today can examine printf("%d",x) and emit code like print_an_integer(x);.


With OP's example, the two snippets are functionally the same, yet a compiler may not recognize the optimization potential described above in the first one.

int x=5;
// In C++, should be `const char *p="%d";`
char *p="%d";
printf(p,x);
// same as ????
printf("%d",x);

Otherwise, the format string passed to printf() is processed at run time, looking for characters to print and for conversion specifiers. Each conversion specifier, in turn, fetches the next argument and processes it accordingly.

If the format string and the arguments do not match, like printf("%f",x) or p="%d"; printf(p, 5.0), the result is undefined behavior.



... how exactly ...

A compiler is allowed wide latitude on forming the emitted code. It needs to meet the equivalent functional requirements of a virtual machine - not OP's explanation. Any exact explanation is compiler and code dependent.

What does ... do as a function argument in C?

This is the syntax ISO C defines for declaring a function that takes a variable number and type of arguments.
Using the ... notation together with stdarg.h you can implement variadic functions: functions that accept a variable number of arguments. Its usage is explained in the GNU C Library manual.

You can go through the arguments using va_arg from stdarg.h. Here is a good tutorial to start with.

You can also implement variadic macros, such as SOME_MACRO(...). For that you can refer to this thread.
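As a small sketch of the macro form (FORMAT_ARRAY is a made-up name), the ... in the parameter list is forwarded through __VA_ARGS__, here to snprintf:

```c
#include <stdio.h>

/* Hypothetical macro: formats into a character array, using the array's own
 * size as the capacity. buf must be a real array, not a pointer, otherwise
 * sizeof (buf) would give the size of the pointer. */
#define FORMAT_ARRAY(buf, ...) snprintf((buf), sizeof (buf), __VA_ARGS__)
```

Given char line[16], the call FORMAT_ARRAY(line, "%s=%d", "x", 5) expands to snprintf((line), sizeof (line), "%s=%d", "x", 5) and leaves "x=5" in line.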

Why does my variadic function work with both int and long long?

That has nothing to do with C. It is just that the system you used (x86-64) passes the first few arguments in 64-bit registers, even for variadic arguments.

Essentially, on the architecture you used, the compiler produces code that uses a full 64-bit register for each argument, including variadic arguments. This is the ABI agreed upon for the architecture, and has nothing to do with C per se; all programs, no matter how they were produced, are supposed to follow the ABI of the architecture they run on.

If you use Windows, x86-64 uses rcx, rdx, r8, and r9 for the first four (integer or pointer) arguments, in that order, and the stack for the rest. In Linux, the BSDs, Mac OS X, and Solaris, x86-64 uses rdi, rsi, rdx, rcx, r8, and r9 for the first six (integer or pointer) arguments, in that order, and the stack for the rest.

You can verify this with a trivial example program:

extern void func(int n, ...);

void test_int(void)
{
    func(0, 1, 2);
}

void test_long_long(void)
{
    func(0, 1LL, 2LL);
}

If you compile the above to x86-64 assembly (e.g. gcc -Wall -O2 -march=x86-64 -mtune=generic -S) in Linux, BSDs, Solaris, or Mac OS (X or later), you get approximately the following (AT&T syntax, i.e. source, target operand order):

test_int:
        movl    $2, %edx
        movl    $1, %esi
        xorl    %edi, %edi
        xorl    %eax, %eax
        jmp     func

test_long_long:
        movl    $2, %edx
        movl    $1, %esi
        xorl    %edi, %edi
        xorl    %eax, %eax
        jmp     func

i.e. the functions are identical, and do not push the arguments to the stack. Note that jmp func is equivalent to call func; ret, just simpler.

However, if you compile for x86 (-m32 -march=i686 -mtune=generic), you get approximately:

test_int:
        subl    $16, %esp
        pushl   $2
        pushl   $1
        pushl   $0
        call    func
        addl    $28, %esp
        ret

test_long_long:
        subl    $24, %esp
        pushl   $0
        pushl   $2
        pushl   $0
        pushl   $1
        pushl   $0
        call    func
        addl    $44, %esp
        ret

which shows that the x86 calling conventions in Linux/BSDs/etc. pass the variadic arguments on the stack, that the int variant pushes 32-bit constants (pushl $x pushes a 32-bit constant x onto the stack), and that the long long variant pushes each 64-bit constant as two 32-bit halves.

Therefore, because of the underlying ABI of the operating system and architecture you use, your variadic function shows the "anomaly" you observed. To see the behaviour you expect from the C standard alone, you need to work around the underlying ABI quirk -- for example, by starting your variadic functions with at least six arguments, to occupy the registers on x86-64 architectures, so that the rest, your truly variadic arguments, are passed on the stack.
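Whatever the ABI happens to do, the portable approach is to make the type passed by the caller match the type named in va_arg. A minimal sketch, assuming a hypothetical function sum_long_longs that reads every variadic argument as long long:

```c
#include <stdarg.h>

/* Hypothetical example: every variadic argument is read as long long,
 * so every caller must pass long long values. */
long long sum_long_longs(int count, ...)
{
    va_list ap;
    long long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, long long);
    va_end(ap);

    return total;
}
```

A caller holding a plain int should convert it explicitly, as in sum_long_longs(2, (long long)1, 2LL), instead of relying on the register-passing behaviour described above.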

How does printf() work without the argument count being explicitly mentioned?

Let's look at the declaration of printf:

int printf(const char *format, ...)

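format is the string that contains the text to be written to stdout.

It can contain embedded format tags (conversion specifiers such as %d), which are replaced by the values given in the subsequent additional arguments and formatted as requested. The argument count is therefore never passed explicitly: printf works it out from the specifiers it finds in format.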
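To make that concrete, here is a deliberately simplified sketch of how the number of expected arguments can be derived from a format string. count_conversions is a made-up helper, not how any real printf is written: it counts every % that is not part of %% and ignores complications such as * widths, which consume an extra int:

```c
#include <stddef.h>

/* Hypothetical helper: counts the conversions in fmt that consume an
 * argument. Simplified: "%%" is skipped, '*' widths are not handled. */
size_t count_conversions(const char *fmt)
{
    size_t count = 0;

    for (const char *p = fmt; *p != '\0'; ++p) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {
            ++p;            /* "%%" prints a literal percent sign */
        } else if (p[1] != '\0') {
            ++count;        /* one more argument will be fetched */
        }
    }
    return count;
}
```

For "%d%d%d" this yields 3, and for "100%%" it yields 0, which is exactly the information printf has available when deciding how many arguments to read.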

How is vprintf implemented?

If you want to write a function which takes a va_list as an argument, the way vprintf does, then you just do that. You can extract arguments from the va_list with va_arg in the normal way.

Don't call va_start or va_end on the va_list: that's the responsibility of the caller. Since you can't restart the va_list in the normal way, if you need to scan it more than once, you'll need to va_copy it.

Here's a quick example, just for illustration (i.e. it's not meant to be the best possible implementation).

These two functions just join a bunch of strings using a provided delimiter string. The first one is the "v" version (like vsprintf), which implements the logic. The second one is the varargs version which packages up the va_list and passes it to the implementation.

The inner function runs through the arguments twice; the first time it adds the sizes of the strings. Both functions return a newly-malloc'd string which will need to be free'd by the caller.

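The argument list must be terminated with a null pointer. Since NULL may be defined as a plain integer 0, the portable way to pass it to a variadic function is (char *)NULL.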

#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* "v" version: takes an already-started va_list, like vsprintf does. */
char* vjoin(const char* delim, va_list ap) {
    va_list aq;
    va_copy(aq, ap);
    size_t dlen = strlen(delim);

    /* First pass. Use the copied va_list */
    size_t needed = 1; /* NUL terminator */
    const char* s = va_arg(aq, const char*);
    if (s) {
        needed += strlen(s);
        while ((s = va_arg(aq, const char*)))
            needed += dlen + strlen(s);
    }
    va_end(aq);

    /* Second pass. Use the original va_list */
    char* rv = malloc(needed);
    if (!rv)
        return NULL; /* allocation failed */
    size_t offset = 0;
    *rv = 0;
    s = va_arg(ap, const char*);
    if (s) {
        strcpy(rv, s);
        offset = strlen(s);
        while ((s = va_arg(ap, const char*))) {
            strcpy(rv + offset, delim);
            strcpy(rv + offset + dlen, s);
            offset += dlen + strlen(s);
        }
    }
    return rv;
}

/* Varargs version: packages up the va_list and calls the "v" version. */
char* join(const char* delim, ...) {
    va_list ap;
    va_start(ap, delim);
    char* rv = vjoin(delim, ap);
    va_end(ap);
    return rv;
}

In C, why is %s working without giving it a value?

This is undefined behavior: anything can happen, including something that looks correct. But it is incorrect.
Your compiler can probably point out the problem if you enable the right options (for example -Wall with GCC or Clang).

The standard says (emphasis mine):

7.21.6.1 The fprintf function


  1. The fprintf function writes output to the stream pointed to by stream,
    under control of the string pointed to by format that specifies how
    subsequent arguments are converted for output. If there are
    insufficient arguments for the format, the behavior is undefined. If
    the format is exhausted while arguments remain, the excess arguments
    are evaluated (as always) but are otherwise ignored. The fprintf
    function returns when the end of the format string is encountered.

Are functions with variadic arguments required to call va_start?

Yes, this is legal. No, functions are not required to call va_start. From the C99 standard:

If access to the varying arguments is desired, the called function
shall declare an object ... having type va_list.

Notice two things here:

  1. A va_list is a prerequisite to the va_start call.
  2. A va_list is only necessary to have if access to the varying arguments is desired.

As such, the va_start call is likewise only necessary if access to the varying arguments is desired.
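A tiny sketch of such a function, using the made-up name first_only, which accepts extra arguments but never looks at them:

```c
/* Hypothetical example: a variadic function that ignores its extra arguments. */
int first_only(int first, ...)
{
    return first;   /* no va_list, va_start or va_end needed */
}
```

A call like first_only(7, "ignored", 3.5) is valid and simply returns 7; the extra arguments are evaluated and passed, but never read.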


