Why Are Arguments Which Do Not Match the Conversion Specifier in Printf Undefined Behavior

Why are arguments which do not match the conversion specifier in printf undefined behavior?

Some compilers may implement variable-format arguments in a way that allows the
types of arguments to be validated; since having a program trap on incorrect
usage may be better than possibly having it output seemingly-valid-but-wrong
information, some platforms may choose to do that.

Because the behavior of traps is outside the realm of the C Standard, any action
which might plausibly trap is classified as invoking Undefined Behavior.

Note that because implementations may trap on incorrect formatting, the behavior is undefined even when the expected type and the type actually passed have the same representation. The one exception is that signed and unsigned integer types of the same rank are interchangeable if the value is within the range common to both. For example, if a "long" holds 23, it may be output with "%lX" but not with "%X", even if "int" and "long" are the same size.
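A minimal sketch of the rule above, assuming a hosted C99 implementation. The helper names format_long_hex and format_as_unsigned_hex are hypothetical. It shows two conforming ways to print a long in hexadecimal: match the specifier to the argument with the l length modifier, or convert the argument to the type the specifier expects.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the hexadecimal form of a long into buf.
   "%lX" expects unsigned long. A non-negative long is interchangeable with
   it, but the explicit cast keeps the call correct for every value. */
int format_long_hex(char *buf, size_t n, long value)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%lX", (unsigned long)value);
}

/* Alternative: keep "%X" and convert the argument to unsigned int.
   Passing the long itself to "%X" would be undefined, even where int and
   long have the same size. */
int format_as_unsigned_hex(char *buf, size_t n, long value)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%X", (unsigned int)value);
}
```

For example, format_long_hex(buf, sizeof buf, 23L) stores "17" in buf. The second helper truncates values that do not fit in unsigned int, so it is only a reasonable choice when the value is known to be small.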

Note also that the C89 committee introduced a rule by fiat, still in force today, which states that even if "int" and "long" have the same format, the code:

long foo=23;
int *u = (int *)&foo;
(*u)++;

invokes Undefined Behavior because it causes information which was written as type "long" to be read as type "int" (the behavior would also be undefined if it were read as type "unsigned int"). A "%X" format specifier causes data to be read as type "unsigned int". Passing the data as type "long" would almost certainly cause it to be stored somewhere as "long" and subsequently read as "unsigned int", which would almost certainly violate the aforementioned rule.

Why is printf with a single argument (without conversion specifiers) deprecated?

printf("Hello World!"); is IMHO not vulnerable but consider this:

const char *str;
...
printf(str);

If str happens to point to a string containing %s format specifiers, your program will exhibit undefined behaviour (most often a crash), whereas puts(str) will just display the string as is, followed by a newline.

Example:

printf("%s");   //undefined behaviour (mostly crash)
puts("%s"); // displays "%s\n"
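A short sketch of the safe alternatives, assuming the string comes from somewhere the program does not control. The helper names format_verbatim and print_verbatim are hypothetical. The point is that the string is always passed as data, never as the format.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats an arbitrary string verbatim.
   The fixed "%s" format means any '%' characters inside text are copied
   as data and never interpreted as conversion specifications. */
int format_verbatim(char *buf, size_t n, const char *text)
{
    assert(buf != NULL && n > 0 && text != NULL);
    return snprintf(buf, n, "%s", text);
}

/* Hypothetical helper: prints text to stdout without a trailing newline.
   fputs, unlike puts, does not append '\n'. */
int print_verbatim(const char *text)
{
    assert(text != NULL);
    return fputs(text, stdout);
}
```

With these, a string such as "100%s %n" is copied character for character instead of being parsed for conversions.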

Passing too few arguments to printf for the format specifier — is it undefined behavior?

The n$ notation is not part of standard C, but is part of POSIX. The POSIX specification for printf() supports the n$ notation to refer to arguments.

Conversions can be applied to the nth argument after the format in the argument list, rather than to the next unused argument. In this case, the conversion specifier character % (see below) is replaced by the sequence "%n$", where n is a decimal integer in the range [1,{NL_ARGMAX}], giving the position of the argument in the argument list. This feature provides for the definition of format strings that select arguments in an order appropriate to specific languages (see the EXAMPLES section).

The format can contain either numbered argument conversion specifications (that is, "%n$" and "*m$"), or unnumbered argument conversion specifications (that is, % and *), but not both. The only exception to this is that %% can be mixed with the "%n$" form. The results of mixing numbered and unnumbered argument specifications in a format string are undefined. When numbered argument specifications are used, specifying the Nth argument requires that all the leading arguments, from the first to the (N-1)th, are specified in the format string.

In format strings containing the "%n$" form of conversion specification, numbered arguments in the argument list can be referenced from the format string as many times as required.

It requires that you provide an argument for each n$, and that the format string refers to every argument 1..n. It doesn't say you have to use a different n$ each time.

The code shown is fine on POSIX systems. Since it uses a POSIX-only feature, it won't be portable to non-POSIX systems that don't have the necessary support as an extension.
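As an illustration of the quoted rules, here is a small sketch that assumes a POSIX printf family (for example glibc); it is not valid ISO C. The helper name format_repeated is hypothetical. Argument 1 is referenced twice and argument 2 once, so every argument is referenced and no unnumbered specification is mixed in.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper. Assumes a POSIX printf family; "%n$" is not ISO C.
   Argument 1 is referenced twice and argument 2 once, so every argument
   from 1 to 2 is used and no unnumbered specification is mixed in. */
int format_repeated(char *buf, size_t n, const char *word, int count)
{
    assert(buf != NULL && n > 0 && word != NULL);
    return snprintf(buf, n, "%1$s, %1$s (x%2$d)", word, count);
}
```

A compiler in pedantic mode may warn that ISO C does not support operand numbers; on a POSIX system the call itself is well defined.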

Why does printf("%f",0); give undefined behavior?

The "%f" format requires an argument of type double. You're giving it an argument of type int. That's why the behavior is undefined.

The standard does not guarantee that all-bits-zero is a valid representation of 0.0 (though it often is), or of any double value, or that int and double are the same size (remember it's double, not float), or, even if they are the same size, that they're passed as arguments to a variadic function in the same way.

It might happen to "work" on your system. That's the worst possible symptom of undefined behavior, because it makes it difficult to diagnose the error.

N1570 7.21.6.1 paragraph 9:

... If any argument is not the correct type for the corresponding
conversion specification, the behavior is undefined.

Arguments of type float are promoted to double, which is why printf("%f\n",0.0f) works. Arguments of integer types narrower than int are promoted to int or to unsigned int. These promotion rules (specified by N1570 6.5.2.2 paragraph 6) do not help in the case of printf("%f\n", 0).

Note that if you pass a constant 0 to a non-variadic function that expects a double argument, the behavior is well defined, assuming the function's prototype is visible. For example, sqrt(0) (after #include <math.h>) implicitly converts the argument 0 from int to double -- because the compiler can see from the declaration of sqrt that it expects a double argument. It has no such information for printf. Variadic functions like printf are special, and require more care in writing calls to them.
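A minimal sketch of the fix, with a hypothetical helper name format_int_as_double. Because a variadic call gives the compiler no parameter type to convert to, the conversion from int to double has to be written explicitly (or the constant written as 0.0).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats an int with "%f".
   The default argument promotions never turn an int into a double, so the
   conversion has to be written out at the call site. */
int format_int_as_double(char *buf, size_t n, int value)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%f", (double)value);
}
```

In the default "C" locale, format_int_as_double(buf, sizeof buf, 0) stores "0.000000" in buf.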

printf() giving identical output on x86-64 platforms even when arguments are swapped

The reason is the way the System V x86-64 ABI specifies how arguments are passed.

According to PDF page 22 of the ABI document, the first 6 integer arguments are passed in %rdi, %rsi, %rdx, %rcx, %r8 and %r9, and the first 8 floating-point arguments are passed in %xmm0 through %xmm7. However, the two classes are assigned registers independently, so the relative order of integer and floating-point arguments is not preserved. Therefore the following two functions, despite being declared differently, receive their arguments in exactly the same registers.

int f1(int i1, int i2, int i3, double d1, double d2, double d3);
int f2(double d1, double d2, int i1, int i2, double d3, int i3);

Compiled according to the System V x86-64 ABI, both functions receive i1, i2 and i3 in registers %rdi, %rsi and %rdx, and d1, d2 and d3 in registers %xmm0, %xmm1 and %xmm2.

Variadic arguments are no exception. Up to 6 integers and up to 8 floating-point values are passed in registers, and the rest are passed on the stack.

As for this specific code, I inspected the assembly generated by gcc -O0 -S and verified the statements above: the integer 5678 is passed to printf in %rsi, and the (double-precision) floating-point value 1234.0 is passed in %xmm0. In both cases, %eax is set to 1, indicating that one floating-point argument is passed in a vector register.

And where is %rdi? The format string is the first argument, so a pointer to it is passed in %rdi.

printf doesn't know whether the integer comes before the float or the other way around; it only knows that it has one integer argument (after the format string) and one floating-point argument (from reading %al). This is exactly why the two lines produce identical output.
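To make the mechanism concrete, here is a rough sketch of how a variadic function fetches its arguments. The function sum_described and its type-string convention ('i' for int, 'd' for double) are hypothetical and only stand in for printf's format parsing. The callee trusts the string and asks for each argument by type, which is why a mismatched call is undefined even on platforms where it happens to produce the expected output.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: adds up variadic arguments described by a type
   string, where 'i' means int and 'd' means double. Like printf, it has no
   way to check the caller: it trusts the string and fetches each argument
   with the type the string names. */
double sum_described(const char *types, ...)
{
    assert(types != NULL);
    va_list ap;
    double total = 0.0;

    va_start(ap, types);
    for (const char *p = types; *p != '\0'; ++p) {
        if (*p == 'i')
            total += va_arg(ap, int);
        else if (*p == 'd')
            total += va_arg(ap, double);
        /* any other character is skipped without fetching an argument */
    }
    va_end(ap);
    return total;
}
```

The calls sum_described("id", 5678, 1234.0) and sum_described("di", 1234.0, 5678) are both correct because the string matches the arguments in each case. Swapping the arguments without swapping the string would be undefined behavior, whatever a particular ABI makes of it.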


How printf works in case of type mismatch with type specifier?

You can see that the code also appears to work with the %d, %x and %u format specifiers.

Why does it work without any warnings?

Because you don't have warnings enabled in your CodeBlocks.

Go to settings -> compiler and check

Enable All Common Compiler Warnings [-Wall]

And now you get:

In function 'int main()':
warning: format '%c' expects argument of type 'int', but argument 2 has type 'const char*' [-Wformat=]

Why does it even work?

With %c, the output is $ in CodeBlocks and X in Visual Studio. So that looks like undefined behavior.

Wrong format specifiers in scanf (or) printf

Anyway, if you only want the first char, you could do this:

#include <stdio.h>

int main()
{
    // *"Hello\n" is the first character of the literal, 'H'
    printf("%c", *"Hello\n"); // Not asked in Question but still :)

    return 0;
}

It prints H by dereferencing the pointer to the first character of the string literal.

why printf doesn't accept a pointer as an argument

why codeblocks crashes when I try to run this code :

char *ch= "Sam smith";
printf("%s\n",*ch);

ch is of type char *.

The %s conversion specifier expects a char *.

You pass *ch, which is the dereferenced ch, i.e. of type char.

If conversion specifiers do not match the types of the arguments, bad things (undefined behavior) happen.

shouldn't *ch mean the content of the address pointed to by ch which is the string itself

There is no data type "string" in C, and thus no "pointer to string".

A "string", in C parlance, is a sequence of characters ending with a null-byte terminator, usually handled through a pointer to its first character.

ch is a char *, a pointer to the first character of that array - a string, so to speak.

*ch is a char, the first character of that array.
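The difference between ch and *ch can be sketched as follows. The helper name format_string_and_first is hypothetical. Each specifier is given an argument of the type it expects: the pointer for %s, the single character for %c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats a whole string and its first character.
   "%s" receives the pointer s (type char *); "%c" receives the character
   s[0], which is the same as *s and is promoted to int when passed. */
int format_string_and_first(char *buf, size_t n, const char *s)
{
    assert(buf != NULL && n > 0 && s != NULL);
    return snprintf(buf, n, "%s|%c", s, s[0]);
}
```

Called with "Sam smith", it stores "Sam smith|S" in buf.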

Is calling printf with excess arguments undefined behaviour?

No. This scenario is explicitly defined by the standard, so it is not undefined behaviour.

To quote the C11 standard, chapter §7.21.6.1, The fprintf() function

[...] If the format is exhausted while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored [...]
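A small sketch of what "evaluated but otherwise ignored" means in practice. The helpers bump and format_with_excess are hypothetical. The excess argument has a visible side effect, so the evaluation can be observed even though its value never reaches the output.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: increments *counter and returns the new value, so
   that evaluating it leaves a visible trace. */
int bump(int *counter)
{
    assert(counter != NULL);
    return ++*counter;
}

/* The format consumes only the first argument. The call to bump() is an
   excess argument: it is still evaluated, but its value is not printed.
   Compilers may warn about the extra argument; the behavior is defined. */
int format_with_excess(char *buf, size_t n, int *counter)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%d", 42, bump(counter));
}
```

After one call with a counter that starts at 0, buf holds "42" and the counter is 1.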

What will printf do when not enough arguments are passed?

The C spec is explicit on this point:

... If there are insufficient arguments for the format, the behavior is undefined. ...

C11dr §7.21.6.1 2

Are there any guarantees on what the result will be? --> No.

(On my machine, nothing gets printed at all.) Is this always the case --> No.

Is there a potential for it to print the string with an unresolved specifier? --> Yes. The behavior is undefined. Anything may happen.


