Implicit type promotion rules
C was designed to implicitly and silently change the integer types of the operands used in expressions. There exist several cases where the language forces the compiler to either change the operands to a larger type, or to change their signedness.
The rationale behind this is to prevent accidental overflows during arithmetic, but also to allow operands with different signedness to co-exist in the same expression.
Unfortunately, the rules for implicit type promotion cause much more harm than good, to the point where they might be one of the biggest flaws in the C language. These rules are often not even known by the average C programmer and therefore cause all manner of very subtle bugs.
Typically you see scenarios where the programmer says "just cast to type x and it works" - but they don't know why. Or such bugs manifest themselves as rare, intermittent phenomena striking from within seemingly simple and straight-forward code. Implicit promotion is particularly troublesome in code doing bit manipulations, since most bit-wise operators in C come with poorly-defined behavior when given a signed operand.
Integer types and conversion rank
The integer types in C are char, short, int, long, long long and enum. _Bool/bool is also treated as an integer type when it comes to type promotions.
All integers have a specified conversion rank. C11 6.3.1.1, emphasis mine on the most important parts:
Every integer type has an integer conversion rank defined as follows:
— No two signed integer types shall have the same rank, even if they have the same representation.
— The rank of a signed integer type shall be greater than the rank of any signed integer type with less precision.
— The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
— The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any.
— The rank of any standard integer type shall be greater than the rank of any extended integer type with the same width.
— The rank of char shall equal the rank of signed char and unsigned char.
— The rank of _Bool shall be less than the rank of all other standard integer types.
— The rank of any enumerated type shall equal the rank of the compatible integer type (see 6.7.2.2).
The types from stdint.h sort in here too, with the same rank as whatever type they happen to correspond to on the given system. For example, int32_t has the same rank as int on a 32-bit system where it is defined as int.
Further, C11 6.3.1.1 specifies which types are regarded as the small integer types (not a formal term):
The following may be used in an expression wherever an int or unsigned int may be used:
— An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
What this somewhat cryptic text means in practice is that _Bool, char and short (and also int8_t, uint8_t etc.) are the "small integer types". These are treated in special ways and subject to implicit promotion, as explained below.
The integer promotions
Whenever a small integer type is used in an expression, it is implicitly converted to int, which is always signed. This is known as the integer promotions or the integer promotion rule.
Formally, the rule says (C11 6.3.1.1):
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
This means that all small integer types, no matter signedness, get implicitly converted to (signed) int when used in most expressions.
This text is often misunderstood as: "all small signed integer types are converted to signed int and all small, unsigned integer types are converted to unsigned int". This is incorrect. The unsigned part here only means that if we have for example an unsigned short operand, and int happens to have the same size as short on the given system, then the unsigned short operand is converted to unsigned int. As in, nothing of note really happens. But in case short is a smaller type than int, it is always converted to (signed) int, regardless of whether the short was signed or unsigned!
The harsh reality caused by the integer promotions means that almost no operation in C can be carried out on small types like char or short. Operations are always carried out on int or larger types.
This might sound like nonsense, but luckily the compiler is allowed to optimize the code. For example, an expression containing two unsigned char operands would get the operands promoted to int and the operation carried out as int. But the compiler is allowed to optimize the expression to actually get carried out as an 8-bit operation, as would be expected. However, here comes the problem: the compiler is not allowed to optimize out the implicit change of signedness caused by the integer promotion, because there is no way for the compiler to tell if the programmer is purposely relying on implicit promotion to happen, or if it is unintentional.
This is why example 1 in the question fails. Both unsigned char operands are promoted to type int, the operation is carried out on type int, and the result of x - y is of type int. This means that we get -1 instead of the 255 that might have been expected. The compiler may generate machine code that executes the code with 8-bit instructions instead of int, but it may not optimize out the change of signedness. We therefore end up with a negative result, which in turn produces a weird number when it is printed with printf and the %u format. Example 1 could be fixed by casting the result of the operation back to type unsigned char.
With the exception of a few special cases like the ++ and sizeof operators, the integer promotions apply to almost all operations in C, no matter if unary, binary (or ternary) operators are used.
The usual arithmetic conversions
Whenever a binary operation (an operation with 2 operands) is done in C, both operands of the operator have to be of the same type. Therefore, in case the operands are of different types, C enforces an implicit conversion of one operand to the type of the other operand. The rules for how this is done are named the usual arithmetic conversions (sometimes informally referred to as "balancing"). These are specified in C11 6.3.1.8:
(Think of this rule as a long, nested if-else if statement and it might be easier to read :) )
6.3.1.8 Usual arithmetic conversions
Many operators that expect operands of arithmetic type cause conversions and yield result
types in a similar way. The purpose is to determine a common real type for the operands
and result. For the specified operands, each operand is converted, without change of type
domain, to a type whose corresponding real type is the common real type. Unless
explicitly stated otherwise, the common real type is also the corresponding real type of
the result, whose type domain is the type domain of the operands if they are the same,
and complex otherwise. This pattern is called the usual arithmetic conversions:
- First, if the corresponding real type of either operand is long double, the other operand is converted, without change of type domain, to a type whose corresponding real type is long double.
- Otherwise, if the corresponding real type of either operand is double, the other operand is converted, without change of type domain, to a type whose corresponding real type is double.
- Otherwise, if the corresponding real type of either operand is float, the other operand is converted, without change of type domain, to a type whose corresponding real type is float.
- Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
  - If both operands have the same type, then no further conversion is needed.
  - Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
  - Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
  - Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
  - Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
Notable here is that the usual arithmetic conversions apply to both floating point and integer variables. In the case of integers, we can also note that the integer promotions are invoked from within the usual arithmetic conversions. And after that, when both operands have at least the rank of int, the operands are balanced to the same type, with the same signedness.
This is the reason why a + b in example 2 gives a strange result. Both operands are integers and they are at least of rank int, so the integer promotions do not apply. The operands are not of the same type: a is unsigned int and b is signed int. Therefore the operand b is temporarily converted to type unsigned int. During this conversion, it loses the sign information and ends up as a large value.
The reason why changing the type to short in example 3 fixes the problem is that short is a small integer type. This means that both operands are integer promoted to type int, which is signed. After integer promotion, both operands have the same type (int), so no further conversion is needed. The operation can then be carried out on a signed type as expected.
Implicit integer type conversion in C
In C99, the reference is 6.3.1.8 "Usual arithmetic conversions".
Many operators that expect operands of arithmetic type cause conversions and yield result
types in a similar way. The purpose is to determine a common real type for the operands
and result. For the specified operands, each operand is converted, without change of type
domain, to a type whose corresponding real type is the common real type. Unless
explicitly stated otherwise, the common real type is also the corresponding real type of
the result, whose type domain is the type domain of the operands if they are the same,
and complex otherwise. This pattern is called the usual arithmetic conversions:
- First, if the corresponding real type of either operand is long double, the other operand is converted, without change of type domain, to a type whose corresponding real type is long double.
- Otherwise, if the corresponding real type of either operand is double, the other operand is converted, without change of type domain, to a type whose corresponding real type is double.
- Otherwise, if the corresponding real type of either operand is float, the other operand is converted, without change of type domain, to a type whose corresponding real type is float.
- Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
  - If both operands have the same type, then no further conversion is needed.
  - Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
  - Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
  - Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
  - Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
Addition performs the usual arithmetic conversions, so, when adding unsigned char and signed int, either:
- first the unsigned char is promoted to int, and then both types are the same, so the result has type int, or
- (uncommon) int cannot represent all possible unsigned char values. In this case, unsigned char is promoted to unsigned int, and the third sub-bullet applies: unsigned int has equal rank to int, so the int operand is converted to unsigned int, and the result has type unsigned int.
C integer implicit conversion
Since the constant 0xffffffff, which (assuming int is 32 bits) has type unsigned int, is being used to initialize an object of type int, this involves a conversion from unsigned int to int.
Conversion between integer types is described in section 6.3.1.3 of the C standard:
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Paragraph 3 is what applies in this case. The value in question is outside the range of the destination type and the destination is signed. So an implementation-defined conversion happens.
If you compile with gcc using the -Wconversion flag, it will give you a warning:
x1.c:6:5: warning: conversion of unsigned constant value to negative integer [-Wsign-conversion]
int a = 0xffffffff;
Also:
This can be easily checked by doing
printf("%s", 0xffffffff);
This invokes undefined behavior because the %s format specifier expects a char * which points to a null-terminated string. The value you're passing is not of this type, and likely isn't a valid memory address.
Integer promotions also don't apply here because there is no expression with a type of lower rank than int or unsigned int.
The C language, does implicit conversion in assignments
I'll try to explain what the C standard says formally. We can start by reading 6.5.16.1/2:
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
Simple assignment meaning =, as opposed to the various compound assignments such as +=.
The above mentioned conversion only happens if the assignment is a valid form. There's a list of all forms of valid assignments (you don't need to read it, I'm including it here for the sake of completeness):
6.5.16.1 Simple assignment
Constraints
One of the following shall hold:
- the left operand has atomic, qualified, or unqualified arithmetic type, and the right has arithmetic type;
- the left operand has an atomic, qualified, or unqualified version of a structure or union type compatible with the type of the right;
- the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
- the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) one operand is a pointer to an object type, and the other is a pointer to a qualified or unqualified version of void, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
- the left operand is an atomic, qualified, or unqualified pointer, and the right is a null pointer constant; or
- the left operand has type atomic, qualified, or unqualified _Bool, and the right is a pointer.
If a type of assignment is not on that list, it is a "constraint violation", meaning invalid C, and the compiler must issue a message about it.
If an assignment is on that list, the right operand is implicitly converted to the type of the left operand. In your case, long y = x; fits the first bullet in the list: the left operand is an arithmetic type (integer or float) and the right operand is also arithmetic type. So the int operand x gets converted to long upon assignment.
Regarding qualifiers:
All the stuff about "qualified type" refers to const etc. type qualifiers. During assignment, something called lvalue conversion occurs. 6.5.16/3:
The type of an assignment expression is the type the left operand would have
after lvalue conversion.
Not very helpful if you don't know what an "lvalue conversion" is. The formal definition is found in 6.3.2.1/2, but it's equally unhelpful for beginners. To summarize it in simple terms, lvalue conversion means that it doesn't matter what qualifiers (const, volatile etc.) the right operand has, it gets converted to have the same qualifiers as the left operand. The term "lvalue" actually originates from "left value of the assignment operator".
Will implicit conversions lose information?
The first article talks about promotions, which are a specific type of implicit conversion. There are other types of conversions out there that are also implicit conversions but aren't promotions. A promotion can't lose information, as you are always going to a wider type, i.e. a type where all the values representable by the type being promoted are representable by the promoted-to type (int -> long long for example).
Other implicit conversions include going from signed to unsigned, narrowing conversions, and floating point to integer conversions. These conversions may lose information, unlike promotions.
How do you avoid implicit conversion from short to int during addition?
You don't.
C has no arithmetic operations on integer types narrower than int and unsigned int. There is no + operator for type short.
Whenever an expression of type short is used as the operand of an arithmetic operator, it is implicitly converted to int.
For example:
short s = 1;
s = s + s;
In s + s, s is promoted from short to int and the addition is done in type int. The assignment then implicitly converts the result of the addition from int to short.
Some compilers might have an option to enable a warning for the narrowing conversion from int to short, but there's no way to avoid it.
Implicit conversion in C89 and C99?
If the book says that implicit conversions happen only under those conditions, it is wrong. In integer arithmetic operations (and some others), operands with rank less than int or unsigned int are converted at least to int or unsigned int. (The formal rules have additional finicky details.) So, in b * c, the short operands b and c are promoted to int, and the result type is int. The mathematical result, 1,500,000, fits in an int in your C implementation, so there is no overflow.
Implicit type conversion in C
No, in the case of the equality operator, the "usual arithmetic conversions" occur, which start off:
- First, if the corresponding real type of either operand is long double, the other operand is converted, without change of type domain, to a type whose corresponding real type is long double.
- Otherwise, if the corresponding real type of either operand is double, the other operand is converted, without change of type domain, to a type whose corresponding real type is double.
- Otherwise, if the corresponding real type of either operand is float, the other operand is converted, without change of type domain, to a type whose corresponding real type is float.
This last case applies here: i_value is converted to float.
The reason that you can see an odd result from the comparison, despite this, is because of this caveat to the usual arithmetic conversions:
The values of floating operands and of the results of floating
expressions may be represented in greater precision and range than
that required by the type; the types are not changed thereby.
This is what is happening: the type of the converted i_value is still float, but in this expression your compiler is taking advantage of this latitude and representing it in greater precision than float. This is typical compiler behaviour when compiling for 387-compatible floating point, because the compiler leaves temporary values on the floating point stack, which stores floating point numbers in an 80-bit extended precision format.
If your compiler is gcc, you can disable this additional precision by giving the -ffloat-store command-line option.