Why is unsigned integer overflow defined behavior but signed integer overflow isn't?
The historical reason is that most C implementations (compilers) just used whatever overflow behaviour was easiest to implement with the integer representation it used. C implementations usually used the same representation used by the CPU - so the overflow behavior followed from the integer representation used by the CPU.
In practice, it is only the representations for signed values that may differ according to the implementation: one's complement, two's complement, sign-magnitude. For an unsigned type there is no reason for the standard to allow variation because there is only one obvious binary representation (the standard only allows binary representation).
Relevant quotes:
C99 6.2.6.1:3:
Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.
C99 6.2.6.2:2:
If the sign bit is one, the value shall be modified in one of the following ways:
— the corresponding value with sign bit 0 is negated (sign and magnitude);
— the sign bit has the value −(2N) (two’s complement);
— the sign bit has the value −(2N − 1) (one’s complement).
Nowadays, all processors use two's complement representation, but signed arithmetic overflow remains undefined and compiler makers want it to remain undefined because they use this undefinedness to help with optimization. See for instance this blog post by Ian Lance Taylor or this complaint by Agner Fog, and the answers to his bug report.
Is signed integer overflow undefined behaviour or implementation defined?
Both references are correct, but they do not address the same issue.
int a = UINT_MAX;
is not an instance of signed integer overflow, this definition involves a conversion from unsigned int
to int
with a value that exceeds the range of type int
. As quoted from the École polytechnique's site, the C Standard defines the behavior as implementation-defined.
#include <limits.h>
int main(){
int a = UINT_MAX; // implementation defined behavior
int b = INT_MAX + 1; // undefined behavior
return 0;
}
Here is the text from the C Standard:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than
_Bool
, if the value can be represented by the new type, it is unchanged.Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Some compilers have a command line option to change the behavior of signed arithmetic overflow from undefined behavior to implementation-defined: gcc
and clang
support -fwrapv
to force integer computations to be performed modulo the 232 or 264 depending on the signed type. This prevents some useful optimisations, but also prevents some counterintuitive optimisations that may break innocent looking code. See this question for some examples: What does -fwrapv do?
Signed integer overflow undefined behavior
Since a pull request was opened for this case, it seems that the reference to chapter 7.2 was wrong and it should have been 7.1 instead.
Still unsure about signed integer overflow in C++
[expr.pre.incr]/1
The expression ++x is equivalent to x+=1.
[expr.ass]/6
The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once.
But x = x + 1
is not UB (integer promotion and narrowing integer conversion), so the original is well-formed. Therefore, GCC is wrong.
Does integer overflow cause undefined behavior because of memory corruption?
You misunderstand the reason for undefined behavior. The reason is not memory corruption around the integer - it will always occupy the same size which integers occupy - but the underlying arithmetics.
Since signed integers are not required to be encoded in 2's complement, there can not be specific guidance on what is going to happen when they overflow. Different encoding or CPU behavior can cause different outcomes of overflow, including, for example, program kills due to traps.
And as with all undefined behavior, even if your hardware uses 2's complement for its arithmetic and has defined rules for overflow, compilers are not bound by them. For example, for a long time GCC optimized away any checks which would only come true in a 2's-complement environment. For instance, if (x > x + 1) f()
is going to be removed from optimized code, as signed overflow is undefined behavior, meaning it never happens (from compiler's view, programs never contain code producing undefined behavior), meaning x
can never be greater than x + 1
.
Is signed integer overflow still undefined behavior in C++?
is still overflow of these types an undefined behavior?
Yes. Per Paragraph 5/4 of the C++11 Standard (regarding any expression in general):
If during the evaluation of an expression, the result is not mathematically defined or not in the range of
representable values for its type, the behavior is undefined. [...]
The fact that a two's complement representation is used for those signed types does not mean that arithmetic modulo 2^n is used when evaluating expressions of those types.
Concerning unsigned arithmetic, on the other hand, the Standard explicitly specifies that (Paragraph 3.9.1/4):
Unsigned integers, declared
unsigned
, shall obey the laws of arithmetic modulo 2^n where n is the number
of bits in the value representation of that particular size of integer
This means that the result of an unsigned arithmetic operation is always "mathematically defined", and the result is always within the representable range; therefore, 5/4 does not apply. Footnote 46 explains this:
46) This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the
resulting unsigned integer type.
C/C++ unsigned integer overflow
It means the value "wraps around".
UINT_MAX + 1 == 0
UINT_MAX + 2 == 1
UINT_MAX + 3 == 2
.. and so on
As the link says, this is like the modulo operator: http://en.wikipedia.org/wiki/Modulo_operation
Related Topics
Read File Line by Line Using Ifstream in C++
Templated Check For the Existence of a Class Member Function
Difference Between #Include ≪Filename≫ and #Include "Filename"
What Is the Most Effective Way For Float and Double Comparison
Is It a Good Idea to Typedef Pointers
How to Get the List of Files in a Directory Using C or C++
Accessing Inactive Union Member and Undefined Behavior
Is Segmentation Fault Actual Undefined Behavior When We Refer to a Non-Static Data-Member
Regular Cast Vs. Static_Cast Vs. Dynamic_Cast
What Is the Curiously Recurring Template Pattern (Crtp)
Why Use Apparently Meaningless Do-While and If-Else Statements in Macros
Is There a Difference Between Copy Initialization and Direct Initialization
How to Print a Double Value With Full Precision Using Cout
Sorting a Vector of Custom Objects