How disastrous is integer overflow in C++?
As pointed out by @Xeo in the comments (I actually brought it up in the C++ chat first):
Undefined behavior really means it and it can hit you when you least expect it.
The best example of this is here: Why does integer overflow on x86 with GCC cause an infinite loop?
On x86, signed integer overflow is just a simple wrap-around. So normally, you'd expect the same thing to happen in C or C++. However, the compiler can intervene - and use undefined behavior as an opportunity to optimize.
In the example taken from that question:
#include <iostream>
using namespace std;

int main() {
    int i = 0x10000000;
    int c = 0;
    do {
        c++;
        i += i;
        cout << i << endl;
    } while (i > 0);
    cout << c << endl;
    return 0;
}
When compiled with optimizations enabled, GCC assumes signed overflow cannot happen, optimizes out the loop test, and turns this into an infinite loop.
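For contrast, here is a sketch of the same doubling loop rewritten with a 32-bit unsigned counter (not from the original question): unsigned overflow is defined to wrap modulo 2^32, so the compiler cannot assume the condition stays true, and the loop provably terminates.

```cpp
#include <cstdint>

// Same doubling loop, but with a 32-bit unsigned value: unsigned
// overflow wraps modulo 2^32, so the loop condition eventually fails.
int count_doublings() {
    std::uint32_t i = 0x10000000u;
    int c = 0;
    do {
        c++;
        i += i; // 0x80000000 + 0x80000000 wraps to 0
    } while (i > 0);
    return c; // 0x10000000 -> 0x20000000 -> 0x40000000 -> 0x80000000 -> 0
}
```

After four doublings `i` wraps to zero and the loop exits, so `count_doublings()` returns 4.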
How do I prevent integer overflow from happening in C?
Whenever you declare an integer variable:
- Actually consider how large/small a number it will ever contain.
- Actually consider if it needs to be signed or unsigned. Unsigned is usually less problematic.
- Pick the smallest of the intN_t or uintN_t types from stdint.h that will satisfy the above (or the ...fast_t etc. flavours if you wish).
- If needed, come up with integer constants that contain the maximum and/or minimum value the variable will hold, and check against those whenever you do arithmetic.
That is, don't just aimlessly spam int all over your code without a thought.
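As a sketch of the last checklist point, a range check against an explicit limit might look like this (the function name and the 8-bit payload type are illustrative assumptions, not from the answer above):

```cpp
#include <cstdint>

// Add two 8-bit quantities, refusing if the mathematical result would
// exceed UINT8_MAX. The check runs before the addition, so no
// wrap-around ever actually occurs.
bool checked_add_u8(std::uint8_t a, std::uint8_t b, std::uint8_t* out) {
    if (a > UINT8_MAX - b) {
        return false; // would exceed the limit: report failure
    }
    *out = static_cast<std::uint8_t>(a + b);
    return true;
}
```

Note the check is phrased as `a > UINT8_MAX - b` rather than `a + b > UINT8_MAX`, so the comparison itself cannot misbehave.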
Signed types can be problematic for reasons other than overflow too, namely whenever you need to do bitwise arithmetic. To avoid over/underflow and accidental signed bitwise arithmetic, you also need to know the various implicit integer type promotion rules.
Is integer overflow going to get me hacked like buffer overflow or etc?
Not really, but any bug can of course be exploited if someone is aware of it - as you can see in almost every single computer game.
C/C++ unsigned integer overflow
It means the value "wraps around".
UINT_MAX + 1 == 0
UINT_MAX + 2 == 1
UINT_MAX + 3 == 2
.. and so on
In other words, unsigned arithmetic behaves like the modulo operation: http://en.wikipedia.org/wiki/Modulo_operation
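A small sketch of that modulo behaviour, using 8-bit values so the modulus is a concrete 256 (the function name is illustrative):

```cpp
#include <cstdint>

// 8-bit unsigned addition: the stored result is reduced modulo 256,
// exactly the "wraps around" behaviour described above.
std::uint8_t add8(std::uint8_t a, std::uint8_t b) {
    // a + b is computed as int (promotion), then reduced mod 256
    return static_cast<std::uint8_t>(a + b);
}
```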
Integer overflow in intermediate arithmetic expression
According to the ISO C standard, §6.2.5/9:
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
This means that both the would-be positive and negative overflows that seem to occur in your addition and subtraction are actually performed as signed int arithmetic: the unsigned char operands are promoted to int, and the intermediate results fit comfortably in an int, so both operations are well-defined. After the expression is evaluated, the result is then truncated back to an unsigned char, since that's the type of the left-hand side.
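The promotion-then-truncation sequence can be sketched like this (variable and function names are illustrative):

```cpp
// With unsigned char operands, the arithmetic happens in int: both
// operands are promoted, so 200 + 100 is computed as the int 300 with
// no overflow anywhere.
int promoted_sum(unsigned char a, unsigned char b) {
    return a + b; // integer promotion: computed and returned as int
}

// Only the assignment back into an unsigned char reduces the
// intermediate value modulo 256.
unsigned char truncated_sum(unsigned char a, unsigned char b) {
    unsigned char r = a + b; // 300 stored as 300 mod 256 = 44
    return r;
}
```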
Integer overflow/underflow
I think what you're looking for is
signed char i = -1;
unsigned char j = i;
printf("%u\n", j);
In 8 bits, the signed number -1 "wraps around" to the unsigned value 255.
You asked about size_t because, yes, it's an unsigned type, but it's typically 32 or even 64 bits wide. At those sizes, the number 255 is representable (and has the same representation) in both the signed and unsigned variants, so there isn't a negative number that corresponds to 255. But you can certainly see similar effects, using different values. For example, on a machine with 32-bit ints, this code:
unsigned int i = 4294967041;
int j = i;
printf("%d\n", j);
is likely to print -255. This value comes about because 2^32 - 255 = 4294967041.
Signed Integer value overflow in C++?
Because signed overflow/underflow are classified as undefined behavior, compilers are allowed to cheat and assume it can't happen (this came up during a CppCon talk a year or two ago, though I forget which one off the top of my head). Because you're doing the arithmetic and then checking the result, the optimizer gets to optimize away part of the check.
This is untested code, but you probably want something like the following:
if (b != 0) {
    auto max_a = std::numeric_limits<int64_t>::max() / b;
    if (max_a < a) {
        throw std::runtime_error{"overflow"};
    }
}
return a * b;
Note that this code doesn't handle underflow; if a * b can be negative, this check won't work.
On Godbolt (Compiler Explorer), you can see that your version has the check completely optimized away.
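If you need a check that also covers negative operands, GCC and Clang provide the `__builtin_mul_overflow` intrinsic, which handles every sign combination. This is a compiler-specific sketch, not portable ISO C++, and the wrapper name is my own:

```cpp
#include <cstdint>

// Multiply two 64-bit signed values, reporting overflow instead of
// invoking undefined behavior. __builtin_mul_overflow (GCC/Clang)
// stores the wrapped product in *out and returns true on overflow.
bool safe_mul(std::int64_t a, std::int64_t b, std::int64_t* out) {
    return !__builtin_mul_overflow(a, b, out); // true means *out is the exact product
}
```

This correctly rejects cases the division-based check misses, such as INT64_MIN * -1.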
Is signed integer overflow undefined behaviour or implementation defined?
Both references are correct, but they do not address the same issue.
int a = UINT_MAX; is not an instance of signed integer overflow; this definition involves a conversion from unsigned int to int with a value that exceeds the range of type int. As quoted from the École polytechnique's site, the C Standard defines the behavior of this conversion as implementation-defined.
#include <limits.h>

int main(void) {
    int a = UINT_MAX;    // implementation-defined behavior
    int b = INT_MAX + 1; // undefined behavior
    return 0;
}
Here is the text from the C Standard:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
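The unsigned branch of this rule is fully defined and easy to demonstrate: "repeatedly adding or subtracting one more than the maximum" is just reduction modulo 256 for the usual 8-bit unsigned char (the function name is illustrative):

```cpp
// Conversion to an unsigned type is always well-defined: the value is
// reduced modulo (maximum + 1). For an 8-bit unsigned char that
// modulus is 256, so -1 becomes 255 (-1 + 256) and 257 becomes 1.
unsigned char to_uchar(int v) {
    return static_cast<unsigned char>(v);
}
```

Only the opposite direction, converting an out-of-range value to a signed type, is implementation-defined, as the quoted paragraph says.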
Some compilers have a command line option to change the behavior of signed arithmetic overflow from undefined behavior to defined wrapping: gcc and clang support -fwrapv to force integer computations to be performed modulo 2^32 or 2^64, depending on the width of the signed type. This prevents some useful optimisations, but also prevents some counterintuitive optimisations that may break innocent-looking code. See this question for some examples: What does -fwrapv do?
How is integer overflow exploitable?
It is definitely exploitable, but depends on the situation of course.
Old versions of SSH had an integer overflow that could be exploited remotely. The exploit caused the SSH daemon to create a hash table of size zero and to overwrite memory when it tried to store values in it.
More details on the ssh integer overflow: http://www.kb.cert.org/vuls/id/945216
More details on integer overflow: http://projects.webappsec.org/w/page/13246946/Integer%20Overflows
Why don't languages raise errors on integer overflow by default?
In C#, it was a question of performance. Specifically, out-of-box benchmarking.
When C# was new, Microsoft was hoping a lot of C++ developers would switch to it. They knew that many C++ folks thought of C++ as being fast, especially faster than languages that "wasted" time on automatic memory management and the like.
Both potential adopters and magazine reviewers are likely to get a copy of the new C#, install it, build a trivial app that no one would ever write in the real world, run it in a tight loop, and measure how long it took. Then they'd make a decision for their company or publish an article based on that result.
The fact that their test showed C# to be slower than natively compiled C++ is the kind of thing that would turn people off C# quickly. The fact that your C# app is going to catch overflow/underflow automatically is the kind of thing that they might miss. So, it's off by default.
I think it's obvious that 99% of the time we want /checked to be on. It's an unfortunate compromise.
Why is unsigned integer overflow defined behavior but signed integer overflow isn't?
The historical reason is that most C implementations (compilers) just used whatever overflow behaviour was easiest to implement with the integer representation it used. C implementations usually used the same representation used by the CPU - so the overflow behavior followed from the integer representation used by the CPU.
In practice, it is only the representations for signed values that may differ according to the implementation: one's complement, two's complement, sign-magnitude. For an unsigned type there is no reason for the standard to allow variation because there is only one obvious binary representation (the standard only allows binary representation).
Relevant quotes:
C99 6.2.6.1:3:
Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.
C99 6.2.6.2:2:
If the sign bit is one, the value shall be modified in one of the following ways:
— the corresponding value with sign bit 0 is negated (sign and magnitude);
— the sign bit has the value −(2^N) (two's complement);
— the sign bit has the value −(2^N − 1) (one's complement).
Nowadays, all processors use two's complement representation, but signed arithmetic overflow remains undefined and compiler makers want it to remain undefined because they use this undefinedness to help with optimization. See for instance this blog post by Ian Lance Taylor or this complaint by Agner Fog, and the answers to his bug report.