Is Signed Integer Overflow Still Undefined Behavior in C++

Why is unsigned integer overflow defined behavior but signed integer overflow isn't?

The historical reason is that most C implementations (compilers) just used whatever overflow behaviour was easiest to implement with the integer representation they used. C implementations usually used the same representation as the CPU, so the overflow behaviour simply followed from the CPU's integer representation.

In practice, it is only the representation of signed values that may differ between implementations: one's complement, two's complement, or sign-magnitude. For an unsigned type there is no reason for the standard to allow variation, because there is only one obvious binary representation (the standard only allows pure binary representation).

Relevant quotes:

C99 6.2.6.1:3:

Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.

C99 6.2.6.2:2:

If the sign bit is one, the value shall be modified in one of the following ways:

— the corresponding value with sign bit 0 is negated (sign and magnitude);

— the sign bit has the value −(2^N) (two’s complement);

— the sign bit has the value −(2^N − 1) (one’s complement).
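
To make the three options concrete, here is how −5 comes out of each rule for an 8-bit type with one sign bit and N = 7 value bits (my own illustration, not part of the quoted text):

+5                   = 0 0000101
−5, sign-magnitude   = 1 0000101   (sign bit set, magnitude 5)
−5, one's complement = 1 1111010   (sign bit contributes −127, value bits 122)
−5, two's complement = 1 1111011   (sign bit contributes −128, value bits 123)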


Nowadays, essentially all processors use two's complement representation, but signed arithmetic overflow remains undefined, and compiler makers want it to remain undefined because they use this undefinedness to help with optimization. See for instance this blog post by Ian Lance Taylor or this complaint by Agner Fog, and the answers to his bug report.
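
For a concrete picture of the kind of optimization this enables, here is a minimal sketch of my own (in the spirit of the loop examples in the linked blog post, not code taken from it):

void scale(float *a, int n, float s) {
    // Because signed overflow is UB, the compiler may assume i never wraps,
    // so it knows this loop terminates after n + 1 iterations (for n >= 0)
    // and can strength-reduce a[i] to a pointer increment and vectorize.
    // If signed i wrapped like unsigned, n == INT_MAX would make the loop
    // infinite and those transformations invalid.
    for (int i = 0; i <= n; ++i)
        a[i] *= s;
}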

Is signed integer overflow still undefined behavior in C++?

Is overflow of these types still undefined behavior?

Yes. Per Paragraph 5/4 of the C++11 Standard (regarding any expression in general):

If during the evaluation of an expression, the result is not mathematically defined or not in the range of
representable values for its type, the behavior is undefined. [...]

The fact that a two's complement representation is used for those signed types does not mean that arithmetic modulo 2^n is used when evaluating expressions of those types.

Concerning unsigned arithmetic, on the other hand, the Standard explicitly specifies that (Paragraph 3.9.1/4):

Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number
of bits in the value representation of that particular size of integer

This means that the result of an unsigned arithmetic operation is always "mathematically defined", and the result is always within the representable range; therefore, 5/4 does not apply. Footnote 46 explains this:

46) This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the
resulting unsigned integer type.
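
A trivial way to see this in practice (my sketch, not part of the quoted text):

#include <limits.h>
#include <stdio.h>

int main(void) {
    unsigned a = UINT_MAX;
    printf("%u\n", a + 1u);   // prints 0: reduced modulo UINT_MAX + 1
    printf("%u\n", 0u - 1u);  // prints UINT_MAX: wraps the other way
    return 0;                 // both results are fully defined
}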

Is signed integer overflow undefined behaviour or implementation defined?

Both references are correct, but they do not address the same issue.

int a = UINT_MAX; is not an instance of signed integer overflow; this definition involves a conversion from unsigned int to int with a value that exceeds the range of type int. As quoted from the École polytechnique's site, the C Standard says the result of such a conversion is implementation-defined.

#include <limits.h>

int main(void) {
    int a = UINT_MAX;     // implementation-defined behavior
    int b = INT_MAX + 1;  // undefined behavior
    return 0;
}

Here is the text from the C Standard:

6.3.1.3 Signed and unsigned integers

  1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

  2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

  3. Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

Some compilers have a command-line option that changes signed arithmetic overflow from undefined behavior to well-defined wrapping: gcc and clang support -fwrapv to force signed integer computations to be performed modulo 2^32 or 2^64, depending on the width of the type. This prevents some useful optimisations, but also prevents some counterintuitive optimisations that may break innocent-looking code. See this question for some examples: What does -fwrapv do?
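
For example, the textbook-style check below is only meaningful when signed addition is defined to wrap. This is a sketch of mine (the name add_ok is made up), assuming -fwrapv or an equivalent guarantee of wrapping; without it, the addition is UB on overflow and the compiler may legitimately simplify the whole function to return 1.

// Reliable only under -fwrapv (or some other guarantee of wrapping):
int add_ok(int x, int y) {
    int sum = x + y;                        // wraps modulo 2^32 for 32-bit int
    int pos_over = (x > 0 && y > 0 && sum < 0);
    int neg_over = (x < 0 && y < 0 && sum >= 0);
    return !pos_over && !neg_over;          // 1 if x + y fits in an int
}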

Why/how does gcc compile the undefined behaviour in this signed-overflow test so it works on x86 but not ARM64?

Signed overflow is Undefined Behaviour in ISO C. You can't reliably cause it and then check if it happened.

In the expression (x + y) > y;, the compiler is allowed to assume that x+y doesn't overflow (because that would be UB). Therefore, it optimizes down to checking x > 0. (Yes, really, gcc does this even at -O0).

This optimization is new in gcc8. It's the same on x86 and AArch64; you must have used different GCC versions on AArch64 and x86. (Even at -O3, gcc7.x and earlier (intentionally?) miss this optimization. clang7.0 doesn't do it either; they actually do a 32-bit add and compare. They also miss optimizing tadd_ok to return 1, or to an add followed by a check of the overflow flag (V on ARM, OF on x86). Clang's optimized asm is an interesting mix of >>31, OR, and one XOR operation, but -fwrapv actually changes that asm, so it's probably not doing a full overflow check.)

You could say that gcc8 "breaks" your code, but really it was already broken as far as being legal / portable ISO C. gcc8 just revealed that fact.


To see it more clearly, let's isolate just that expression into its own function. gcc -O0 compiles each statement separately anyway, so the information that this only runs when x<0 doesn't affect the -O0 code-gen for this statement in your tadd_ok function.

// compiles to add and checking the carry flag, or equivalent
int unsigned_overflow_test(unsigned x, unsigned y) {
    return (x+y) >= y;  // unsigned overflow is well-defined as wrapping.
}

// doesn't work because of UB.
int signed_overflow_expression(int x, int y) {
    return (x+y) > y;
}

On the Godbolt compiler explorer with AArch64 GCC8.2 -O0 -fverbose-asm:

signed_overflow_expression:
        sub     sp, sp, #16       //,,            // make a stack frame
        str     w0, [sp, 12]      // x, x         // spill the args
        str     w1, [sp, 8]       // y, y
        // end of prologue

        // instructions that implement return (x+y) > y;  as  return x > 0
        ldr     w0, [sp, 12]      // tmp94, x
        cmp     w0, 0             // tmp94,
        cset    w0, gt            // tmp95,       // w0 = (x>0) ? 1 : 0
        and     w0, w0, 255       // _1, tmp93    // redundant

        // epilogue
        add     sp, sp, 16        //,,
        ret

GCC's -fdump-tree-original or -fdump-tree-optimized will even turn its GIMPLE back into C-like code with this optimization applied (from the Godbolt link):

;; Function signed_overflow_expression (null)
;; enabled by -tree-original

{
  return x > 0;
}

Unfortunately, even with -Wall -Wextra -Wpedantic, there's no warning about the comparison. It's not trivially true; it still depends on x.

The optimized asm is unsurprisingly cmp w0, 0 / cset w0, gt / ret. The AND with 0xff is redundant. cset is an alias of csinc, using the zero-register as both sources. So it will produce 0 / 1. With other registers, the general case of csinc is a conditional select and increment of any 2 registers.

Anyway, cset is AArch64's equivalent of x86 setcc, for turning a flag condition into a bool in a register.


If you want your code to work as written, you'd need to compile with -fwrapv to make it well-defined behaviour in the variant of C that -fwrapv makes GCC implement. The default is -fstrict-overflow, like the ISO C standard.

If you want to check for signed overflow in modern C, you need to write checks that detect overflow without actually causing it. This is harder, annoying, and a point of contention between compiler writers and (some) developers. The developers argue that the language rules around undefined behaviour weren't meant to be used as an excuse to "gratuitously break" code when compiling for target machines where the operation would make sense in asm. But modern compilers mostly implement only ISO C (with some extensions and extra defined behaviour), even when compiling for target architectures like x86 and ARM where signed integers have no padding (and thus wrap just fine) and don't trap on overflow.

So you could say "shots fired" in that war, with gcc8.x actually "breaking" unsafe code like this. :P

See Detecting signed overflow in C/C++ and How to check for signed integer overflow in C without undefined behaviour?
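
As a starting point, here is a sketch (mine, not code from those Q&As) of two common UB-free approaches: a portable range check performed before the addition, and the GCC/Clang __builtin_add_overflow builtin, which typically compiles to an add plus a check of the overflow flag.

#include <limits.h>
#include <stdbool.h>

// Portable ISO C: decide whether x + y would overflow *before* adding.
bool add_would_overflow(int x, int y) {
    return (y > 0) ? (x > INT_MAX - y)    // would exceed INT_MAX
                   : (x < INT_MIN - y);   // would go below INT_MIN
}

// GCC/Clang extension: computes the wrapped sum and reports the overflow.
bool add_would_overflow_builtin(int x, int y, int *sum) {
    return __builtin_add_overflow(x, y, sum);
}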


Since signed and unsigned addition are the same binary operation in 2's complement, you could maybe just cast to unsigned for the add, and cast back to int for a signed compare. That would make a version of your function that's safe on "normal" implementations: 2's complement, where casting between unsigned and int is just a reinterpretation of the same bits.

This can't have UB; it just won't give the right answer on one's complement or sign/magnitude C implementations.

return  (int)((unsigned)x + (unsigned)y) > y;

This compiles (with gcc8.2 -O3 for AArch64) to

    add     w0, w0, w1      // x+y
    cmp     w0, w1          // x+y cmp y
    cset    w0, gt
    ret

If you had written int sum = x+y as a separate C statement from return sum > y, this UB wouldn't be visible to gcc with optimization disabled. But as part of the same expression, even gcc with the default -O0 can see it.
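
That is, a version split into two statements (my sketch of the same function) still has the same UB on overflow, but at -O0 gcc compiles each statement on its own and really performs the addition:

int signed_overflow_split(int x, int y) {
    int sum = x + y;   // still UB if it overflows, just not visible to -O0
    return sum > y;    // with optimization enabled, this folds back to x > 0
}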

Compile-time-visible UB is all kinds of bad. In this case, only certain ranges of inputs would produce UB, so the compiler assumes it doesn't happen. If unconditional UB is seen on a path of execution, an optimizing compiler can assume that path never happens. (In a function with no branching, it could assume the function is never called, and compile it to a single illegal instruction.) See Does the C++ standard allow for an uninitialized bool to crash a program? for more about compile-time-visible UB.

(-O0 doesn't mean "no optimization"; it means no extra optimization besides what's already necessary to transform through gcc's internal representations on the way to asm for whatever target platform. @Basile Starynkevitch explains in Disable all optimization options in GCC.)

Some other compilers may "turn their brains off" even more with optimization disabled, and do something closer to transliterating C into asm, but gcc is not like that. For example, gcc still uses a multiplicative inverse for integer division by a constant at -O0. (Why does GCC use multiplication by a strange number in implementing integer division?) All 3 other major x86 compilers (clang/ICC/MSVC) use div.

Where in the C99 standard does it say that signed integer overflow is undefined behavior?

I don't have a copy of C99, but in the C11 standard this text appears in Section 6.5, paragraph 5:

If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.

Which would seem to be a catch-all for any overflow; the text about unsigned integers (C11 6.2.5 ¶9, which says a computation involving unsigned operands can never overflow) then becomes a special case carved out of that catch-all in 6.5 ¶5.

Signed overflow in C++ and undefined behaviour (UB)

Compilers do assume that a valid C++ program does not contain UB. Consider for example:

if (x == nullptr) {
    *x = 3;
} else {
    *x = 5;
}

If x == nullptr, then dereferencing it and assigning a value is UB. Hence the only way this could be part of a valid program is if x == nullptr never yields true, so under the as-if rule the compiler can assume the above is equivalent to:

*x = 5;

Now in your code

int result = 0;
int factor = 1;
for (...) {          // Loop until factor overflows but not more
    result = ...
    factor *= 10;
}
return result;

The last multiplication of factor cannot happen in a valid program (signed overflow is undefined). Hence the assignment to result in that iteration cannot happen either. And since there is no way to branch out before the last iteration, the previous iterations cannot happen either. In the end, the part of the code that is correct (i.e., in which no undefined behaviour ever happens) is:

// nothing :(
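
For reference, a self-contained function with the same shape (my reconstruction of the pattern being discussed, not the asker's actual code) makes the problem easier to see: every execution reaches the overflowing multiplication, so the optimizer may treat the whole body as unreachable.

int sum_powers_of_ten(void) {
    int result = 0;
    int factor = 1;
    for (int i = 0; i < 20; ++i) {  // 10^19 is far beyond INT_MAX
        result += factor;
        factor *= 10;               // overflows a 32-bit int when i == 9: UB
    }
    return result;
}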

Does integer overflow cause undefined behavior because of memory corruption?

You misunderstand the reason for undefined behavior. The reason is not memory corruption around the integer (an int occupies the same number of bytes regardless of its value); it is the underlying arithmetic.

Since signed integers are not required to be encoded in 2's complement, there cannot be specific guidance on what happens when they overflow. Different encodings or CPU behavior can cause different outcomes of overflow, including, for example, the program being killed by a trap.

And as with all undefined behavior, even if your hardware uses 2's complement for its arithmetic and has defined rules for overflow, compilers are not bound by them. For example, for a long time GCC has optimized away any check that could only be true in a 2's-complement environment: a check like if (x > x + 1) f(); is removed from optimized code, because signed overflow is undefined behavior, which means it never happens (from the compiler's view, a valid program never contains code producing undefined behavior), which means x can never be greater than x + 1.


