Ramifications of C++20 Requiring Two's Complement

Ramifications of C++20 requiring two's complement

One of the specific questions considered by the committee was what to do about -INT_MIN, and the results of that poll were:

addition / subtraction / multiplication and -INT_MIN overflow is currently undefined behavior, it should instead be:

4: wrap

6: wrap or trap

5: intermediate values are mathematical integers

14: status quo (remain undefined behavior)

This was explicitly considered and people felt that the best option was keeping it undefined behavior.

To clarify on "intermediate values are mathematical integers", there is a other part of the paper which clarifies that means that (int)a + (int)b > INT_MAX might be true.

Note that implementations are free to define specific behavior in these cases if they so choose. I don't know if any of them do.

Why not enforce 2's complement in C++?

(Edit: C++20 now imposes 2's complement representation, note that overflow of signed arithmetic is still undefined and shifts continue to have undefined and implementation defined behaviors in some cases.)

A major problem in defining something which isn't, is that compilers were built assuming that is undefined. Changing the standard won't change the compilers and reviewing those to find out where the assumption was made is a difficult task.
Even on 2 complement machine, you may have more variety than you think. Two examples: some don't have a sign preserving right shift, just a right shift which introduce zeros; a common feature in DSP is saturating arithmetic, there assigning an out of range value will clip it at the maximum, not just drop the high order bits.

Is there a standard binary representation of integer data types in c++20?

The sign bit is required to be the most significant bit (§[basic.fundamental]/3):

For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2^N has the same value of corresponding bits in its value representation.

Things only work this way if the sign bit is what would be the MSB in an unsigned.

This also requires that (for example) uint8_t x = -1; will set x to 0b11111111 (since -1 reduced modulo 2⁸ is 255). In fact, that's used as an example in the standard:

[Example: The value −1 of a signed integer type has the same representation as the largest value of the corresponding unsigned type. —end example]

As far as an offset representation goes, I believe it's considered impossible. The C++ standard refers to the C standard which requires (§6.2.6.2/1):

If there are N value bits, each bit shall represent a different
power of 2 between 1 and 2^N-1, so that objects of that type shall be capable of representing values from 0 to 2^{N - 1} using a pure binary representation;

"using a pure binary representation" is at least normally interpreted as meaning a representation like:

b_Nb_N-1b_N-2...b₂b₁b₀.

I.e., where, if you count bits from 0 through N-1, each bit represents the corresponding power of 2.

What is “2's Complement”?

Two's complement is a clever way of storing integers so that common math problems are very simple to implement.

To understand, you have to think of the numbers in binary.

It basically says,

for zero, use all 0's.
for positive integers, start counting up, with a maximum of 2^{(number of bits - 1)}-1.
for negative integers, do exactly the same thing, but switch the role of 0's and 1's and count down (so instead of starting with 0000, start with 1111 - that's the "complement" part).

Let's try it with a mini-byte of 4 bits (we'll call it a nibble - 1/2 a byte).

0000 - zero
0001 - one
0010 - two
0011 - three
0100 to 0111 - four to seven

That's as far as we can go in positives. 2³-1 = 7.

For negatives:

1111 - negative one
1110 - negative two
1101 - negative three
1100 to 1000 - negative four to negative eight

Note that you get one extra value for negatives (1000 = -8) that you don't for positives. This is because 0000 is used for zero. This can be considered as Number Line of computers.

Distinguishing between positive and negative numbers

Doing this, the first bit gets the role of the "sign" bit, as it can be used to distinguish between nonnegative and negative decimal values. If the most significant bit is 1, then the binary can be said to be negative, where as if the most significant bit (the leftmost) is 0, you can say the decimal value is nonnegative.

"Sign-magnitude" negative numbers just have the sign bit flipped of their positive counterparts, but this approach has to deal with interpreting 1000 (one 1 followed by all 0s) as "negative zero" which is confusing.

"Ones' complement" negative numbers are just the bit-complement of their positive counterparts, which also leads to a confusing "negative zero" with 1111 (all ones).

You will likely not have to deal with Ones' Complement or Sign-Magnitude integer representations unless you are working very close to the hardware.

reversing two's complement for 18bit int

The almost-portable way (with assumption that negative integers are natively 2s-complement) is to simply inspect bit 17, and use that to conditionally mask in the sign bits:

constexpr SomeType sign_bits = ~SomeType{} << 18;
int regularNum = twosCompNum & 1<<17 ? twosCompNum | sign_bits : twosCompNum;

Note that this doesn't depend on the size of your int type.

FInd Two's Complement for Unsigned Integer

Whenever we work with one’s complement or two’s complement, we need to state what the word size is. If there are w bits in the word, then the one’s complement of a number x is obtained by subtracting x from the binary numeral made of w 1s. If w is 16, we use 1111111111111111₂, which is 65535. Then the one’s complement of x is 1111111111111111₂−x. Viewing x as a binary numeral (with at most w bits), whatever bits are on in x will be off in 1111111111111111₂−x, and whatever bits are off in x will be on in 1111111111111111₂−x. Hence, all the bits are complemented.

C has an operator for the one’s complement; ~x flips all the bits, so it produces the one’s complement of x.

The two’s complement of x is 2^w−x, by definition (except that the two’s complement of 0 is 0). 2^w equals one plus that binary numeral made of w 1s. For example, 2¹⁶ = 65535 + 1. Therefore, the two’s complement is one more than the one’s complement. Therefore the two’s complement of x is ~x + 1.

C also has an operator for the two’s complement, for unsigned integers. Unsigned arithmetic is defined to “wrap” modulo 2^w; whenever a regular arithmetic result is outside that range, it is brought back into that range by adding or subtracting 2^w as needed. The regular arithmetic negation of x would be negative (if x is not zero), so the computed result of -x is −x + 2^w = 2^w−x, which is the two’s complement of x.

Ramifications of C++20 Requiring Two's Complement