What Happens When I Mix Signed and Unsigned Types

What happens when I mix signed and unsigned types?

In simple terms, if you mix types of the same rank (in the sequence int, long int, long long int), the unsigned type "wins" and the calculations are performed within that unsigned type. The result is of that unsigned type.

If you mix types of different rank, the higher-ranked type "wins" if it can represent all values of the lower-ranked type. The calculations are performed within that type, and the result is of that type.

Finally, if the higher-ranked type cannot represent all values of the lower-ranked type, then the unsigned version of the higher-ranked type is used. The result is of that type.
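
As a rough sketch of the second rule (the platform assumption matters: this assumes LP64, where int is 32 bits and long is 64 bits, so long can represent every unsigned int value):

#include <iostream>

int main() {
    unsigned int u = 4000000000u;     // doesn't fit in a 32-bit int
    long l = -1;                      // higher rank; on LP64 it holds all unsigned int values
    std::cout << u + l << std::endl;  // evaluated as long: prints 3999999999
}

On a platform where long is also 32 bits, long cannot represent all unsigned int values, so the third rule applies and u + l would instead be evaluated as unsigned long.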

In your case you mixed types of the same rank (int and unsigned int), which means that the whole expression is evaluated within the unsigned int type. The expression, as you correctly stated, is now 10 - 4294967254 (for 32-bit int). Unsigned types obey the rules of modular arithmetic with 2^32 (4294967296) as the modulus. If you carefully calculate the result (which can be expressed arithmetically as 10 - 4294967254 + 4294967296), it turns out to be the expected 52.
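
Here is a minimal reconstruction of that scenario (the concrete values are my assumption, but 10 - 4294967254 implies u == 10 and a == -42):

#include <iostream>

int main() {
    unsigned int u = 10;
    int a = -42;
    // a converts to unsigned int: -42 + 4294967296 == 4294967254.
    // 10 - 4294967254 then wraps modulo 2^32: 10 - 4294967254 + 4294967296 == 52.
    std::cout << u - a << std::endl;  // prints 52
}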

Unreasonable behavior on mixing signed and unsigned integers

In

 /*unsigned*/ a + /*signed*/ b

b will get converted to unsigned because of the usual arithmetic conversions (6.3.1.8).

The conversion is done by repeatedly adding or subtracting one more than
UINT_MAX (6.3.1.3p2), which for unsigned == uint32_t means adding 2^32 (once).

For reference where unsigned == uint32_t:

UINT_MAX+1 == 2^32 == 4294967296

For (unsigned)a==6 and (signed)b==-20, you get:

2^32-20 + 6 == 4294967282  (<= UINT_MAX)

For (unsigned)a==6 and (signed)b==-1, you get:

2^32-1 + 6 == 4294967301 (>UINT_MAX)

Now because this result is larger than UINT_MAX, it'll wrap around to 5 (which you get by subtracting UINT_MAX+1, which is how the wraparound is defined to happen (6.3.1.3p2)).
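
A runnable sketch of both cases (assuming 32-bit unsigned, as above):

#include <iostream>

int main() {
    unsigned a = 6;
    int b = -20;
    std::cout << a + b << std::endl;  // b converts to 2^32 - 20; prints 4294967282

    b = -1;
    std::cout << a + b << std::endl;  // 2^32 - 1 + 6 wraps around; prints 5
}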

On two's-complement architectures these rules basically translate to a trivial add, which means you could just as well do the operation with the signed representations (6-20 == -14; 6-1 == 5) and then reinterpret the result as unsigned (a more straightforward way to get the 5 in the second case, isn't it?). But it's still good to know the rules, because signed overflow is undefined in C (C != assembly), and that gives your compiler a lot of leeway to transform code into something you wouldn't expect if you assumed a straightforward C-to-assembly mapping.

Expression involving signed and unsigned types in c++

The variable is not converted to unsigned. Its value is converted to unsigned for use in the expression. That is to say, when you do this:

std::cout << u - a << std::endl;

A temporary, nameless unsigned int is created from a, which is then subtracted from u. It is as if you had done this:

std::cout << u - (unsigned int)a << std::endl;

or this:

unsigned int __nameless__ = a;
std::cout << u - __nameless__ << std::endl;

Except that the __nameless__ variable does not actually exist outside of that expression.
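
A short sketch (the values are my own) showing that only the value is converted, while a itself is untouched:

#include <iostream>

int main() {
    unsigned int u = 10;
    int a = -42;
    std::cout << u - a << std::endl;  // 52: a's value was converted to unsigned for the expression
    std::cout << a << std::endl;      // still -42: the variable a was never changed
}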

Why prefer signed over unsigned in C++?

Let me paraphrase the video, as the experts said it succinctly.

Andrei Alexandrescu:

  • No simple guideline.
  • In systems programming, we need integers of different sizes and signedness.
  • Many conversions and arcane rules govern arithmetic (like for auto), so we need to be careful.

Chandler Carruth:

  • Here are some simple guidelines:
    1. Use signed integers unless you need two's complement arithmetic or a bit pattern.
    2. Use the smallest integer that will suffice.
    3. Otherwise, use int if you think you could count the items, and a 64-bit integer if it's even more than you would want to count.
  • Stop worrying and use tools to tell you when you need a different type or size.

Bjarne Stroustrup:

  • Use int until you have a reason not to.
  • Use unsigned only for bit patterns.
  • Never mix signed and unsigned.

Wariness about signedness rules aside, my one-sentence takeaway from the experts:

Use the appropriate type, and when you don't know, use an int until you do know.

Why does C++ standard specify signed integer be cast to unsigned in binary operations with mixed signedness?

Casting from unsigned to signed results in implementation-defined behaviour if the value cannot be represented. Casting from signed to unsigned is always modulo two to the power of the unsigned type's bit size, so it is always well-defined.

The standard conversion is to the signed type if every possible unsigned value is representable in the signed type. Otherwise, the unsigned type is chosen. This guarantees that the conversion is always well-defined.
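
A sketch of both directions (the unsigned-to-signed result assumes a typical two's-complement implementation; before C++20 the standard only calls it implementation-defined):

#include <iostream>

int main() {
    int s = -1;
    unsigned int u = s;           // always well-defined: -1 + 2^32 == 4294967295
    std::cout << u << std::endl;

    unsigned int big = 4294967295u;
    int t = big;                  // value not representable: implementation-defined (typically -1)
    std::cout << t << std::endl;
}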



Notes

  1. As indicated in comments, the conversion algorithm for C++ was inherited from C to maintain compatibility, which is technically the reason it is so in C++.

  2. When this note was written, the C++ standard allowed three binary representations, including sign-magnitude and ones' complement. That's no longer the case, and there's every reason to believe that it won't be the case for C either in the reasonably near future. I'm leaving the footnote as a historical relic, but it says nothing relevant to the current language.

    It has been suggested that the decision in the standard to define signed-to-unsigned conversion and not unsigned-to-signed conversion is somehow arbitrary, and that the other possible decision would be symmetric. However, the possible conversions are not symmetric.

    In both of the non-2's-complement representations contemplated by the standard, an n-bit signed representation can represent only 2^n - 1 values, whereas an n-bit unsigned representation can represent 2^n values. Consequently, a signed-to-unsigned conversion is lossless and can be reversed (although one unsigned value can never be produced). The unsigned-to-signed conversion, on the other hand, must collapse two different unsigned values onto the same signed result.

    In a comment, the formula sint = uint > sint_max ? uint - uint_max : uint is proposed. This coalesces the values uint_max and 0; both are mapped to 0. That's a little weird even for non-2's-complement representations, but for 2's-complement it's unnecessary and, worse, it requires the compiler to emit code to laboriously compute this unnecessary conflation. By contrast, the standard's signed-to-unsigned conversion is lossless, and in the common case (2's-complement architectures) it is a no-op.

Is using an unsigned rather than signed int more likely to cause bugs? Why?

Some of the answers here mention the surprising promotion rules between signed and unsigned values, but that seems more like a problem relating to mixing signed and unsigned values, and doesn't necessarily explain why signed variables would be preferred over unsigned outside of mixing scenarios.

In my experience, outside of mixed comparisons and promotion rules, there are two primary reasons why unsigned values are bug magnets, as follows.

Unsigned values have a discontinuity at zero, the most common value in programming

Both unsigned and signed integers have discontinuities at their minimum and maximum values, where they wrap around (unsigned) or cause undefined behavior (signed). For unsigned these points are at zero and UINT_MAX. For int they are at INT_MIN and INT_MAX. Typical values of INT_MIN and INT_MAX on a system with 4-byte int values are -2^31 and 2^31-1, and on such a system UINT_MAX is typically 2^32-1.

The primary bug-inducing problem with unsigned that doesn't apply to int is that it has a discontinuity at zero. Zero, of course, is a very common value in programs, along with other small values like 1, 2, 3. It is common to add and subtract small values, especially 1, in various constructs, and if you subtract anything from an unsigned value and it happens to be zero, you just got a massive positive value and an almost certain bug.

Consider code that iterates over all values in a vector by index except the last[0.5]:

for (size_t i = 0; i < v.size() - 1; i++) { /* do something */ }

This works fine until one day you pass in an empty vector. Instead of doing zero iterations, you get v.size() - 1 == a giant number[1] and you'll do 4 billion iterations and almost have a buffer overflow vulnerability.

You need to write it like this:

for (size_t i = 0; i + 1 < v.size(); i++) { /* do something */ }

So it can be "fixed" in this case, but only by carefully thinking about the unsigned nature of size_t. Sometimes you can't apply the fix above because instead of a constant one you have some variable offset you want to apply, which may be positive or negative: which "side" of the comparison it needs to go on depends on the signedness, and now the code gets really messy.

There is a similar issue with code that tries to iterate down to and including zero. Something like while (index-- > 0) works fine, but the apparently equivalent while (--index >= 0) will never terminate for an unsigned value. Your compiler might warn you when the right-hand side is a literal zero, but certainly not if it is a value determined at runtime.
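
A sketch of the working downward idiom with an unsigned index (the vector contents are placeholders):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{10, 20, 30};
    // i-- > 0 tests first and decrements after, so the body sees indexes
    // v.size()-1 down to 0, and the loop exits cleanly once i was 0.
    for (std::size_t i = v.size(); i-- > 0; ) {
        std::cout << v[i] << '\n';  // prints 30, 20, 10
    }
    // By contrast, --i >= 0 would never terminate: an unsigned i is always >= 0.
}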

Counterpoint

Some might argue that signed values also have two discontinuities, so why pick on unsigned? The difference is that both discontinuities are very (maximally) far away from zero. I really consider this a separate problem of "overflow"; both signed and unsigned values may overflow at very large values. In many cases overflow is impossible due to constraints on the possible range of the values, and overflow of many 64-bit values may be physically impossible. Even if possible, the chance of an overflow-related bug is often minuscule compared to an "at zero" bug, and overflow occurs for unsigned values too. So unsigned combines the worst of both worlds: potential overflow with very large magnitude values, and a discontinuity at zero. Signed only has the former.

Many will argue "you lose a bit" with unsigned. This is often true - but not always (if you need to represent differences between unsigned values you'll lose that bit anyway: so many 32-bit things are limited to 2 GiB anyway, or you'll have a weird grey area where, say, a file can be 4 GiB but you can't use certain APIs on the second 2 GiB half).

Even in the cases where unsigned buys you a bit, it doesn't buy you much: if you had to support more than 2 billion "things", you'll probably soon have to support more than 4 billion.

Logically, unsigned values are a subset of signed values

Mathematically, unsigned values (non-negative integers) are a subset of signed integers (just called integers)[2]. Yet signed values naturally pop out of operations solely on unsigned values, such as subtraction. We might say that unsigned values aren't closed under subtraction. The same isn't true of signed values.

Want to find the "delta" between two unsigned indexes into a file? Well, you'd better do the subtraction in the right order, or else you'll get the wrong answer. Of course, you often need a runtime check to determine the right order! When dealing with unsigned values as numbers, you'll often find that (logically) signed values keep appearing anyway, so you might as well start off with signed.
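
A sketch of the wrong-order trap and the signed fix (the offset values are hypothetical; the casts assume the offsets fit in int64_t):

#include <cstdint>
#include <iostream>

int main() {
    std::uint64_t pos1 = 100, pos2 = 200;

    std::uint64_t bad = pos1 - pos2;   // wrong order: wraps to a huge value
    std::cout << bad << '\n';          // 18446744073709551516

    std::int64_t delta = static_cast<std::int64_t>(pos1)
                       - static_cast<std::int64_t>(pos2);
    std::cout << delta << '\n';        // -100: the logically signed answer
}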

Counterpoint

As mentioned in footnote [2] above, unsigned values in C++ aren't actually a subset of signed values of the same size, so unsigned values can represent the same number of results that signed values can.

True, but the range is less useful. Consider subtraction: unsigned numbers with a range of 0 to 2N, and signed numbers with a range of -N to N. Arbitrary subtractions produce results in the range -2N to 2N in both cases, and either type of integer can only represent half of it. Well, it turns out that the region centered around zero, -N to N, is usually way more useful (contains more actual results in real-world code) than the range 0 to 2N. Consider any typical distribution other than uniform (log, zipfian, normal, whatever) and consider subtracting randomly selected values from that distribution: way more values end up in [-N, N] than in [0, 2N] (indeed, the resulting distribution is always centered at zero).

64-bit closes the door on many of the reasons to use unsigned values as numbers

I think the arguments above were already compelling for 32-bit values, but the overflow cases, which affect both signed and unsigned at different thresholds, do occur for 32-bit values, since "2 billion" is a number that can be exceeded by many abstract and physical quantities (billions of dollars, billions of nanoseconds, arrays with billions of elements). So if someone is convinced enough by the doubling of the positive range for unsigned values, they can make the case that overflow does matter and that it slightly favors unsigned.

Outside of specialized domains, 64-bit values largely remove this concern. Signed 64-bit values have an upper range of 9,223,372,036,854,775,807 - more than nine quintillion. That's a lot of nanoseconds (about 292 years' worth), and a lot of money. It's also a larger array than any computer is likely to have in RAM in a coherent address space for a long time. So maybe 9 quintillion is enough for everybody (for now)?

When to use unsigned values

Note that the style guide doesn't forbid or even necessarily discourage use of unsigned numbers. It concludes with:

Do not use an unsigned type merely to assert that a variable is non-negative.

Indeed, there are good uses for unsigned variables:

  • When you want to treat an N-bit quantity not as an integer, but simply as a "bag of bits". For example, as a bitmask or bitmap, or N boolean values or whatever. This use often goes hand-in-hand with the fixed-width types like uint32_t and uint64_t, since you often want to know the exact size of the variable. A hint that a particular variable deserves this treatment is that you only operate on it with the bitwise operators such as ~, |, &, ^, >> and so on, and not with the arithmetic operations such as +, -, *, / etc.

    Unsigned is ideal here because the behavior of the bitwise operators is well-defined and standardized. Signed values have several problems, such as undefined and unspecified behavior when shifting, and an unspecified representation.

  • When you actually want modular arithmetic. Sometimes you actually want 2^N modular arithmetic. In these cases "overflow" is a feature, not a bug. Unsigned values give you what you want here since they are defined to use modular arithmetic. Signed values cannot be (easily, efficiently) used at all, since they have an unspecified representation and overflow is undefined. Both uses are illustrated in the sketch below.
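
A sketch illustrating both uses (the names flags and counter are illustrative):

#include <cstdint>
#include <iostream>

int main() {
    // "Bag of bits": fixed-width unsigned with well-defined shifts and masks.
    std::uint32_t flags = 0;
    flags |= 1u << 3;                           // set bit 3
    bool bit3 = (flags & (1u << 3)) != 0;       // test bit 3
    flags &= ~(1u << 3);                        // clear bit 3
    std::cout << bit3 << ' ' << flags << '\n';  // prints: 1 0

    // Modular arithmetic: wraparound is the intended behavior.
    std::uint32_t counter = 0xFFFFFFFFu;
    counter += 1;                               // well-defined: wraps to 0
    std::cout << counter << '\n';               // prints: 0
}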


[0.5] After I wrote this I realized this is nearly identical to Jarod's example, which I hadn't seen - and for good reason, it's a good example!

[1] We're talking about size_t here, so usually 2^32-1 on a 32-bit system or 2^64-1 on a 64-bit one.

[2] In C++ this isn't exactly the case, because unsigned values contain more values at the upper end than the corresponding signed type. But the basic problem exists: manipulating unsigned values can result in (logically) signed values, while there is no corresponding issue with signed values (since signed values already include unsigned values).

Is it safe to bind an unsigned int to a signed int reference?

References can't bind directly to objects of a different type. Given const int& s = u;, u is first implicitly converted to int, which produces a temporary, a brand-new object, and then s binds to that temporary int. (Lvalue references to const (and rvalue references) can bind to temporaries.) The lifetime of the temporary is prolonged to the lifetime of s, i.e. it'll be destroyed when s goes out of scope at the end of main.
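
A short sketch of the consequence (the values are illustrative): since s binds to a temporary copy, later changes to u are not visible through s:

#include <iostream>

int main() {
    unsigned int u = 42;
    const int& s = u;        // u's value is converted to a temporary int; s binds to it
    u = 7;                   // modifies u, not the temporary
    std::cout << s << '\n';  // still 42
    std::cout << u << '\n';  // 7
}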


