Converting from Signed Char to Unsigned Char and Back Again

Converting from signed char to unsigned char and back again?

This is one of the reasons why C++ introduced the new cast style, which includes static_cast and reinterpret_cast

There's two things you can mean by saying conversion from signed to unsigned, you might mean that you wish the unsigned variable to contain the value of the signed variable modulo the maximum value of your unsigned type + 1. That is if your signed char has a value of -128 then CHAR_MAX+1 is added for a value of 128 and if it has a value of -1, then CHAR_MAX+1 is added for a value of 255, this is what is done by static_cast. On the other hand you might mean to interpret the bit value of the memory referenced by some variable to be interpreted as an unsigned byte, regardless of the signed integer representation used on the system, i.e. if it has bit value 0b10000000 it should evaluate to value 128, and 255 for bit value 0b11111111, this is accomplished with reinterpret_cast.

Now, for the two's complement representation this happens to be exactly the same thing, since -128 is represented as 0b10000000 and -1 is represented as 0b11111111 and likewise for all in between. However other computers (usually older architectures) may use different signed representation such as sign-and-magnitude or ones' complement. In ones' complement the 0b10000000 bitvalue would not be -128, but -127, so a static cast to unsigned char would make this 129, while a reinterpret_cast would make this 128. Additionally in ones' complement the 0b11111111 bitvalue would not be -1, but -0, (yes this value exists in ones' complement,) and would be converted to a value of 0 with a static_cast, but a value of 255 with a reinterpret_cast. Note that in the case of ones' complement the unsigned value of 128 can actually not be represented in a signed char, since it ranges from -127 to 127, due to the -0 value.

I have to say that the vast majority of computers will be using two's complement making the whole issue moot for just about anywhere your code will ever run. You will likely only ever see systems with anything other than two's complement in very old architectures, think '60s timeframe.

The syntax boils down to the following:

signed char x = -100;
unsigned char y;

y = (unsigned char)x;                    // C static
y = *(unsigned char*)(&x);               // C reinterpret
y = static_cast<unsigned char>(x);       // C++ static
y = reinterpret_cast<unsigned char&>(x); // C++ reinterpret

To do this in a nice C++ way with arrays:

jbyte memory_buffer[nr_pixels];
unsigned char* pixels = reinterpret_cast<unsigned char*>(memory_buffer);

or the C way:

unsigned char* pixels = (unsigned char*)memory_buffer;

Convert signed char[] to unsigned char[] in C

Why not use memcpy?

unsigned char uChars[count];
signed char sChars[count];

memcpy(uChars, sChars, count);

C: gcc implicitly converts signed char to unsigned char and vice versa?

See 6.3.1.3/3 in the C99 Standard

... the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

So, if you don't get a signal (if your program doesn't stop) read the documentation for your compiler to understand what it does.

gcc documents the behaviour ( in http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation ) as

The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 6.3.1.3).

For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.

Can I turn unsigned char into char and vice versa?

The short answer is yes if you use an explicit cast, but to explain it in detail, there are three aspects to look at:

1) Legality of the conversion
Converting between signed T* and unsigned T* (for some type T) in either direction is generally possible because the source type can first be converted to void * (this is a standard conversion, §4.10), and the void * can be converted to the destination type using an explicit static_cast (§5.2.9/13):

static_cast<unsigned char*>(static_cast<void *>(data_in))

This can be abbreviated (§5.2.10/7) as

reinterpret_cast<unsigned char *>(data_in)

because char is a standard-layout type (§3.9.1/7,8 and §3.9/9) and signedness does not change alignment (§3.9.1/1). It can also be written as a C-style cast:

(unsigned char *)(data_in)

Again, this works both ways, from unsigned* to signed* and back. There is also a guarantee that if you apply this procedure one way and then back, the pointer value (i.e. the address it's pointing to) won't have changed (§5.2.10/7).

All of this applies not only to conversions between signed char * and unsigned char *, but also to char */unsigned char * and char */signed char *, respectively. (char, signed char and unsigned char are formally three distinct types, §3.9.1/1.)

To be clear, it doesn't matter which of the three cast-methods you use, but you must use one. Merely passing the pointer will not work, as the conversion, while legal, is not a standard conversion, so it won't be performed implicitly (the compiler will issue an error if you try).

2) Well-definedness of the access to the values
What happens if, inside the function, you dereference the pointer, i.e. you perform *data_in to retrieve a glvalue for the underlying character; is this well-defined and legal? The relevant rule here is the strict-aliasing rule (§3.10/10):

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

[...]

a type that is the signed or unsigned type corresponding to the dynamic type of the object,

[...]

a char or unsigned char type.

Therefore, accessing a signed char (or char) through an unsigned char* (or char) and vice versa is not disallowed by this rule – you should be able to do this without problems.

3) Resulting values
After derefencing the type-converted pointer, will you be able to work with the value you get? It's important to bear in mind that the conversion and dereferencing of the pointer described above amounts to reinterpreting (not changing!) the bit pattern stored at the address of the character. So what happens when a bit pattern for a signed character is interpreted as that of an unsigned character (or vice versa)?

When going from unsigned to signed, the typical effect will be that for values between 0 and 128 nothing happens, and values above 128 become negative. Similar in reverse: When going from signed to unsigned, negative values will appear as values greater than 128.

But this behaviour isn't actually guaranteed by the Standard. The only thing the Standard guarantees is that for all three types, char, unsigned char and signed char, all bits (not necessarily 8, btw) are used for the value representation. So if you interpret one as the other, make a few copies and then store it back to the original location, you can be sure that there will be no information loss (as you required), but you won't necessarily know what the values actually mean (at least not in a fully portable way).

Why do `(char)~0` and `(unsigned char)~0` return values of different widths?

When passing a type smaller than int to a variadic function like printf, it get promoted to type int.

In the first case, you're passing char with value -1 whose representation (assuming 2's complement) is 0xff. This is promoted to an int with value -1 and representation 0xffffffff, so this is what is printed.

In the second case, you're passing an unsigned char with value 255 whose representation is 0xff. This is promoted to an int with value 255 and representation 0x000000ff, so this is what is printed (without the leading zeros).

c: type casting char values into unsigned short

The type char is not a "third" signedness. It is either signed char or unsigned char, and which one it is is implementation defined.

This is dictated by section 6.2.5p15 of the C standard:

The three types char , signed char , and unsigned char are
collectively called the character types. The implementation
shall define char to have the same range, representation, and
behavior as either signed char or unsigned char.

It appears that on your implementation, char is the same as signed char, so because the value is negative and because the destination type is unsigned it must be converted.

Section 6.3.1.3 dictates how conversion between integer types occur:

1 When a value with integer type is converted to another integer type
other than
_Bool ,if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is
converted by repeatedly adding or subtracting one more than
the maximum value that can be represented in the new type
until the value is in the range of the new type.

3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or
an implementation-defined signal is raised.

Since the value 0x80 == -128 cannot be represented in an unsigned short the conversion in paragraph 2 occurs.

How to convert unsigned to signed char (and back) while preserving range in C?

I think you want:

unsigned char ConvertUnsigned(char n) {
    return (int)n + 128;
}

char ConvertSigned(unsigned char n) {
    return (int)n - 128;
}

assuming "127 converted to 0" is a typo.

Proper way to perform unsigned-signed conversion

I know that signed types overflow is undefined behaviour,

True, but does not apply here.

a += 140; is not signed integer overflow, not UB. That is like a = a + 140; a + 140 does not overflow when a is 8-bit signed char or unsigned char.

The issue is what happens when the sum a + 140 is out of char range and assigned to a char.

Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. C17dr § 6.3.1.3 3

It is implementation defined behavior, when char is signed and 8-bit - to assign a value outside the char range.

Usually the implementation defined behavior is a wrap and fully defined so a += 140; is fine as is.

Alternatively the implementation defined behavior might have been to cap the value to the char range when char is signed.

char a = 42;
a += 140;
// Might act as if
a = max(min(a + 140, CHAR_MAX), CHAR_MIN);
a = 127;

To avoid implementation defined behavior, perform the + or - on a accessed as a unsigned char

*((unsigned char *)&a) += small_offset;

Or just use unsigned char a and avoid all this. unsigned char is defined to wrap.

Converting from Signed Char to Unsigned Char and Back Again