Is char signed or unsigned by default?
The book is wrong. The standard does not specify whether plain char is signed or unsigned.
In fact, the standard defines three distinct types: char, signed char, and unsigned char. If you #include <limits.h> and then look at CHAR_MIN, you can find out whether plain char is signed or unsigned (CHAR_MIN is less than 0 if it is signed, and equal to 0 if it is unsigned), but even then, the three types are distinct as far as the standard is concerned.
Do note that char is special in this way. If you declare a variable as int, it is 100% equivalent to declaring it as signed int. This is always true for all compilers and architectures.
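For example, this small program (a sketch of mine using only standard headers, not part of the original answer) reports the signedness of plain char at compile time via CHAR_MIN:
#include <limits.h>
#include <stdio.h>

int main(void) {
#if CHAR_MIN < 0
    printf("plain char is signed (CHAR_MIN = %d)\n", CHAR_MIN);
#else
    printf("plain char is unsigned (CHAR_MIN = %d)\n", CHAR_MIN);
#endif
    return 0;
}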
Why is 'char' signed by default in C++?
It isn't.
The signedness of a char that isn't either a signed char or an unsigned char is implementation-defined. Many systems make it signed to match other types that are signed by default (like int), but it may be unsigned on some systems (say, if you pass -funsigned-char to GCC).
What causes a char to be signed or unsigned when using gcc?
According to the C11 standard (read n1570), char can be signed or unsigned (so you actually have two flavors of C). What exactly it is is implementation-specific.
Some processors and instruction set architectures or application binary interfaces favor a signed character (byte) type (e.g. because it maps nicely to some machine code instruction); others favor an unsigned one.
gcc even has -fsigned-char and -funsigned-char options, which you should almost never use (because changing this breaks some corner cases in calling conventions and ABIs) unless you recompile everything, including your C standard library.
You could use feature_test_macros(7) and <endian.h> (see endian(3)) or autoconf on Linux to detect what your system has.
In most cases, you should write portable C code which does not depend upon those things, and you can find cross-platform libraries (e.g. glib) to help you with that.
BTW, gcc -dM -E -x c /dev/null also gives __BYTE_ORDER__ etc., and if you want an unsigned 8-bit byte you should use <stdint.h> and its uint8_t (more portable and more readable). The standard limits.h defines CHAR_MIN, SCHAR_MIN, CHAR_MAX, and SCHAR_MAX (you could compare them for equality to detect implementations where plain char is signed), etc.
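As a sketch of both points (my own program, not from the original answer), comparing the limits macros for equality and using uint8_t for a byte that is unsigned everywhere:
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t byte = 0xff;                 /* always unsigned, always 8 bits */
    printf("byte = %u\n", (unsigned)byte);
#if CHAR_MAX == SCHAR_MAX
    puts("plain char is signed on this implementation");
#else
    puts("plain char is unsigned on this implementation");
#endif
    return 0;
}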
BTW, you should care about character encoding, but most systems today use UTF-8 everywhere. Libraries like libunistring are helpful. Remember that, practically speaking, a Unicode character encoded in UTF-8 can span several bytes (i.e. chars).
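A quick illustration of that last point (my own sketch), spelling out the UTF-8 bytes explicitly so it does not depend on the source file's encoding:
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *s = "\xC3\xA9";            /* UTF-8 encoding of U+00E9, 'é' */
    printf("strlen = %zu\n", strlen(s));   /* prints 2: one character, two chars */
    return 0;
}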
is char signed or unsigned by default on iOS?
In most cases, char is unsigned on ARM (for performance reasons) and signed on other platforms.
iOS differs from the normal convention for ARM, and char is signed by default. (In particular, this differs from Android, where code compiled in the NDK defaults to unsigned char.)
This can be changed in Xcode: there is a 'char' Type is unsigned option (which defaults to off). If this is changed to "yes", Xcode will pass -funsigned-char to LLVM. (Checked on Xcode 5.0.2.)
The reason why iOS differs is mentioned in the iOS ABI Function Call Guide: ARM64 Function Calling Conventions, which says simply:
In iOS, as with other Darwin platforms, both char and wchar_t are signed types.
Why char is unsigned and int is signed by default?
char is an unsigned type on your system because the people who implemented your system chose that it should be unsigned. This choice varies between systems, and the C++ language specifies that either is allowed. You cannot assume one choice if you wish to write programs that work across different systems.
Note that char, signed char and unsigned char are all three distinct types, regardless of whether char is signed or not. By contrast, int and signed int are two names for one and the same type.
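A C11 sketch of that distinction (my own example, not from the original answer), using _Generic: the three char types may all appear as separate associations, whereas listing both int and signed int would be a compile error, because they name the same type:
#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),        \
    char:          "char",                \
    signed char:   "signed char",         \
    unsigned char: "unsigned char",       \
    int:           "int",                 \
    default:       "other")

int main(void) {
    char c = 0;
    signed char sc = 0;
    unsigned char uc = 0;
    /* Prints "char / signed char / unsigned char" on any conforming
       compiler, whichever signedness plain char happens to have. */
    printf("%s / %s / %s\n", TYPE_NAME(c), TYPE_NAME(sc), TYPE_NAME(uc));
    return 0;
}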
Why don't the C or C++ standards explicitly define char as signed or unsigned?
Historical reasons, mostly.
Expressions of type char are promoted to int in most contexts (because a lot of CPUs don't have 8-bit arithmetic operations). On some systems, sign extension is the most efficient way to do this, which argues for making plain char signed.
On the other hand, the EBCDIC character set has basic characters with the high-order bit set (i.e., characters with values of 128 or greater); on EBCDIC platforms, char pretty much has to be unsigned.
The ANSI C Rationale (for the 1989 standard) doesn't have a lot to say on the subject; section 3.1.2.5 says:
Three types of char are specified: signed, plain, and unsigned. A plain char may be represented as either signed or unsigned, depending upon the implementation, as in prior practice. The type signed char was introduced to make available a one-byte signed integer type on those systems which implement plain char as unsigned. For reasons of symmetry, the keyword signed is allowed as part of the type name of other integral types.
Going back even further, an early version of the C Reference Manual from 1975 says:
A char object may be used anywhere an int may be. In all cases the char is converted to an int by propagating its sign through the upper 8 bits of the resultant integer. This is consistent with the two’s complement representation used for both characters and integers. (However, the sign-propagation feature disappears in other implementations.)
This description is more implementation-specific than what we see in later documents, but it does acknowledge that char may be either signed or unsigned. On the "other implementations" on which "the sign-propagation disappears", the promotion of a char object to int would have zero-extended the 8-bit representation, essentially treating it as an 8-bit unsigned quantity. (The language didn't yet have the signed or unsigned keyword.)
C's immediate predecessor was a language called B. B was a typeless language, so the question of char being signed or unsigned did not apply. For more information about the early history of C, see the late Dennis Ritchie's home page.
As for what's happening in your code (applying modern C rules):
#include <stdbool.h>   /* needed for bool before C23 */
char c = 0xff;
bool b = 0xff == c;
If plain char is unsigned, then the initialization of c sets it to (char)0xff, which compares equal to 0xff in the second line. But if plain char is signed, then 0xff (an expression of type int) is converted to char -- but since 0xff exceeds CHAR_MAX (assuming CHAR_BIT==8), the result is implementation-defined. In most implementations, the result is -1. In the comparison 0xff == c, both operands are converted to int, making it equivalent to 0xff == -1, or 255 == -1, which is of course false.
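A runnable version of the fragment (my own sketch), which shows you which behavior your implementation picks:
#include <stdio.h>

int main(void) {
    char c = 0xff;   /* implementation-defined result if plain char is signed */
    if (0xff == c)
        puts("plain char is unsigned here: c compares equal to 255");
    else
        puts("plain char is signed here: c most likely holds -1");
    return 0;
}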
Another important thing to note is that unsigned char, signed char, and (plain) char are three distinct types. char has the same representation as either unsigned char or signed char; it's implementation-defined which one it is. (On the other hand, signed int and int are two names for the same type; unsigned int is a distinct type. (Except that, just to add to the frivolity, it's implementation-defined whether a bit field declared as plain int is signed or unsigned.))
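The bit-field wrinkle is easy to observe; this sketch (mine, not from the original answer) prints -1 where plain int bit-fields are signed and 15 where they are unsigned:
#include <stdio.h>

struct s {
    int field : 4;   /* plain int bit-field: implementation-defined signedness */
};

int main(void) {
    struct s x = { -1 };            /* wraps to 15 if the field is unsigned */
    printf("%d\n", (int)x.field);
    return 0;
}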
Yes, it's all a bit of a mess, and I'm sure it would have been defined differently if C were being designed from scratch today. But each revision of the C language has had to avoid breaking (too much) existing code, and to a lesser extent existing implementations.
What does it mean for a char to be signed?
It won't make a difference for strings. But in C you can use a char to do math, and then it will make a difference.
In fact, when working in constrained memory environments, like embedded 8-bit applications, a char will often be used to do math, and then it makes a big difference. This is because there is no byte type by default in C.
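To make the difference concrete, here is a small sketch of mine (not from the original answer) doing arithmetic on a char holding a value above 127:
#include <stdio.h>

int main(void) {
    char c = 200;   /* out of range for a signed 8-bit char */
    /* If plain char is unsigned, c holds 200 and c / 2 is 100.
       If plain char is signed, c most likely holds -56 and c / 2 is -28. */
    printf("%d\n", c / 2);
    return 0;
}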
Back then, were char variables in C declared as unsigned by default?
I think using int instead of char is only justifiable if c is modified with unsigned, because a signed char won't be able to hold the value of EOF, which is -1.
Who says that EOF is -1? It is specified to be negative, but it doesn't have to be -1.
In any case, you're missing the point. Signedness notwithstanding, getchar() needs to return a type that can represent more values than char can, because it needs to provide, in one way or another, for every char value, plus at least one value that is distinguishable from all the others, for use as EOF.
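This is exactly why the idiomatic read loop stores getchar()'s result in an int and only narrows afterwards:
#include <stdio.h>

int main(void) {
    int c;   /* int, not char: must hold every char value plus EOF */
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}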
Were char variables previously unsigned by default? And if so, then why did they alter it?
No. But in C89 and pre-standard C, functions could be called without having first been declared, and the expected return type in such cases was int. This is among the reasons that so many of the standard library functions return int.