Why don't the C or C++ standards explicitly define char as signed or unsigned?
Historical reasons, mostly.
Expressions of type char are promoted to int in most contexts (because a lot of CPUs don't have 8-bit arithmetic operations). On some systems, sign extension is the most efficient way to do this, which argues for making plain char signed.
On the other hand, the EBCDIC character set has basic characters with the high-order bit set (i.e., characters with values of 128 or greater); on EBCDIC platforms, char pretty much has to be unsigned.
The ANSI C Rationale (for the 1989 standard) doesn't have a lot to say on the subject; section 3.1.2.5 says:
Three types of char are specified: signed, plain, and unsigned. A
plain char may be represented as either signed or unsigned, depending
upon the implementation, as in prior practice. The type signed char
was introduced to make available a one-byte signed integer type on
those systems which implement plain char as unsigned. For reasons of
symmetry, the keyword signed is allowed as part of the type name of
other integral types.
Going back even further, an early version of the C Reference Manual from 1975 says:
A char object may be used anywhere an int may be. In all cases the
char is converted to an int by propagating its sign through the upper
8 bits of the resultant integer. This is consistent with the two's
complement representation used for both characters and integers.
(However, the sign-propagation feature disappears in other
implementations.)
This description is more implementation-specific than what we see in later documents, but it does acknowledge that char may be either signed or unsigned. On the "other implementations" on which "the sign-propagation disappears", the promotion of a char object to int would have zero-extended the 8-bit representation, essentially treating it as an 8-bit unsigned quantity. (The language didn't yet have the signed or unsigned keyword.)
C's immediate predecessor was a language called B. B was a typeless language, so the question of char being signed or unsigned did not apply. For more information about the early history of C, see the late Dennis Ritchie's home page, which has since moved.
As for what's happening in your code (applying modern C rules):
char c = 0xff;
bool b = 0xff == c;
If plain char is unsigned, then the initialization of c sets it to (char)0xff, which compares equal to 0xff in the second line. But if plain char is signed, then 0xff (an expression of type int) is converted to char, and since 0xff exceeds CHAR_MAX (assuming CHAR_BIT==8), the result is implementation-defined. In most implementations, the result is -1. In the comparison 0xff == c, both operands are converted to int, making it equivalent to 0xff == -1, or 255 == -1, which is of course false.
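The two outcomes can be checked directly. The following is a minimal C++ sketch, assuming CHAR_BIT == 8 and an implementation where the out-of-range conversion yields -1 (the common case); the function names are made up for illustration and do not come from the question.

```cpp
#include <climits>

// Hypothetical helper: reproduces the comparison from the question.
// Returns true only when plain char is unsigned (assuming CHAR_BIT == 8).
bool plain_char_equals_0xff() {
    char c = static_cast<char>(0xff);  // out of range when plain char is signed
    return 0xff == c;                  // c is promoted to int before comparing
}

// Hypothetical helper: compares the stored byte instead of the promoted value.
// Converting to unsigned char first makes the result independent of the
// signedness of plain char.
bool is_byte_0xff(char c) {
    return 0xff == static_cast<unsigned char>(c);
}
```

If the intent is to compare raw bytes, going through unsigned char as in the second function avoids depending on the signedness of plain char.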
Another important thing to note is that unsigned char, signed char, and (plain) char are three distinct types. char has the same representation as either unsigned char or signed char; it's implementation-defined which one it is. (On the other hand, signed int and int are two names for the same type; unsigned int is a distinct type. (Except that, just to add to the frivolity, it's implementation-defined whether a bit field declared as plain int is signed or unsigned.))
Yes, it's all a bit of a mess, and I'm sure it would have been defined differently if C were being designed from scratch today. But each revision of the C language has had to avoid breaking (too much) existing code, and to a lesser extent existing implementations.
Why is 'char' signed by default in C++?
It isn't.
The signedness of a char that isn't either a signed char or unsigned char is implementation-defined. Many systems make it signed to match other types that are signed by default (like int), but it may be unsigned on some systems. (Say, if you pass -funsigned-char to GCC.)
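One way to see the effect of such a flag at compile time is the __CHAR_UNSIGNED__ macro, which GCC and Clang predefine when plain char is unsigned. The sketch below assumes one of those compilers; on others the macro may simply be absent, so a false result proves nothing there. The function name is hypothetical.

```cpp
#include <climits>

// Hypothetical helper: true when the compiler predefines __CHAR_UNSIGNED__.
// GCC and Clang define it when plain char is unsigned, for example when
// building with -funsigned-char. Other compilers may not define it at all.
constexpr bool compiler_reports_unsigned_char() {
#ifdef __CHAR_UNSIGNED__
    return true;
#else
    return false;
#endif
}
```

Building the same file with and without -funsigned-char should flip the result on GCC or Clang.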
Is char signed or unsigned by default?
The book is wrong. The standard does not specify if plain char is signed or unsigned.
In fact, the standard defines three distinct types: char, signed char, and unsigned char. If you #include <limits.h> and then look at CHAR_MIN, you can find out whether plain char is signed or unsigned (it is signed if CHAR_MIN is less than 0 and unsigned if CHAR_MIN is equal to 0), but even then, the three types are distinct as far as the standard is concerned.
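A short sketch of that check in C++, using <climits> (the C++ counterpart of <limits.h>). The function name is made up for illustration; the static_asserts confirm that the three types stay distinct whichever way the check comes out.

```cpp
#include <climits>
#include <type_traits>

// Hypothetical helper: true when plain char is signed on this implementation.
constexpr bool plain_char_is_signed() {
    return CHAR_MIN < 0;  // CHAR_MIN is 0 when plain char is unsigned
}

// The three character types are distinct regardless of the result above.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are different types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are different types");
```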
Do note that char is special in this way. If you declare a variable as int, it is 100% equivalent to declaring it as signed int. This is always true for all compilers and architectures.
Why is char's sign-ness not defined in C?
"Plain" char having unspecified signed-ness allows compilers to select whichever representation is more efficient for the target architecture: on some architectures, zero extending a one-byte value to the size of "int" requires less operations (thus making plain char 'unsigned'), while on others the instruction set makes sign-extending more natural, and plain char gets implemented as signed.
Why doesn't C++ accept signed or unsigned char for arrays of characters
"z1y2x3w4"
is const char[9] and there is no implicit conversion from const char* to const signed char*.
You could use reinterpret_cast:
const signed char * AnArrayOfStrings[] = {
    reinterpret_cast<const signed char *>("z1y2x3w4"),
    reinterpret_cast<const signed char *>("Aname")
};
Char vs unsigned char in conversion to int
It's implementation-defined whether char is signed or unsigned.
If char is signed, then when being promoted to an int it will be sign extended, so a negative value will keep its negative value after the promotion.
The leading 1 bits are how negative numbers are represented in two's complement systems, which is the most common way to handle negative numbers.
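The difference between the two promotions can be sketched as follows. This is only an illustration: it assumes 8-bit bytes and a two's complement implementation (guaranteed since C++20, near-universal before that), and the function names are hypothetical.

```cpp
// Hypothetical helper: reinterpret the byte as signed char, then promote.
// The promotion to int preserves the value, so the sign bit is extended.
int promote_as_signed(unsigned char bits) {
    signed char sc = static_cast<signed char>(bits);
    return sc;
}

// Hypothetical helper: promote the byte as unsigned char.
// The upper bits of the resulting int are zero.
int promote_as_unsigned(unsigned char bits) {
    return bits;
}
```

For the bit pattern 0xff, the first function gives -1 (all bits set in the int) and the second gives 255.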
If, with your compiler, char is signed (which it seems to be) then the initialization of c1 should generate a warning. If it doesn't then you need to enable more warnings.
Why is char neither signed or unsigned, but wchar_t is?
The three char types are all distinct types and can be overloaded:
[basic.fundamental] / 1
[...] Plain char, signed char, and unsigned char are three distinct types,
collectively called narrow character types. [...]
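Because the three types are distinct, a function can be overloaded on each of them. A minimal sketch, with a made-up function name:

```cpp
#include <string>

// Hypothetical overload set: one overload per narrow character type.
// This compiles only because char, signed char, and unsigned char are
// three different types.
std::string which(char)          { return "char"; }
std::string which(signed char)   { return "signed char"; }
std::string which(unsigned char) { return "unsigned char"; }
```

A character literal such as 'a' has type char in C++, so which('a') selects the first overload whether or not plain char is signed.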
wchar_t is also a distinct type, but it cannot be qualified with signed or unsigned, which can only be used with the standard integer types.
[dcl.type] / 2
As a general rule, at most one type-specifier is allowed in
the complete decl-specifier-seq of a declaration or in a
type-specifier-seq or trailing-type-specifier-seq. The only exceptions
to this rule are the following: [...]
signed or unsigned can be combined with char, long, short, or int.
[dcl.type.simple] / 2
[...] Table 9 summarizes the valid combinations of simple-type-specifiers and the types they specify.
The signedness of wchar_t is implementation-defined:
[basic.fundamental] / 5
[...] Type wchar_t shall have the same size, signedness, and alignment
requirements (3.11) as one of the other integral types, called its
underlying type.
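The signedness that wchar_t inherits from its underlying type can be queried with a standard type trait. A small sketch, with a hypothetical function name; the result depends on the platform:

```cpp
#include <type_traits>

// Hypothetical helper: true when wchar_t is signed on this implementation.
constexpr bool wchar_is_signed() {
    return std::is_signed<wchar_t>::value;
}

// wchar_t is a distinct type in C++, so it is never the same type as its
// underlying type, whichever integral type that happens to be.
static_assert(!std::is_same<wchar_t, int>::value,
              "wchar_t is a distinct type");
static_assert(!std::is_same<wchar_t, unsigned short>::value,
              "wchar_t is a distinct type");
```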
Is char signed or unsigned by default on iOS?
In most cases, char is unsigned on ARM (for performance reasons) and signed on other platforms.
iOS differs from the normal convention for ARM: char is signed by default. (In particular, this differs from Android, where code compiled with the NDK defaults to unsigned char.)
This can be changed in Xcode: there is a "'char' Type is unsigned" option (which defaults to off). If this is changed to "yes", Xcode will pass -funsigned-char to LLVM. (Checked on Xcode 5.0.2.)
The reason why iOS differs is mentioned in iOS ABI Function Call Guide: ARM64 Function Calling Conventions, which says simply:
In iOS, as with other Darwin platforms, both char and wchar_t are
signed types.