What platforms have something other than 8-bit char?
`char` is also 16-bit on the Texas Instruments C54x DSPs, which turned up for example in OMAP2. There are other DSPs out there with 16- and 32-bit `char`. I think I even heard about a 24-bit DSP, but I can't remember what, so maybe I imagined it.
Another consideration is that POSIX mandates `CHAR_BIT == 8`, so if you're targeting POSIX you can assume it. If someone later needs to port your code to a near-implementation of POSIX that just so happens to have the functions you use but a different-size `char`, that's their bad luck.
In general, though, I think it's almost always easier to work around the issue than to think about it. Just write `CHAR_BIT`. If you want an exact 8-bit type, use `int8_t`: your code will noisily fail to compile on implementations that don't provide one, instead of silently using a size you didn't expect. At the very least, if I hit a case where I had a good reason to assume 8-bit bytes, I'd assert it.
Will a `char` always-always-always have 8 bits?
Yes, `char` and byte are pretty much the same. A byte is the smallest addressable amount of memory, and so is a `char` in C. `char` always has size 1.

From the spec, section 3.6 byte:
byte: addressable unit of data storage large enough to hold any member of the basic character set of the execution environment
And section 3.7.1 character:
character: single-byte character <C> bit representation that fits in a byte

A `char` has `CHAR_BIT` bits. It could be any number (well, 8 or greater according to the spec), but is definitely most often 8. There are real machines with 16- and 32-bit `char` types, though. `CHAR_BIT` is defined in `limits.h`.

From the spec, section 5.2.4.2.1 Sizes of integer types `<limits.h>`:
The values given below shall be replaced by constant expressions suitable for use in `#if` preprocessing directives. Moreover, except for `CHAR_BIT` and `MB_LEN_MAX`, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

— number of bits for smallest object that is not a bit-field (byte)
`CHAR_BIT` 8
`sizeof(char) == 1`. Always.

From the spec, section 6.5.3.4 The `sizeof` operator, paragraph 3:

When applied to an operand that has type `char`, `unsigned char`, or `signed char`, (or a qualified version thereof) the result is 1.

You can allocate as much memory as your system will let you allocate - there's nothing in the standard that defines how much that might be. You could imagine, for example, a computer with a cloud-storage backed memory allocation system - your allocatable memory might be practically infinite.
Here's the complete spec section 7.20.3.3 The `malloc` function:

Synopsis

    #include <stdlib.h>
    void *malloc(size_t size);

Description

The `malloc` function allocates space for an object whose size is specified by `size` and whose value is indeterminate.

Returns

The `malloc` function returns either a null pointer or a pointer to the allocated space.

That's the entirety of the specification, so there's not really any limit you can rely on.
Are there any hosted C implementations which have CHAR_BIT != 8?
I don't know if there was ever a post-standardisation version, but various Cray 64-bit vector supercomputers had a C compiler in which `sizeof(long) == sizeof(double) == 1`, a.k.a. everything is 64 bits wide.
Are there machines where sizeof(char) != 1, or at least CHAR_BIT > 8?
It is always one in C99, section 6.5.3.4:
When applied to an operand that has type `char`, `unsigned char`, or `signed char`, (or a qualified version thereof) the result is 1.
Edit: not part of your question, but for interest, from Harbison and Steele's C: A Reference Manual, Third Edition, Prentice Hall, 1991 (pre-C99), p. 148:

A storage unit is taken to be the amount of storage occupied by one character; the size of an object of type `char` is therefore 1.
Edit: In answer to your updated question, the following question and answer from Harbison and Steele is relevant (ibid, Ex. 4 of Ch. 6):
Is it allowable to have a C implementation in which type `char` can represent values ranging from -2,147,483,648 through 2,147,483,647? If so, what would be `sizeof(char)` under that implementation? What would be the smallest and largest ranges of type `int`?
Answer (ibid, p. 382):
It is permitted (if wasteful) for an implementation to use 32 bits to represent type `char`. Regardless of the implementation, the value of `sizeof(char)` is always 1.
While this does not specifically address a case where, say, bytes are 8 bits and a `char` is 4 of those bytes (actually impossible under the C99 definition, see below), the fact that `sizeof(char) == 1` always holds is clear from the C99 standard and from Harbison and Steele.
Edit: In fact (this is in response to your upd 2 question), as far as C99 is concerned `sizeof(char)` is in bytes; from section 6.5.3.4 again:

The `sizeof` operator yields the size (in bytes) of its operand

so combined with the quotation above, bytes of 8 bits with a `char` as 4 of those bytes is impossible: for C99 a byte is the same as a `char`.
In answer to your mention of the possibility of a 7-bit `char`: this is not possible in C99. According to section 5.2.4.2.1 of the standard, the minimum is 8:
Their implementation-defined values shall be equal or greater [my emphasis] in magnitude to those shown, with the same sign.
— number of bits for smallest object that is not a bit-field (byte)
`CHAR_BIT` 8
— minimum value for an object of type `signed char`
`SCHAR_MIN` -127
— maximum value for an object of type `signed char`
`SCHAR_MAX` +127
— maximum value for an object of type `unsigned char`
`UCHAR_MAX` 255
— minimum value for an object of type `char`
`CHAR_MIN` see below
— maximum value for an object of type `char`
`CHAR_MAX` see below

[...]
If the value of an object of type `char` is treated as a signed integer when used in an expression, the value of `CHAR_MIN` shall be the same as that of `SCHAR_MIN` and the value of `CHAR_MAX` shall be the same as that of `SCHAR_MAX`. Otherwise, the value of `CHAR_MIN` shall be 0 and the value of `CHAR_MAX` shall be the same as that of `UCHAR_MAX`. The value `UCHAR_MAX` shall equal 2^CHAR_BIT − 1.
Is CHAR_BIT ever > 8?
The TMS320C28x DSP from Texas Instruments has a byte with 16 bits. Documentation for the compiler specifies `CHAR_BIT` as 16 on page 101. This appears to be a modern processor (currently being sold), with compilers supporting C99 and C++03.
Bit size of GLib types and portability to more exotic (think 16-bit char) platforms
As Hans Passant said in his comment, glib guarantees that `gint8` is 8 bits by not supporting platforms where `signed char` is any other size. There are only two types of systems that have ever had C compiler implementations where this requirement wasn't met.
The first is systems where the byte size is 9 bits. Today these are long obsolete, but systems like these had some of the earliest C compilers. In theory the compiler could emulate a restricted-range 8-bit type as an extension, but the type would still be 9 bits long in memory, which wouldn't really gain you anything.
The second is word-addressable systems, where the word size is 16, 32, or 64 bits. On these computers the processor can only address memory at word boundaries: address 0 is the first word, address 1 is the second word, and so on, without any overlap between words. For the most part systems like these are obsolete now, though not nearly as obsolete as 9-bit-byte machines; there's apparently still at least some use of word-addressable processors in embedded systems.
In C compilers targeting word-addressable systems, the size of a byte is either the word size or 8 bits, depending on the compiler; some compilers offered a choice. Having word-sized bytes is the simple way to go. Implementing 8-bit bytes, on the other hand, requires a fair bit of work. Not only does the compiler have to use multiple instructions to access the separate 8-bit values contained in each word, it also has to emulate a byte-addressable pointer. This usually means `char` pointers have a different size than `int` pointers, since byte-addressable pointers need extra room to store both the word address and a byte offset.
Needless to say, the compilers that use word-sized bytes wouldn't be supported by glib, while the ones using 8-bit bytes would at least be able to implement `gint8`. Even so, they would probably remain unsupported for a number of other reasons; the fact that `sizeof(char *) > sizeof(int *)` on such systems might be one problem.
I should also point out that there are a few other long-obsolete systems that, while having C compilers with an 8-bit byte, still didn't have a type that meets the requirements of `gint8`. These are the systems that used ones' complement or signed-magnitude integers, meaning that `signed char` ranged from -127 to 127 instead of the -128 to 127 range guaranteed by glib.
System where 1 byte != 8 bit? [duplicate]
On older machines, codes smaller than 8 bits were fairly common, but most of those have been dead and gone for years now.
C and C++ have mandated a minimum of 8 bits for `char`, at least as far back as the C89 standard. [Edit: For example, C90 §5.2.4.2.1 requires `CHAR_BIT` >= 8 and `UCHAR_MAX` >= 255. C89 uses a different section number (I believe that would be §2.2.4.2.1) but identical content.] They treat "char" and "byte" as essentially synonymous [Edit: for example, `CHAR_BIT` is described as "number of bits for the smallest object that is not a bit-field (byte)"].
There are, however, current machines (mostly DSPs) where the smallest type is larger than 8 bits -- a minimum of 12, 14, or even 16 bits is fairly common. Windows CE does roughly the same: its smallest type (at least with Microsoft's compiler) is 16 bits. They do not, however, treat a `char` as 16 bits -- instead they take the (non-conforming) approach of simply not supporting a type named `char` at all.
Why are chars only 8 bits in size?
`char` is guaranteed to be 1 byte by the C++ standard. Keep in mind that this does not mean the size will be 8 bits, since the equation byte = 8 bits does not hold on every system. For the sake of explanation, assume we're talking only about 8-bit bytes.
First of all, when you write:
8 bits = log2(N), and thus N must equal 256

you are right. 8 bits can represent up to 256 different values, and the fact that Unicode consists of more characters than that has nothing to do with the problem. `char` is not meant to represent every possible character out there. It is meant to represent one of 256 different values, which can be interpreted as some range of printable or non-printable characters.
However, the Unicode table contains far more than 256 characters. And on my compiler, when I run the following lines of code:

    char c = static_cast<char>(257);
    cout << c;

I see an unknown character printed to the screen, but a character nonetheless.
But have you tried actually determining what `static_cast<char>(257)` returns?

    char c = static_cast<char>(257);
    std::cout << static_cast<int>(c);

This will print 1, and if we look at the Unicode (or ASCII) table, we can see that this value represents the Start of Heading character. It is a non-printable character, and printing it results in an undefined character appearing on the console (it needs confirmation whether or not this is truly undefined).
For printing a wider range of characters, consider using `wchar_t` (commonly 16 bits on Windows and 32 bits elsewhere, so it can cover a range of at least 65536 values) together with `std::wstring`.
Is the number of bits in a byte equal to the number of bits in a type char?
Yes. Both are equal to `CHAR_BIT`.*

The C standard defines `CHAR_BIT` as "number of bits for smallest object that is not a bit-field (byte)". C99 says explicitly: "A byte contains `CHAR_BIT` bits."
"UCHAR_MAX
shall equal 2CHAR_BIT
- 1" — it means unsigned char
requires at least CHAR_BIT
bits (char_bits >= CHAR_BIT
).
sizeof(char) == 1
(single-byte character fits in a byte) i.e., type char
requires at most CHAR_BIT
bits (char_bits <= CHAR_BIT
).
From char_bits >= CHAR_BIT
and char_bits <= CHAR_BIT
follows that char_bits == CHAR_BIT
(no padding bits).
POSIX says it explicitly: "`CHAR_BIT`: Number of bits in a type `char`."
*: If `char` is signed and `CHAR_BIT > 8`, then (without the §6.2.6.2 quote below) it was not clear whether the `SCHAR_MIN..SCHAR_MAX` range covers all `CHAR_BIT` bits, though the name `CHAR_BIT` communicates the intent clearly ("number of bits in `char`").

C11 says (§6.2.6.2 in the n1570 draft): "`signed char` shall not have any padding bits. There shall be exactly one sign bit."
From §6.2.5p15:

The implementation shall define `char` to have the same range, representation, and behavior as either `signed char` or `unsigned char`

it follows: all `CHAR_BIT` bits are used to represent the `CHAR_MIN..CHAR_MAX` range (because both the signed and unsigned `char` types use all their bits).
For comparison, unlike `char`, `_Bool` may use fewer bits (§6.7.2.1p4, footnote 122):

While the number of bits in a `_Bool` object is at least `CHAR_BIT`, the width (number of sign and value bits) of a `_Bool` may be just 1 bit.