What Platforms Have Something Other Than 8-Bit Char

What platforms have something other than 8-bit char?

char is also 16 bits on the Texas Instruments C54x DSPs, which turned up for example in OMAP2. There are other DSPs out there with 16- and 32-bit char. I think I even heard about a 24-bit DSP, but I can't remember which, so maybe I imagined it.

Another consideration is that POSIX mandates CHAR_BIT == 8. So if you're using POSIX you can assume it. If someone later needs to port your code to a near-implementation of POSIX that just so happens to have the functions you use but a different-sized char, that's their bad luck.

In general, though, I think it's almost always easier to work around the issue than to think about it. Just write CHAR_BIT instead of hard-coding 8. If you want an exact 8-bit type, use int8_t. Your code will noisily fail to compile on implementations which don't provide one, instead of silently using a size you didn't expect. At the very least, if I hit a case where I had a good reason to assume 8-bit char, I'd assert it.
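
A minimal sketch of that approach (assuming a C11 compiler for _Static_assert; the typedef name is made up purely for illustration):

#include <limits.h>
#include <stdint.h>

/* int8_t is only defined on implementations that have an exact 8-bit type,
   so any use of it refuses to compile where char is wider than 8 bits. */
typedef int8_t octet;   /* hypothetical name, used only to force the check */

/* Or state the assumption once and let the compiler enforce it (C11; on
   older compilers a negative-size array trick does the same job). */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");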

Will a `char` always-always-always have 8 bits?

  1. Yes, char and byte are pretty much the same. A byte is the smallest addressable amount of memory, and so is a char in C. char always has size 1.

    From the spec, section 3.6 byte:

    byte

    addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

    And section 3.7.1 character:

    character

    single-byte character

    <C> bit representation that fits in a byte

  2. A char has CHAR_BIT bits. It could be any number (well, 8 or greater according to the spec), but is definitely most often 8. There are real machines with 16- and 32-bit char types, though. CHAR_BIT is defined in limits.h (the sketch after this list shows how to inspect it).

    From the spec, section 5.2.4.2.1 Sizes of integer types <limits.h>:

    The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

    — number of bits for smallest object that is not a bit-field (byte)

        CHAR_BIT                               8

  3. sizeof(char) == 1. Always.

    From the spec, section 6.5.3.4 The sizeof operator, paragraph 3:

    When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

  4. You can allocate as much memory as your system will let you allocate - there's nothing in the standard that defines how much that might be. You could imagine, for example, a computer with a cloud-storage backed memory allocation system - your allocatable memory might be practically infinite.

    Here's the complete spec section 7.20.3.3 The malloc function:

    Synopsis

    1 #include <stdlib.h>

       void *malloc(size_t size);

    Description

    2 The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.

    Returns

    3 The malloc function returns either a null pointer or a pointer to the allocated space.

    That's the entirety of the specification, so there's not really any limit you can rely on.
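
Putting points 1-3 together, here is a small sketch that makes the guarantees, and the absence of a malloc limit, concrete; it assumes nothing beyond a hosted C99 implementation:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* sizeof(char) is 1 by definition; CHAR_BIT says how many bits that is. */
    printf("CHAR_BIT     = %d\n", CHAR_BIT);
    printf("sizeof(char) = %zu\n", sizeof(char));   /* always 1 */
    printf("sizeof(int)  = %zu\n", sizeof(int));    /* measured in chars */

    /* malloc has no standard-defined limit: it either succeeds or returns
       NULL, and how large a request can succeed is up to the system. */
    void *p = malloc((size_t)-1 / 2);               /* deliberately absurd */
    printf("huge malloc %s\n", p ? "succeeded" : "returned NULL");
    free(p);
    return 0;
}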

Are there any hosted C implementations which have CHAR_BIT > 8?

I don't know if there was ever a post-standardisation version, but various Cray 64-bit vector supercomputers had a C compiler in which sizeof(long) == sizeof(double) == 1, a.k.a. everything is 64 bits wide.

Are there machines where sizeof(char) != 1, or at least CHAR_BIT > 8?

It is always one in C99, section 6.5.3.4:

When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

Edit: not part of your question, but for interest, from Harbison and Steele's C: A Reference Manual, Third Edition, Prentice Hall, 1991 (pre-C99), p. 148:

A storage unit is taken to be the amount of storage occupied by one character; the size of an object of type char is therefore 1.

Edit: In answer to your updated question, the following question and answer from Harbison and Steele is relevant (ibid, Ex. 4 of Ch. 6):

Is it allowable to have a C implementation in which type char can represent values ranging from -2,147,483,648 through 2,147,483,647? If so, what would be sizeof(char) under that implementation? What would be the smallest and largest ranges of type int?

Answer (ibid, p. 382):

It is permitted (if wasteful) for an implementation to use 32 bits to represent type char. Regardless of the implementation, the value of sizeof(char) is always 1.

While this does not specifically address a case where, say, bytes are 8 bits and a char is 4 of those bytes (actually impossible under the C99 definition, see below), the fact that sizeof(char) == 1 always is clear from the C99 standard and Harbison and Steele.

Edit: In fact (this is in response to your upd 2 question), as far as C99 is concerned sizeof(char) is in bytes; from section 6.5.3.4 again:

The sizeof operator yields the size (in bytes) of its operand

so combined with the quotation above, bytes of 8 bits with char as 4 of those bytes is impossible: for C99 a byte is the same as a char.

In answer to your mention of the possibility of a 7-bit char: this is not possible in C99. According to section 5.2.4.2.1 of the standard the minimum is 8:

Their implementation-defined values shall be equal or greater [my emphasis] in magnitude to those shown, with the same sign.

— number of bits for smallest object that is not a bit-field (byte)

CHAR_BIT 8

— minimum value for an object of type signed char

SCHAR_MIN -127

— maximum value for an object of type signed char

SCHAR_MAX +127

— maximum value for an object of type unsigned char

UCHAR_MAX 255

— minimum value for an object of type char

CHAR_MIN see below

— maximum value for an object of type char

CHAR_MAX see below

[...]

If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value UCHAR_MAX shall equal 2^CHAR_BIT − 1.
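
Those CHAR_MIN and CHAR_MAX values are easy to inspect directly; a short sketch (plain hosted C, nothing beyond limits.h) prints them and shows whether plain char behaves as signed or unsigned on the implementation at hand:

#include <limits.h>
#include <stdio.h>

int main(void) {
    printf("CHAR_BIT = %d\n", CHAR_BIT);                        /* at least 8 */
    printf("CHAR_MIN = %d, CHAR_MAX = %d\n", CHAR_MIN, CHAR_MAX);
    /* Per the quoted paragraph, CHAR_MIN is 0 exactly when plain char
       behaves like unsigned char. */
    printf("plain char is %s\n", CHAR_MIN == 0 ? "unsigned" : "signed");
    return 0;
}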

Is CHAR_BIT ever > 8?

The TMS320C28x DSP from Texas Instruments has a 16-bit byte.

Documentation for the compiler specifies CHAR_BIT as 16 on page 101.

This appears to be a modern processor (currently being sold), with compilers supporting C99 and C++03.

Bit size of GLib types and portability to more exotic (think 16-bit char) platforms

As Hans Passant said in his comment, glib guarantees that gint8 is 8 bits by not supporting platforms where signed char is any other size. There are only two types of systems that have ever had C compiler implementations where this requirement wasn't met.

The first is systems where the byte size is 9 bits. Today these are long obsolete, but systems like these had some of the earliest C compilers. In theory the compiler could emulate a restricted-range 8-bit type as an extension, but it would still be 9 bits long in memory and wouldn't really get you anything.

The second is word-addressable systems, where the word size is either 16, 32 or 64 bits. In these computers the processor can only address memory at word boundaries. Address 0 is the first word, address 1 is the second word, and so on without any overlap between words. For the most part systems like these are obsolete now, though not nearly as obsolete as 9-bit-byte machines. There's apparently still at least some use of word-addressable processors in embedded systems.

In C compilers targeting word-addressable systems, the size of a byte is either the word size or 8 bits, depending on the compiler. Some compilers give a choice. Having word-sized bytes is the simple way to go. Implementing 8-bit bytes, on the other hand, requires a fair bit of work. Not only does the compiler have to use multiple instructions to access the separate 8-bit values contained in each word, it also has to emulate a byte-addressable pointer. This usually means char pointers have a different size than int pointers, as byte-addressable pointers need more room to store both the address and a byte offset.

Needless to say, the compilers that use word-sized bytes wouldn't be supported by glib, while the ones using 8-bit bytes would at least be able to implement gint8. They still probably wouldn't be supported for a number of other reasons, though; the fact that sizeof(char *) > sizeof(int *) might well be a problem.

I should also point out that there are a few other long-obsolete systems that, while having C compilers that used an 8-bit byte, still didn't have a type that meets the requirements of gint8. These are the systems that used ones' complement or sign-magnitude integers, meaning that signed char ranged from -127 to 127 instead of the -128 to 127 range guaranteed by glib.
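
A minimal sketch of that kind of requirement, written as compile-time checks (this is not glib's actual code, and the typedef name is hypothetical):

#include <limits.h>

/* Refuse to build on the kinds of platforms described above: anything
   without an exact 8-bit, two's-complement signed char. */
#if CHAR_BIT != 8
#error "char is not 8 bits wide"
#endif
#if SCHAR_MIN != -128 || SCHAR_MAX != 127
#error "signed char does not have the two's-complement range -128..127"
#endif

typedef signed char my_int8;   /* hypothetical stand-in for glib's gint8 */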

System where 1 byte != 8 bit? [duplicate]

On older machines, codes smaller than 8 bits were fairly common, but most of those have been dead and gone for years now.

C and C++ have mandated a minimum of 8 bits for char, at least as far back as the C89 standard. [Edit: For example, C90, §5.2.4.2.1 requires CHAR_BIT >= 8 and UCHAR_MAX >= 255. C89 uses a different section number (I believe that would be §2.2.4.2.1) but identical content]. They treat "char" and "byte" as essentially synonymous [Edit: for example, CHAR_BIT is described as: "number of bits for the smallest object that is not a bitfield (byte)".]

There are, however, current machines (mostly DSPs) where the smallest type is larger than 8 bits -- a minimum of 12, 14, or even 16 bits is fairly common. Windows CE does roughly the same: its smallest type (at least with Microsoft's compiler) is 16 bits. They do not, however, treat a char as 16 bits -- instead they take the (non-conforming) approach of simply not supporting a type named char at all.

Why are chars only 8 bits in size?

char is guaranteed to be 1 byte by the C++ standard. Keep in mind that this does not mean the size will be 8 bits, since the statement "byte = 8 bits" is not true on every system. For the sake of explanation, assume that we're talking only about 8-bit bytes.

First of all, when you write:

8 bits = log2(N) and thus N must equal 256

You are right. 8 bits can represent up to 256 different values, and the fact that Unicode consists of more characters than that has nothing to do with the problem. char is not meant to represent every possible character out there. It is meant to represent one of 256 different values which can be interpreted as some range of printable or non-printable characters.

However, on the Unicode table there are far more than 256 characters. And on my compiler, when I run the following lines of code:

char c = static_cast<char> (257);
cout << c;

I see an unknown character printed to the screen, but a character nonetheless.

But have you tried actually determining what static_cast<char>(257) returns?

char c = static_cast<char>(257);
std::cout << static_cast<int>(c);

Will print 1: on a typical implementation the out-of-range value 257 doesn't fit in an 8-bit char and is reduced modulo 256, leaving 1. Looking that up in the Unicode (or ASCII) table, we can see that this value represents the Start of Heading character. It is a non-printable character, and printing it results in an undefined character appearing on the console (need confirmation on whether this is truly undefined).

For printing a wider range of characters, consider using wchar_t (which is most likely to be 16 bits, thus covering a range of 65536 values) and std::wstring to go with it.
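
A plain-C analogue of that suggestion, as a minimal sketch (the euro sign is just an arbitrary character that doesn't fit in 8 bits):

#include <locale.h>
#include <wchar.h>

int main(void) {
    setlocale(LC_ALL, "");          /* let the locale convert wide output */
    wchar_t wc = L'\u20AC';         /* EURO SIGN: needs more than 8 bits */
    wprintf(L"%lc is code point %u\n", (wint_t)wc, (unsigned)wc);
    return 0;
}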

Is the number of bits in a byte equal to the number of bits in a type char?

Yes. Both are equal to CHAR_BIT*.

The C standard defines CHAR_BIT as "number of bits for smallest object that is not a bit-field (byte)". C99 says explicitly: "A byte contains CHAR_BIT bits."

"UCHAR_MAX shall equal 2CHAR_BIT - 1" — it means unsigned char requires at least CHAR_BIT bits (char_bits >= CHAR_BIT).

sizeof(char) == 1 (a single-byte character fits in a byte), i.e., type char requires at most CHAR_BIT bits (char_bits <= CHAR_BIT).

From char_bits >= CHAR_BIT and char_bits <= CHAR_BIT it follows that char_bits == CHAR_BIT (no padding bits).
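
That argument can be restated as a compile-time check; a sketch assuming a C11 compiler for _Static_assert:

#include <limits.h>

/* unsigned char uses every one of its CHAR_BIT bits, so UCHAR_MAX must be
   exactly 2^CHAR_BIT - 1. (The shift assumes CHAR_BIT < 64, which holds on
   every implementation mentioned here.) */
_Static_assert(UCHAR_MAX == (1ULL << CHAR_BIT) - 1,
               "unsigned char has no padding bits");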

POSIX says it explicitly: "CHAR_BIT Number of bits in a type char."


*: If char is signed and CHAR_BIT > 8 then (without the §6.2.6.2 quote below) it was not clear whether the SCHAR_MIN..SCHAR_MAX range covers all CHAR_BIT bits. Though the name CHAR_BIT communicates the intent clearly ("number of bits in char").

C11 says (§6.2.6.2 in the N1570 draft): "signed char shall not have any padding bits. There shall be exactly one sign bit."

From §6.2.5 paragraph 15:

The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned char

it follows that all CHAR_BIT bits are used to represent the CHAR_MIN..CHAR_MAX range (because both the signed char and unsigned char types use all their bits).

For comparison, unlike char, _Bool may use fewer bits (§6.7.2.1 paragraph 4, footnote 122):

While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit.


