System Where 1 Byte != 8 Bit

System where 1 byte != 8 bit?

On older machines, codes smaller than 8 bits were fairly common, but most of those have been dead and gone for years now.

C and C++ have mandated a minimum of 8 bits for char, at least as far back as the C89 standard. [Edit: For example, C90, §5.2.4.2.1 requires CHAR_BIT >= 8 and UCHAR_MAX >= 255. C89 uses a different section number (I believe that would be §2.2.4.2.1) but identical content]. They treat "char" and "byte" as essentially synonymous [Edit: for example, CHAR_BIT is described as: "number of bits for the smallest object that is not a bitfield (byte)".]
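
To see these limits on a given implementation, here is a minimal sketch (mine, not from the original answer) that checks and prints them via <limits.h>:

#include <limits.h>
#include <stdio.h>

/* C90 5.2.4.2.1 guarantees these minimums; a conforming compiler
   will never take the #error branch. */
#if CHAR_BIT < 8
#error "CHAR_BIT is below the minimum the standard allows"
#endif

int main(void)
{
    printf("CHAR_BIT  = %d\n", CHAR_BIT);
    printf("UCHAR_MAX = %u\n", (unsigned)UCHAR_MAX);
    return 0;
}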

There are, however, current machines (mostly DSPs) where the smallest type is larger than 8 bits -- a minimum of 12, 14, or even 16 bits is fairly common. Windows CE does roughly the same: its smallest type (at least with Microsoft's compiler) is 16 bits. They do not, however, treat a char as 16 bits -- instead they take the (non-conforming) approach of simply not supporting a type named char at all.

Significance of Bytes as 8 bits

A byte is not necessarily 8 bits. A byte is a unit of digital information whose size is processor-dependent. Historically, the size of a byte was equal to the size of a character as specified by the character encoding supported by the processor. For example, a processor that supports Binary-Coded Decimal (BCD) characters defines a byte to be 4 bits. A processor that supports ASCII defines a byte to be 7 bits. The reason for using the character size to define the size of a byte is to make programming easier, considering that a byte has always (as far as I know) been used as the smallest addressable unit of data storage. If you think about it, you'll find that this is indeed very convenient.

A byte was defined to be 8 bits in the extremely successful IBM S/360 computer family, which used an 8-bit character encoding called EBCDIC. Through its S/360 computers, IBM introduced several crucially important computing techniques that became the foundation of all later processors, including the ones we are using today. In fact, the term byte was coined by Werner Buchholz, a computer scientist at IBM.

When Intel introduced its first 8-bit processor (the 8008), a byte was defined to be 8 bits even though the instruction set didn't directly support any character encoding, thereby breaking the pattern. The processor did, however, provide numerous instructions that operate on packed (4-bit) and unpacked (8-bit) BCD-encoded digits. In fact, the whole x86 instruction set was conveniently designed around 8-bit bytes. The fact that 7-bit ASCII characters fit in 8-bit bytes was a free, additional advantage. As usual, a byte is the smallest addressable unit of storage. I would like to mention here that in digital circuit design, it's convenient for the number of wires or pins to be a power of 2, so that every possible value that appears as input or output has a use.

Later processors continued to use 8-bit bytes because it is much easier to develop new designs based on older ones. It also helps keep newer processors compatible with older ones. Therefore, instead of changing the size of a byte, the register, data bus, and address bus sizes were doubled each time (we have now reached 64 bits). This doubling allowed existing digital circuit designs to be reused easily, significantly reducing processor design costs.

1 byte is equal to 8 bits. What is the logic behind this?

It's been a minute since I took computer organization, but the relevant wiki on 'Byte' gives some context.

The byte was originally the smallest number of bits that could hold a single character (I assume standard ASCII). We still use the ASCII standard, so 8 bits per character is still relevant. This sentence, for instance, is 41 bytes. That's easily countable and practical for our purposes.

If we had only 4 bits, there would only be 16 (2^4) possible characters, unless we used 2 bytes to represent a single character, which is computationally less efficient. If we had 16 bits in a byte, we would have a whole lot more 'dead space' in our instruction set: we would allow 65,536 (2^16) possible characters, which would make computers run less efficiently when performing byte-level instructions, especially since our character set is much smaller than that.

Additionally, a byte can represent 2 nibbles. Each nibble is 4 bits, which is the smallest number of bits that can encode any numeric digit from 0 to 9 (10 different digits).
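
As a small illustration (my own sketch, not part of the original answer), packing two decimal digits into the two nibbles of a single 8-bit byte looks like this in C:

#include <stdio.h>

/* Pack two decimal digits (0-9) into one byte: one digit per 4-bit nibble. */
static unsigned char pack_bcd(unsigned hi, unsigned lo)
{
    return (unsigned char)((hi << 4) | (lo & 0x0F));
}

int main(void)
{
    unsigned char b = pack_bcd(4, 2);              /* the digits 4 and 2 */
    printf("packed byte: 0x%02X\n", b);            /* prints 0x42 */
    printf("digits back: %d and %d\n", b >> 4, b & 0x0F);
    return 0;
}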

Read char from file on a platform where bytes != 8 bits

Most programs are probably not portable

Reading through the comments here, I think the answer is that this program is not fully portable as per the C++ standard. No program is. Edit: at least no program that talks to a network, reads from or writes to disk, or expects other platforms to answer or update data.

C++ sides with the platform's hardware and is not Java. fgetc and fputc, for example, are round-trip value-preserving, but only on the same platform. Network messages work because everyone assumes 8 bits to a byte.

If there is concern, then it would be best to assert that the platform has 8 bits to a byte: static_assert(CHAR_BIT == 8, "Platform must have 8 bits to a byte.");

Even without the assert there will be other alarm bells. A platform that does not have 8 bits to a byte, but still talks to other platforms via networking or files, will fail sooner rather than later, and porting code to it will require the extra work of reading and writing data in the assumed 8-bit de facto format. This seems much like the endianness issue, but the difference here is that one side has very clearly won.
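
To sketch what that extra work looks like (my own illustration, not from the answer), a writer can mask each unit down to 8 bits so the produced format stays at the 8-bit de facto standard even if the host's CHAR_BIT is larger:

#include <stdint.h>
#include <stdio.h>

/* Emit a 32-bit quantity as four 8-bit octets, least significant first.
   Masking with 0xFF keeps the on-the-wire format at 8 bits per unit,
   even on a host where CHAR_BIT is larger than 8. */
static void write_u32_le(FILE *f, uint_least32_t v)
{
    for (int i = 0; i < 4; ++i)
        fputc((int)((v >> (8 * i)) & 0xFF), f);
}

int main(void)
{
    write_u32_le(stdout, 0x12345678);   /* emits 78 56 34 12 */
    return 0;
}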

But they could be made portable

Edit: the statement above might not always hold. With appropriate effort the program could be made portable. The following adaptation from Mooing Duck demonstrates how this program and its iterator might behave with a different number of bits. It shows how a system with more bits per byte might read a file produced on a system with fewer bits. This could be expanded to work both ways:

#include <iterator>
#include <climits>
#include <cstddef>
#include <iostream>

// Reads source_bits-wide "bytes" that have been packed, least significant
// bits first, into the host's wider chars, and presents each one as an
// unsigned char with its value right-aligned. Assumes the input holds a
// whole number of source_bits-wide values.
template<class base_iterator, std::size_t source_bits>
class bititerator : public std::iterator<std::input_iterator_tag, unsigned char> {
    static_assert(source_bits < CHAR_BIT, "bititerator only works on systems with more bits per byte than the source");
    static const unsigned char source_mask = (1u << source_bits) - 1;
    mutable base_iterator base;
    mutable unsigned char bufferhi;  // bits buffered beyond the current value
    mutable unsigned char bufferlo;  // the current value, right-aligned
    mutable unsigned char bitc;      // total number of buffered bits
public:
    bititerator(const base_iterator& b) : base(b), bufferhi(0), bufferlo(0), bitc(0) {}
    bititerator& operator=(const bititerator& b) {base = b.base; bufferlo = b.bufferlo; bufferhi = b.bufferhi; bitc = b.bitc; return *this;}
    friend void swap(bititerator& lhs, bititerator& rhs) {std::swap(lhs.base, rhs.base); std::swap(lhs.bufferlo, rhs.bufferlo); std::swap(lhs.bufferhi, rhs.bufferhi); std::swap(lhs.bitc, rhs.bitc);}
    bititerator operator++(int) {bititerator t(*this); ++*this; return t;}
    unsigned char* operator->() const {operator*(); return &bufferlo;}
    friend bool operator==(const bititerator& lhs, const bititerator& rhs) {return lhs.base == rhs.base && lhs.bitc == rhs.bitc;}
    friend bool operator!=(const bititerator& lhs, const bititerator& rhs) {return !(lhs == rhs);}
    unsigned char operator*() const {
        // make sure at least source_bits bits are in the buffers
        if (bitc < source_bits) {
            unsigned char next = static_cast<unsigned char>(*base);
            ++base;
            // the low bits of the new char complete the current value...
            bufferlo |= (next << bitc) & source_mask;
            // ...and whatever is left over is kept for later values
            bufferhi = next >> (source_bits - bitc);
            bitc += CHAR_BIT;
        }
        return bufferlo;
    }
    bititerator& operator++() {
        operator*();
        // discard the current value and promote the next source_bits bits
        bufferlo = bufferhi & source_mask;
        bufferhi >>= source_bits;
        bitc -= source_bits;
        return *this;
    }
};

template<class base_iterator>
bititerator<base_iterator, 6> from6bit(base_iterator it) {return bititerator<base_iterator, 6>(it);}
bititerator<std::istreambuf_iterator<char>, 6> from6bitStart(std::istream& str) {return bititerator<std::istreambuf_iterator<char>, 6>{std::istreambuf_iterator<char>{str}};}
bititerator<std::istreambuf_iterator<char>, 6> from6bitEnd(std::istream& str) {return bititerator<std::istreambuf_iterator<char>, 6>{std::istreambuf_iterator<char>{}};}

#include <fstream>
int main()
{
    std::ifstream file("binaryfile", std::ios::binary);
    auto end = from6bitEnd(file);
    for (auto iter = from6bitStart(file); iter != end; ++iter)
        std::cout << *iter;
}

Will a `char` always-always-always have 8 bits?

  1. Yes, char and byte are pretty much the same. A byte is the smallest addressable amount of memory, and so is a char in C. char always has size 1.

    From the spec, section 3.6 byte:

    byte

    addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

    And section 3.7.1 character:

    character

    single-byte character

    <C> bit representation that fits in a byte

  2. A char has CHAR_BIT bits. It could be any number (well, 8 or greater according to the spec), but it is most often 8. There are real machines with 16- and 32-bit char types, though. CHAR_BIT is defined in limits.h; see the short program after this list.

    From the spec, section 5.2.4.2.1 Sizes of integer types <limits.h>:

    The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

    — number of bits for smallest object that is not a bit-field (byte)

        CHAR_BIT                               8

  3. sizeof(char) == 1. Always.

    From the spec, section 6.5.3.4 The sizeof operator, paragraph 3:

    When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

  4. You can allocate as much memory as your system will let you allocate - there's nothing in the standard that defines how much that might be. You could imagine, for example, a computer with a cloud-storage backed memory allocation system - your allocatable memory might be practically infinite.

    Here's the complete spec section 7.20.3.3 The malloc function:

    Synopsis

    1 #include <stdlib.h>

       void *malloc(size_t size);

    Description

    2 The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.

    Returns

    3 The malloc function returns either a null pointer or a pointer to the allocated space.

    That's the entirety of the specification, so there's not really any limit you can rely on.
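
For illustration, here is a minimal C program of my own (not part of the quoted answer) that makes points 1-3 concrete on whatever implementation compiles it:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Point 2: a char has CHAR_BIT bits, at least 8. */
    printf("CHAR_BIT     = %d\n", CHAR_BIT);
    /* Points 1 and 3: the byte is the same unit as char, and sizeof(char)
       is 1 by definition, however many bits that byte holds. */
    printf("sizeof(char) = %zu\n", sizeof(char));
    printf("SCHAR_MIN    = %d, SCHAR_MAX = %d\n", SCHAR_MIN, SCHAR_MAX);
    return 0;
}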

Can stdint's int8_t exist on an architecture that does not have 8-bit bytes?

EDIT: As @JasonD points out in the comments below, the linked page states at the end:

As a consequence of adding int8_t, the following are true:

  • A byte is exactly 8 bits.

  • {CHAR_BIT} has the value 8, {SCHAR_MAX} has the value 127, {SCHAR_MIN} has the value -128, and {UCHAR_MAX} has the value 255.

In other words, the linked IEEE page does not apply to architectures with byte lengths other than 8. This is in line with POSIX, which requires an 8-bit char.
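
One practical consequence, sketched here as my own illustration: since <stdint.h> defines the exact-width limit macros only when the matching type exists, code can detect at compile time whether int8_t is available at all:

#include <stdint.h>
#include <limits.h>
#include <stdio.h>

int main(void)
{
#ifdef INT8_MAX
    /* int8_t exists, which implies CHAR_BIT == 8. */
    printf("int8_t is available, CHAR_BIT = %d\n", CHAR_BIT);
#else
    /* On an architecture without 8-bit bytes the exact-width type is
       simply not defined; code must fall back to int_least8_t. */
    printf("no int8_t here, CHAR_BIT = %d\n", CHAR_BIT);
#endif
    return 0;
}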

-- Before edit --


The explanation is in a note on the page you linked to:

The "width" of an integer type is the number of bits used to store its value in a pure binary system; the actual type may use more bits than that (for example, a 28-bit type could be stored in 32 bits of actual storage)

Why is the size of one byte (in bits) implementation-defined and therefore variable? I thought a byte always consists of 8 bits

I thought the size of a byte was absolutely fixed in information technology at exactly 8 bits.

No. That is the result of the evolution of "byte", which is now so commonly 8 bits.

Other values of 9, 16, 18, 32, 64, ... have occurred for various technical (and business) reasons. Their rarity today does make it surprising that CHAR_BIT could be anything but 8.

Recall C is very portable and accommodates a wide range of architectures.


If the byte size isn't fixed, how can we talk about, e.g., a char being comprised of 8 bits and an int of 32 bits (4 bytes), assuming 64-bit systems?

In C, you cannot - in general. A char might be 9 bits or 64, etc. Such systems are rare these days.

Avoid "assuming 64-bit systems" as a requirement to drive int size to some width. More factors apply.

Are there machines where sizeof(char) != 1, or at least CHAR_BIT > 8?

It is always one in C99, section 6.5.3.4:

When applied to an operand that has
type char, unsigned char, or signed char, (or a qualified version thereof)
the result is 1.

Edit: not part of your question, but for interest, from Harbison and Steele's C: A Reference Manual, Third Edition, Prentice Hall, 1991 (pre-C99), p. 148:

A storage unit is taken to be the
amount of storage occupied by one
character; the size of an object of
type char is therefore 1.

Edit: In answer to your updated question, the following question and answer from Harbison and Steele are relevant (ibid., Ex. 4 of Ch. 6):

Is it allowable to have a C
implementation in which type char can
represent values ranging from
-2,147,483,648 through 2,147,483,647? If so, what would be sizeof(char)
under that implementation? What would
be the smallest and largest ranges of
type int?

Answer (ibid, p. 382):

It is permitted (if wasteful) for an
implementation to use 32 bits to
represent type char. Regardless of
the implementation, the value of
sizeof(char) is always 1.

While this does not specifically address a case where, say, bytes are 8 bits and a char is 4 of those bytes (actually impossible with the C99 definition, see below), the fact that sizeof(char) is always 1 is clear from the C99 standard and Harbison and Steele.

Edit: In fact (this is in response to your upd 2 question), as far as C99 is concerned sizeof(char) is in bytes, from section 6.5.3.4 again:

The sizeof operator yields the size
(in bytes) of its operand

so combined with the quotation above, bytes of 8 bits and a char as 4 of those bytes is impossible: for C99 a byte is the same as a char.
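
Put differently, here is a tiny sketch of my own: because sizeof counts bytes, and a byte is a char, multiplying by CHAR_BIT gives any type's storage width in bits:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* sizeof yields a count of bytes, i.e. of chars, so multiplying by
       CHAR_BIT gives the storage width in bits. */
    printf("char:   %zu byte(s), %zu bits\n", sizeof(char), sizeof(char) * CHAR_BIT);
    printf("double: %zu byte(s), %zu bits\n", sizeof(double), sizeof(double) * CHAR_BIT);
    return 0;
}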

In answer to your mention of the possibility of a 7-bit char: this is not possible in C99. According to section 5.2.4.2.1 of the standard, the minimum is 8:

Their implementation-defined values shall be equal or greater [my emphasis] in magnitude to those shown, with the same sign.

— number of bits for smallest object that is not a bit-field (byte)

CHAR_BIT 8

— minimum value for an object of type signed char

SCHAR_MIN -127

— maximum value for an object of type signed char

SCHAR_MAX +127

— maximum value for an object of type unsigned char

UCHAR_MAX 255

— minimum value for an object of type char

CHAR_MIN see below

— maximum value for an object of type char

CHAR_MAX see below

[...]

If the value of an object of type char
is treated as a signed integer when
used in an expression, the value of
CHAR_MIN shall be the same as that of
SCHAR_MIN and the value of CHAR_MAX
shall be the same as that of
SCHAR_MAX. Otherwise, the value of
CHAR_MIN shall be 0 and the value of
CHAR_MAX shall be the same as that of
UCHAR_MAX. The value UCHAR_MAX
shall equal 2^CHAR_BIT − 1.


