System where 1 byte != 8 bits?
On older machines, codes smaller than 8 bits were fairly common, but most of those have been dead and gone for years now.
C and C++ have mandated a minimum of 8 bits for char, at least as far back as the C89 standard. [Edit: For example, C90, §5.2.4.2.1 requires CHAR_BIT >= 8 and UCHAR_MAX >= 255. C89 uses a different section number (I believe that would be §2.2.4.2.1) but identical content.] They treat "char" and "byte" as essentially synonymous [Edit: for example, CHAR_BIT is described as: "number of bits for the smallest object that is not a bit-field (byte)".]
There are, however, current machines (mostly DSPs) where the smallest type is larger than 8 bits -- a minimum of 12, 14, or even 16 bits is fairly common. Windows CE does roughly the same: its smallest type (at least with Microsoft's compiler) is 16 bits. They do not, however, treat a char as 16 bits -- instead they take the (non-conforming) approach of simply not supporting a type named char at all.
Significance of Bytes as 8 bits
A byte is not necessarily 8 bits. A byte is a unit of digital information whose size is processor-dependent. Historically, the size of a byte equaled the size of a character as specified by the character encoding supported by the processor. For example, a processor that supports Binary-Coded Decimal (BCD) characters defines a byte to be 4 bits. A processor that supports ASCII defines a byte to be 7 bits. The reason for using the character size to define the size of a byte is to make programming easier, considering that a byte has always (as far as I know) been used as the smallest addressable unit of data storage. If you think about it, you'll find that this is indeed very convenient.
A byte is defined to be 8 bits in the extremely successful IBM S/360 computer family, which used an 8-bit character encoding called EBCDIC. IBM, through its S/360 computers, introduced several crucially important computing techniques that became the foundation of all future processors, including the ones we are using today. In fact, the term byte was coined by Werner Buchholz, a computer scientist at IBM.
When Intel introduced its first 8-bit processor (the 8008), a byte was defined to be 8 bits even though the instruction set didn't directly support any character encoding, thereby breaking the pattern. The processor did, however, provide numerous instructions that operate on packed (4-bit) and unpacked (8-bit) BCD-encoded digits. In fact, the whole x86 instruction set was conveniently designed around 8-bit bytes. The fact that 7-bit ASCII characters fit in 8-bit bytes was a free, additional advantage. As usual, a byte is the smallest addressable unit of storage. I would like to mention here that in digital circuit design, it's convenient for the number of wires or pins to be a power of 2 so that every possible value that appears as input or output has a use.
Later processors continued to use 8-bit bytes because that makes it much easier to develop newer designs based on older ones. It also helps keep newer processors compatible with older ones. Therefore, instead of changing the size of a byte, the register, data bus, and address bus sizes were doubled each time (we have now reached 64 bits). This doubling enabled us to reuse existing digital circuit designs easily, significantly reducing processor design costs.
1 byte is equal to 8 bits. What is the logic behind this?
It's been a minute since I took computer organization, but the relevant Wikipedia article on 'Byte' gives some context.
The byte was originally the smallest number of bits that could hold a single character (I assume standard ASCII). We still use the ASCII standard, so 8 bits per character is still relevant. This sentence, for instance, is 41 bytes. That's easily countable and practical for our purposes.
If we had only 4 bits per byte, there would be only 16 (2^4) possible characters, unless we used 2 bytes to represent a single character, which is computationally less efficient. If we had 16 bits in a byte, we would allow 65,536 (2^16) possible characters but have a whole lot more 'dead space' in our instruction set, which would make computers run less efficiently when performing byte-level instructions, especially since our character set is much smaller.
Additionally, a byte can hold 2 nibbles. Each nibble is 4 bits, which is the smallest number of bits that can encode any decimal digit from 0 to 9 (10 different digits).
Read char from file on platform where bytes aren't 8 bits
Most programs are probably not portable
Reading through the comments here, I think the answer is that this program is not fully portable as per the C++ standard. No program is. Edit: at least no program that talks to a network, reads or writes to disk, or expects other platforms to answer or update data.
C++ sides with the platform's hardware and is not Java. fgetc and fputc, for example, are round-trip value preserving, but only on the same platform. Network messages work because everyone assumes 8 bits to a byte.
If there is concern, then it would be best to assert that the platform has 8 bits to a byte:

static_assert(CHAR_BIT == 8, "Platform must have 8 bits to a byte.");
Even without the assert there will be other alarm bells. A platform that does not have 8 bits to a byte, but still talks to other platforms via networking or files, will fail sooner rather than later, and porting code to it will require extra work to read and write data in the assumed 8-bit de facto standard. This is much like the endianness issue, but the difference here is that one side has very clearly won.
But they could be made portable
Edit: the statement above might not always hold. With appropriate effort the program could be made portable. The following adaptation from Mooing Duck demonstrates how this program and the iterator might behave with a different number of bits. It shows how a system with more bits per byte might read a file from a system with fewer bits. This could be expanded to work both ways:
#include <iterator>
#include <climits>
#include <iostream>
template<class base_iterator, size_t source_bits>
class bititerator : public std::iterator<std::input_iterator_tag, unsigned char> {
mutable base_iterator base;
mutable unsigned char bufferhi;
mutable unsigned char bufferlo;
mutable unsigned char bitc;
public:
bititerator(const base_iterator& b) : base(b), bufferhi(0), bufferlo(0), bitc(0) {}
bititerator& operator=(const bititerator&b) {base = b.base; bufferlo=b.bufferlo; bufferhi=b.bufferhi; bitc=b.bitc; return *this;}
friend void swap(bititerator&lhs, bititerator&rhs) {std::swap(lhs.base, rhs.base); std::swap(lhs.bufferlo, rhs.bufferlo); std::swap(lhs.bufferhi, rhs.bufferhi); std::swap(lhs.bitc, rhs.bitc);}
bititerator operator++(int) {bititerator t(*this); ++*this; return t;}
unsigned char* operator->() const {operator*(); return &bufferlo;}
friend bool operator==(const bititerator&lhs, const bititerator&rhs) {return lhs.base==rhs.base && lhs.bitc==rhs.bitc;}
friend bool operator!=(const bititerator&lhs, const bititerator&rhs) {return !(lhs==rhs);}
unsigned char operator*() const {
static_assert(source_bits<CHAR_BIT, "bititerator only works on systems with more bits per byte than the source");
//make sure at least source_bits bits are in the buffers
if (bitc < source_bits) {
unsigned char c = static_cast<unsigned char>(*base);
++base;
//append the new char's bits above the bitc bits already buffered
bufferlo |= static_cast<unsigned char>(c << bitc);
bufferhi = static_cast<unsigned char>(c >> (CHAR_BIT - bitc));
bitc += CHAR_BIT;
}
//the low source_bits bits form the next source byte
return static_cast<unsigned char>(bufferlo & ((1u << source_bits) - 1));
}
bititerator& operator++() {
operator*();
//shift the buffers down source_bits bits
bufferlo >>= source_bits;
bufferlo |= ((bufferhi<<(CHAR_BIT-source_bits))&0xFF);
bufferhi >>= source_bits;
bitc -= source_bits;
return *this;
}
};
template<class base_iterator>
bititerator<base_iterator,6> from6bit(base_iterator it) {return bititerator<base_iterator,6>(it);}
bititerator<std::istreambuf_iterator<char>,6> from6bitStart(std::istream& str) {return bititerator<std::istreambuf_iterator<char>,6>{std::istreambuf_iterator<char>{str}};}
bititerator<std::istreambuf_iterator<char>,6> from6bitEnd(std::istream& str) {return bititerator<std::istreambuf_iterator<char>,6>{std::istreambuf_iterator<char>{}};}
#include <fstream>
int main()
{
std::ifstream file("binaryfile", std::ios::binary);
auto end = from6bitEnd(file);
for (auto iter = from6bitStart(file); iter != end; ++iter)
std::cout << *iter;
}
Will a `char` always-always-always have 8 bits?
Yes, char and byte are pretty much the same. A byte is the smallest addressable amount of memory, and so is a char in C. char always has size 1.

From the spec, section 3.6 byte:

byte
addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

And section 3.7.1 character:

character
single-byte character
<C> bit representation that fits in a byte

A char has CHAR_BIT bits. It could be any number (well, 8 or greater according to the spec), but is definitely most often 8. There are real machines with 16- and 32-bit char types, though. CHAR_BIT is defined in limits.h.

From the spec, section 5.2.4.2.1 Sizes of integer types <limits.h>:

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

— number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

sizeof(char) == 1. Always.

From the spec, section 6.5.3.4 The sizeof operator, paragraph 3:

When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

You can allocate as much memory as your system will let you allocate - there's nothing in the standard that defines how much that might be. You could imagine, for example, a computer with a cloud-storage backed memory allocation system - your allocatable memory might be practically infinite.

Here's the complete spec section 7.20.3.3 The malloc function:

Synopsis

1 #include <stdlib.h>
void *malloc(size_t size);

Description

2 The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.

Returns

3 The malloc function returns either a null pointer or a pointer to the allocated space.

That's the entirety of the specification, so there's not really any limit you can rely on.
Can stdint's int8_t exist on an architecture that does not have 8-bit bytes?
EDIT: As @JasonD points out in the comments below, the linked page states at the end:
As a consequence of adding int8_t, the following are true:
A byte is exactly 8 bits.
{CHAR_BIT} has the value 8, {SCHAR_MAX} has the value 127, {SCHAR_MIN} has the value -128, and {UCHAR_MAX} has the value 255.
In other words, the linked IEEE page does not apply to architectures with byte lengths other than 8. This is in line with POSIX, which requires an 8-bit char.
-- Before edit --
The explanation is in a note on the page you linked to:
The "width" of an integer type is the number of bits used to store its value in a pure binary system; the actual type may use more bits than that (for example, a 28-bit type could be stored in 32 bits of actual storage)
Why is the size of one byte (in bits) implementation-defined and thus variable? I've always thought a byte consists of exactly 8 bits
I thought the size of a byte was absolutely fixed in information technology, comprising exactly 8 bits.
No. That is the evolution of "byte", which is now so commonly 8 bits. Other values of 9, 16, 18, 32, 64, ... have occurred for various technical (and business) reasons. Their rarity today does make it surprising that CHAR_BIT could be anything but 8.
Recall C is very portable and accommodates a wide range of architectures.
If the byte size isn't fixed, how can we talk about, e.g., a char comprising 8 bits and an int 32 bits (4 bytes), assuming 64-bit systems?
In C, you cannot - in general. A char might be 9 bits or 64, etc. Such systems are rare these days.
Avoid "assuming 64-bit systems" as a requirement to drive int size to some width. More factors apply.
Are there machines where sizeof(char) != 1, or at least CHAR_BIT > 8?
It is always one in C99, section 6.5.3.4:

When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
Edit: not part of your question, but for interest, from Harbison and Steele's C: A Reference Manual, Third Edition, Prentice Hall, 1991 (pre-C99), p. 148:

A storage unit is taken to be the amount of storage occupied by one character; the size of an object of type char is therefore 1.
Edit: In answer to your updated question, the following question and answer from Harbison and Steele is relevant (ibid, Ex. 4 of Ch. 6):

Is it allowable to have a C implementation in which type char can represent values ranging from -2,147,483,648 through 2,147,483,647? If so, what would be sizeof(char) under that implementation? What would be the smallest and largest ranges of type int?
Answer (ibid, p. 382):

It is permitted (if wasteful) for an implementation to use 32 bits to represent type char. Regardless of the implementation, the value of sizeof(char) is always 1.
While this does not specifically address a case where, say, bytes are 8 bits and a char is 4 of those bytes (actually impossible with the C99 definition, see below), the fact that sizeof(char) = 1 always holds is clear from the C99 standard and Harbison and Steele.
Edit: In fact (this is in response to your upd 2 question), as far as C99 is concerned sizeof(char) is in bytes; from section 6.5.3.4 again:

The sizeof operator yields the size (in bytes) of its operand

so combined with the quotation above, bytes of 8 bits and char as 4 of those bytes is impossible: for C99 a byte is the same as a char.
In answer to your mention of the possibility of a 7-bit char: this is not possible in C99. According to section 5.2.4.2.1 of the standard, the minimum is 8:
Their implementation-defined values shall be equal or greater [my emphasis] in magnitude to those shown, with the same sign.

— number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

— minimum value for an object of type signed char
SCHAR_MIN -127

— maximum value for an object of type signed char
SCHAR_MAX +127

— maximum value for an object of type unsigned char
UCHAR_MAX 255

— minimum value for an object of type char
CHAR_MIN see below

— maximum value for an object of type char
CHAR_MAX see below

[...]

If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value UCHAR_MAX shall equal 2^CHAR_BIT − 1.