Why Is a Boolean 1 Byte and Not 1 Bit of Size

Why is a boolean 1 byte and not 1 bit of size?

Because the CPU can't address anything smaller than a byte.

What is the size of bool? 1 bit or 1 byte?

For an efficient (packed) representation use std::bitset:

#include <bitset>

std::bitset<2000000> my_bits;

Obviously this is for C++ only. In C you would have to implement this explicitly yourself, e.g.:

#include <stdint.h>
#include <limits.h>

uint8_t my_bits[2000000 / CHAR_BIT];

and then to access individual bits you would need to implement some simple inline functions using bitwise operations.

In C how much space does a bool (boolean) take up? Is it 1 bit, 1 byte or something else?

If you are referring to C99 _Bool try:

printf("%zu\n", sizeof(_Bool)); /* Typically 1. */

Note the standard says:

6.2.5

An object declared as type _Bool is large enough to store the values 0
and 1.

The size cannot be smaller than one byte. But it would be legal to be larger than one byte.

Why must Java booleans be at least 1 byte in size?

There is no reason why a boolean must be one byte in size. In fact, is likely that booleans already aren't 1 byte in (effective) size in some scenarios: when packed next to other elements larger than 1 byte on the stack or in an object, they are likely to be larger (i.e., adding them to an object may cause the size to grow by more than one byte).

Any JVM is free to implement booleans as 1 bit, but as far as know none choose to do so, probably largely because:

  • Accessing one bit is often more expensive than accessing a byte, particularly when writing.

    To read a bit, a CPU using a "classic RISC" instruction set would often need need an additional and instruction to extract the relevant bit out of a packed byte (or larger word) of boolean bits. Some might even need a additional instruction to load a constant to and. In the case of indexing an array of boolean, where the bit-index isn't fixed at compile-time, you'd need a variable shift. Some CPUs such as x86 have an easier time since they have memory sourcetestinstructions, including specific bit-test instructions taking a variable position such asbt`. Such a CPU probably has similar read performance in both representations.

    Writing is worse: rather than a simple byte write to set a boolean value you now need to read the value, modify the appropriate bit and write it back. Some platforms such as x86 have memory source-and-destination RMW instructions such as and and or that will help, but these are still significantly more expensive than plain writes. In the worst case, repeatedly writing the same element will result in a dependency chain through memory that could slow your code down by an order of magnitude (a series of plain stores can't form a dependency chain).

    Even worse, the write method above is totally thread-unsafe. Two threads working on "independent" booleans might clobber each other, so the runtime would have to use atomic update operations just to write a bit for any field where the object cannot be proven local to the thread.

  • The space savings outside of arrays is usually very small, and is often zero: alignment concerns mean that a single bit will often end up taking the same space as a byte on the stack or in the layout for an object. Only if you had many primitive boolean values on the stack or an object would you see a savings (for example, objects are typically aligned to 8-byte boundaries, so if you have an object whose non-boolean fields are int or larger, you'd need at least 4 boolean values to save any space, and often you'd need 8).

  • This leaves the last remaining "big win" for bit-representation boolean in arrays of boolean, where you could have an asymptotic 8x space savings for large arrays. In fact, this case was motivating enough in the C++ world that vector<bool> there has a "special" implementation where each bool takes one bit - a never ending source of headaches due to all the required special cases and non-intuitive behavior (and often used as an example of a mis-feature that can't be removed now).

    If it weren't for the memory model I could imagine a world where Java
    implemented arrays of boolean in a bit-wise manner. They don't have
    the same issues as vector<bool> (mostly because of the extra layer of abtraction provided by the JIT and also because an array provides a simpler interface than vector) and it could be done efficiently, I think. There is that pesky memory model though. That model allows writes to different array elements to be safe if done by different threads (i.e,. they act as independent variables for the purpose of the memory model). All common CPUs support this directly if you implement boolean as a byte, since they have independent byte accesses. No CPUs offer independent bit-access though: you are stuck using atomic operations (x86 offers the lock bt* operations, but these are slow: other platforms have even worse options). That would destroy the performance of any boolean array implemented as a bit-array.

Finally, as described above, implementing boolean as a bit has significant downsides - but what about the upside?

As it turns out, if the user really wants this bit-packed representation of for boolean they can do so themselves! They can pack 8 boolean values into a byte (or 32 values into an int or whatever) in an object (and this is common for flags, etc) and the generated accessor code should be about efficient as efficient as if the JVM natively supported boolean-as-bit. In fact, when you know you want an array-of-bits representation for a large number of booleans, you can simply use BitSet - this has the representation you want and sidesteps the atomic issues by not offering any thread-safety guarantees. So by implementing boolean as a byte, you sidestep all the problems above, but still let the user "opt-in" to bit-level representation if they want, without much runtime penalty.

Why isn't the size of a bool data type only 1 bit in C#?


Is it because the smallest 'addressable' size of a value is a byte

Yep, exactly the same thing. In order for the CLR to be efficient, it maps its data types to the native machine data types in much the same way as the compiler does in C++ (pretty much).

One-byte bool. Why?


Why does a bool require one byte to store true or false where just one bit is enough

Because every object in C++ must be individually addressable* (that is, you must be able to have a pointer to it). You cannot address an individual bit (at least not on conventional hardware).

How much safer is it to use the following?

It's "safe", but it doesn't achieve much.

is the above field technique really going to help?

No, for the same reasons as above ;)

but still compiler generated code to access them is bigger and slower than the code generated to access the primitives.

Yes, this is true. On most platforms, this requires accessing the containing byte (or int or whatever), and then performing bit-shifts and bit-mask operations to access the relevant bit.


If you're really concerned about memory usage, you can use a std::bitset in C++ or a BitSet in Java, which pack bits.


* With a few exceptions.

Is bool guaranteed to be 1 byte?

Rust emits i1 to LLVM for bool and relies on whatever it produces. LLVM uses i8 (one byte) to represent i1 in memory for all the platforms supported by Rust for now. On the other hand, there's no certainty about the future, since the Rust developers have been refusing to commit to the particular bool representation so far.

So, it's guaranteed by the current implementation but not guaranteed by any specifications.

You can find more details in this RFC discussion and the linked PR and issue.

Edit: Please, see the answer below for more information about changes introduced in Rust since this answer had been published.

Why can bool and _Bool only store 0 or 1 if they occupy 1 byte in memory?

The C language limits what can be stored in a _Bool, even if it has the capacity to hold other values besides 0 and 1.

Section 6.3.1.2 of the C standard says the following regarding conversions to _Bool:

When any scalar value is converted to _Bool, the result is 0 if the value compares equal
to 0; otherwise, the result is 1.

The C++17 standard has similar language in section 7.14:

A prvalue of arithmetic, unscoped enumeration, pointer, or pointer to member type can be converted to a
prvalue of type bool. A zero value, null pointer value, or null member pointer value is converted to false;
any other value is converted to true. For direct-initialization (11.6), a prvalue of type std::nullptr_t can
be converted to a prvalue of type bool; the resulting value is false.

So even if you attempt to assign some other value to a _Bool the language will convert the value to either 0 or 1 for C and to true or false for C++. If you attempt to bypass this by writing to a _Bool via a pointer to a different type, you invoke undefined behavior.

C++: Is bool a 1-bit variable?

No there's no such thing like a 1 bit variable.

The smallest unit that can be addressed in c++ is a unsigned char.

Is a bool[8] then 1 Byte long?

No.

Or 8 Bytes?

Not necessarily. Depends on the target machines number of bits taken for a unsigned char.


But i don't want to waste space if a bool in C++ is 8 bit long...

You can avoid wasting space when dealing with bits using std::bitset, or boost::dynamic_bitset if you need a dynamic sizing.


As pointed out by @zett42 in their comment you can also address single bits with a bitfield struct (but for reasons of cache alignement this will probably use even more space):

struct S {
// will usually occupy 4 bytes:
unsigned b1 : 1,
b2 : 1,
b3 : 1;
};


Related Topics



Leave a reply



Submit