Why Is Java's Boolean Primitive Size Not Defined

Why is Java's boolean primitive size not defined?

Short answer: yes, boolean values are manipulated as 32-bit entities, but arrays of booleans use 1 byte per element.

Longer answer: the JVM uses a 32-bit stack cell, used to hold local variables, method arguments, and expression values. Primitives that are smaller than 1 cell are padded out, primitives larger than 32 bits (long and double) take 2 cells. This technique minimizes the number of opcodes, but does have some peculiar side-effects (such as the need to mask bytes).

Primitives stored in arrays may use less than 32 bits, and there are different opcodes to load and store primitive values from an array. Boolean and byte values both use the baload and bastore opcodes, which implies that boolean arrays take 1 byte per element.

As far as in-memory object layout goes, this is covered under the "private implementation" rules, it can be 1 bit, 1 byte, or as another poster noted, aligned to a 64-bit double-word boundary. Most likely, it takes the basic word size of the underlying hardware (32 or 64 bits).


As far as minimizing the amount of space that booleans use: it really isn't an issue for most applications. Stack frames (holding local variables and method arguments) aren't very large, and in the big scheme a discrete boolean in an object isn't that large either. If you have lots of objects with lots of booleans, then you can use bit-fields that are managed via your getters and setters. However, you'll pay a penalty in CPU time that is probably bigger than the penalty in memory.

Why must Java booleans be at least 1 byte in size?

There is no reason why a boolean must be one byte in size. In fact, is likely that booleans already aren't 1 byte in (effective) size in some scenarios: when packed next to other elements larger than 1 byte on the stack or in an object, they are likely to be larger (i.e., adding them to an object may cause the size to grow by more than one byte).

Any JVM is free to implement booleans as 1 bit, but as far as know none choose to do so, probably largely because:

  • Accessing one bit is often more expensive than accessing a byte, particularly when writing.

    To read a bit, a CPU using a "classic RISC" instruction set would often need need an additional and instruction to extract the relevant bit out of a packed byte (or larger word) of boolean bits. Some might even need a additional instruction to load a constant to and. In the case of indexing an array of boolean, where the bit-index isn't fixed at compile-time, you'd need a variable shift. Some CPUs such as x86 have an easier time since they have memory sourcetestinstructions, including specific bit-test instructions taking a variable position such asbt`. Such a CPU probably has similar read performance in both representations.

    Writing is worse: rather than a simple byte write to set a boolean value you now need to read the value, modify the appropriate bit and write it back. Some platforms such as x86 have memory source-and-destination RMW instructions such as and and or that will help, but these are still significantly more expensive than plain writes. In the worst case, repeatedly writing the same element will result in a dependency chain through memory that could slow your code down by an order of magnitude (a series of plain stores can't form a dependency chain).

    Even worse, the write method above is totally thread-unsafe. Two threads working on "independent" booleans might clobber each other, so the runtime would have to use atomic update operations just to write a bit for any field where the object cannot be proven local to the thread.

  • The space savings outside of arrays is usually very small, and is often zero: alignment concerns mean that a single bit will often end up taking the same space as a byte on the stack or in the layout for an object. Only if you had many primitive boolean values on the stack or an object would you see a savings (for example, objects are typically aligned to 8-byte boundaries, so if you have an object whose non-boolean fields are int or larger, you'd need at least 4 boolean values to save any space, and often you'd need 8).

  • This leaves the last remaining "big win" for bit-representation boolean in arrays of boolean, where you could have an asymptotic 8x space savings for large arrays. In fact, this case was motivating enough in the C++ world that vector<bool> there has a "special" implementation where each bool takes one bit - a never ending source of headaches due to all the required special cases and non-intuitive behavior (and often used as an example of a mis-feature that can't be removed now).

    If it weren't for the memory model I could imagine a world where Java
    implemented arrays of boolean in a bit-wise manner. They don't have
    the same issues as vector<bool> (mostly because of the extra layer of abtraction provided by the JIT and also because an array provides a simpler interface than vector) and it could be done efficiently, I think. There is that pesky memory model though. That model allows writes to different array elements to be safe if done by different threads (i.e,. they act as independent variables for the purpose of the memory model). All common CPUs support this directly if you implement boolean as a byte, since they have independent byte accesses. No CPUs offer independent bit-access though: you are stuck using atomic operations (x86 offers the lock bt* operations, but these are slow: other platforms have even worse options). That would destroy the performance of any boolean array implemented as a bit-array.

Finally, as described above, implementing boolean as a bit has significant downsides - but what about the upside?

As it turns out, if the user really wants this bit-packed representation of for boolean they can do so themselves! They can pack 8 boolean values into a byte (or 32 values into an int or whatever) in an object (and this is common for flags, etc) and the generated accessor code should be about efficient as efficient as if the JVM natively supported boolean-as-bit. In fact, when you know you want an array-of-bits representation for a large number of booleans, you can simply use BitSet - this has the representation you want and sidesteps the atomic issues by not offering any thread-safety guarantees. So by implementing boolean as a byte, you sidestep all the problems above, but still let the user "opt-in" to bit-level representation if they want, without much runtime penalty.

Why does the boolean data type need 8 bits?

I think it may need more than 8 bits. It depends on JMV." In Oracle JVM primitive boolean needs 8 bits, the reason is limited support and lack of optimization.

Read also: What is the size of a boolean variable in Java?

After The Java Tutorials - Primitive Data Types

boolean: The boolean data type has only two possible values: true and false. Use this data type for simple flags that track true/false conditions. This data type represents one bit of information, but its "size" isn't something that's precisely defined.

After The Java® Virtual Machine Specification

Although the Java Virtual Machine defines a boolean type, it only provides
very limited support for it
. There are no Java Virtual Machine instructions solely
dedicated to operations on boolean values. Instead, expressions in the Java
programming language that operate on boolean values are compiled to use values
of the Java Virtual Machine int data type.

In Oracle’s Java Virtual Machine implementation, boolean arrays in the Java
programming language are encoded as Java Virtual Machine byte arrays, using 8 bits per
boolean element
.

For example Boolean type looks in memory like this

header:   8 bytes 
value: 1 byte
padding: 7 bytes
------------------
sum: 16 bytes

As an alternative to boolean[] you can use for example java.util.BitSet.

Why is hard to store booleans as 1 bit? Read Vlad from Moscow answer. You cant address one bit of memory.

What is the size of a boolean variable in Java?

It's virtual machine dependent.

Is byte more efficient than boolean[8]

Read the topic "How much memory does a boolean consume?". They suggest BitSet as solution for large sets of booleans, but in your case byte solves the problem better, because you won't have a large set of booleans, you will have a large set of 8 booleans objects.

Summarizing: byte is better than 8 booleans.

Why for loop can hold returned boolean value but not primitive boolean value?

Just to expand a little on Jahan's answer:

ForInit and ForUpdate have to be StatementExpressions.

A StatementExpression is an expression that can be made into a statement (specifically, imaginatively, StatementExpression ; is called an ExpressionStatement).

Not all expressions are StatementExpressions: for example, you can't write this:

boolean b = true;
b; // b is an expression, not a StatementExpression

This is sort of like shouting a word, "true!" - it has no meaning in and of itself: you have to surround it with some context, like "it is true that Java has 4 letters".

Similarly, 4 + 3; isn't allowed, because it doesn't "do" anything: 7 just... is.

However, you can write this:

someMethod(b); // MethodInvocation is a StatementExpression

So, you can use someMethod(b) in the ForInit and ForUpdate, but you can't use b.

You can find all of the different kinds of StatementExpression in the language spec (examples are mine):

StatementExpression:
Assignment // b = true
PreIncrementExpression // ++i
PreDecrementExpression // --i
PostIncrementExpression // i++
PostDecrementExpression // i--
MethodInvocation // someMethod(b)
ClassInstanceCreationExpression // new Object()

Why is boolean the only type for which there are no coercions or casts defined?

The internal representation isn't relevant.

Java is strongly-typed: an int isn't a boolean, and a boolean isn't an int.

This was a design decision--if you want to use true/false, use booleans; that communicates something.

If you want to use an integer to represent true/false, use mathematical comparisons--that's fine too.

The language doesn't support direct conversion because it doesn't, by design.


Regarding the JVM spec and the internal representation of booleans, this sentence is interesting:

Where Java programming language boolean values are mapped by compilers to values of Java virtual machine type int, the compilers must use the same encoding.

To me, that reads:

If the compiler/VM is using a VM integer to represent true/false, then it must use the 1/0 convention. With the implication being: If the compiler/VM isn't using a VM integer for true/false, this isn't a requirement.

Clarification on this point would be interesting, but ultimately not related, really.

Why not implement BitSet with a more size-deterministic type?

I think that you're getting conceptually stuck by the fact that the method signatures use booleans.

The easiest way to think about a single bit is off/on, so a boolean true/false is a convenient way to model it. Another thing entirely is the BitSet internal storage, which if you have a look at the source code, is using a long array and using bitmasks to twiddle individual bits.

Accordingly, the size of the BitSet is tied pretty closely to the number of bits in use.

boolean Vs. byte

From Primitive Data Types

The boolean data type has only two possible values: true and false. Use this data type for simple flags that track true/false conditions. This data type represents one bit of information, but its "size" isn't something that's precisely defined.

However, in the Oracle JVM it uses 1 byte per bit so the memory size and efficiency is the same.

If you would like to use 1 bit per bit, I suggest using a BitSet.

Is it possible to somehow treat a long value as a boolean array?

yes, although I can't image why you would want to.



Related Topics



Leave a reply



Submit