Java's Virtual Machine's Endianness

Multibyte data in the class files are stored big-endian.

From The Java Virtual Machine Specification, Java SE 7 Edition, Chapter 4: The class File Format:

A class file consists of a stream of
8-bit bytes. All 16-bit, 32-bit, and
64-bit quantities are constructed by
reading in two, four, and eight
consecutive 8-bit bytes, respectively.
Multibyte data items are always stored
in big-endian order, where the high
bytes come first.
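As a rough illustration of the quoted rule, the sketch below decodes the first fields of a class file header. The class name ClassFileHeader and the hard-coded SAMPLE bytes are made up for this example; the magic number 0xCAFEBABE and the magic / minor_version / major_version layout come from the class file format. It relies on ByteBuffer's default big-endian order.

```java
import java.nio.ByteBuffer;

final class ClassFileHeader {
    // Hypothetical sample: the first 8 bytes of a class file
    // (magic, minor_version = 0, major_version = 52).
    static final byte[] SAMPLE = {
        (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
        0x00, 0x00, 0x00, 0x34
    };

    // Reads the u4 magic at offset 0; a fresh ByteBuffer is big-endian.
    static int readMagic(byte[] header) {
        return ByteBuffer.wrap(header).getInt(0);
    }

    // Reads the u2 major_version at offset 6, masked to an unsigned value.
    static int readMajorVersion(byte[] header) {
        return ByteBuffer.wrap(header).getShort(6) & 0xFFFF;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(readMagic(SAMPLE))); // cafebabe
        System.out.println(readMajorVersion(SAMPLE));               // 52
    }
}
```

The bytes CA FE BA BE only decode to 0xCAFEBABE because the high byte is taken first.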

Furthermore, the operand in a bytecode instruction is also big-endian if it spans multiple bytes.

From The Java Virtual Machine Specification, Java SE 7 Edition, Section 2.11: Instruction Set Summary:

If an operand is more than one byte in
size, then it is stored in big-endian
order, high-order byte first. For
example, an unsigned 16-bit index into
the local variables is stored as two
unsigned bytes, byte1 and byte2, such
that its value is (byte1 << 8) | byte2.
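The formula in the quote can be tried directly. This is a minimal sketch; the class and method names (OperandDecoder, u2) are invented for illustration. Because Java's byte type is signed, each byte is masked with 0xFF before it is combined, which the spec's "unsigned bytes" wording takes for granted.

```java
final class OperandDecoder {
    // Combines two operand bytes into an unsigned 16-bit value,
    // high-order byte first: (byte1 << 8) | byte2.
    static int u2(byte byte1, byte byte2) {
        return ((byte1 & 0xFF) << 8) | (byte2 & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(u2((byte) 0x01, (byte) 0x2C)); // 300
        System.out.println(u2((byte) 0xFF, (byte) 0xFE)); // 65534
    }
}
```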

So yes, the class file format and bytecode operands are big-endian. That describes how class files are encoded, not how a running JVM lays out values in memory.

Find endianness of system in Java

I take no credit for this, however you can try:

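import java.nio.ByteOrder;

public class EndianCheck {
    public static void main(String[] args) {
        // nativeOrder() reports the byte order of the underlying platform
        if (ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN)) {
            System.out.println("Big-endian");
        } else {
            System.out.println("Little-endian");
        }
    }
}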

Java's Virtual Machine's Endianness (continued)

When exchanging binary data, endianness must always be part of the specification. If it is not, you have an endianness problem, unless some software provides guarantees like "serialized objects can be de-serialized on a different machine with a different architecture."
In that case, endianness is handled at the (de)serialization level.
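To make the problem concrete, here is a small sketch of what happens when writer and reader disagree on byte order. The class ByteOrderMismatch and its encode/decode helpers are hypothetical names used only for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderMismatch {
    // Serializes an int to four bytes in the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Deserializes four bytes to an int, assuming the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] wire = encode(1, ByteOrder.BIG_ENDIAN);                // 00 00 00 01
        System.out.println(decode(wire, ByteOrder.BIG_ENDIAN));       // 1
        System.out.println(decode(wire, ByteOrder.LITTLE_ENDIAN));    // 16777216
    }
}
```

The second decode reads the same four bytes as 0x01000000, which is why the byte order has to be agreed on as part of the format.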

How does Java Handle Endianness when running on Little Endian CPU Architectures?

Java being Big Endian, how does it handle Little Endian CPUs while maintaining performance?

Java is not Big Endian. In the few places in the Java Runtime Library where Endianness is even an issue, the API uses Big Endian, but it is always well-documented, and some of the APIs allow you to specify the Endianness you want.

Does the JVM (OpenJDK, OpenJ9, etc.) do any special optimisations to maintain performance, such as being Big Endian only selectively in special situations on Little Endian platforms?

No, the JVM uses the native Endianness.

Is there special endianness handling when accessing ByteBuffers or calling native code or writing to IO or accessing volatile variables?

Yes, No, Yes, and No.

Since the JVM uses native byte order, there is no handling needed for calling native code or accessing volatile variables. Byte order only matters when (de)serializing to/from bytes, e.g. when accessing ByteBuffers or writing to IO.

How does Java change the endianness in Little Endian architectures?

The same way you would change Endianness anywhere: it swaps the bytes, or reads/writes the bytes in the appropriate order.
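For illustration, a byte swap can be sketched as follows. Integer.reverseBytes is a standard library method; the class ByteSwap and the hand-written swapManually are only here to show what the swap does and are not taken from any JVM implementation.

```java
final class ByteSwap {
    // Library version: reverses the order of the four bytes of an int.
    static int swap(int value) {
        return Integer.reverseBytes(value);
    }

    // Hand-written equivalent using masks and shifts.
    static int swapManually(int value) {
        return ((value & 0xFF) << 24)
             | ((value & 0xFF00) << 8)
             | ((value >>> 8) & 0xFF00)
             | (value >>> 24);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(swap(0x12345678)));         // 78563412
        System.out.println(Integer.toHexString(swapManually(0x12345678))); // 78563412
    }
}
```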

At what point or operation (load, store, calculation, registers, cache, memory, etc.) is the endianness changed?

It's not, since the JVM uses the native Endianness. Endianness is only applied when the native value is converted to/from bytes. At no other point in time does Endianness matter.

What kind of performance penalty would this have?

None, since it doesn't do anything.

Where is runtime Endianness defined in the Java Language Specification?

This is not so much a matter of the language as of the virtual machine. That is why it is defined in the Java Virtual Machine Specification, but not in the Java Language Specification.

In fact, the results of these bitwise computations are independent of the endianness. Assume Big-Endian:

int value = 4111;            //   0x0000100F
int lastByte = value & 0xFF; // & 0x000000FF
                             // = 0x0000000F

Or Little-Endian:

int value = 4111;            //   0F 10 00 00  (bytes in memory, lowest address first)
int lastByte = value & 0xFF; // & FF 00 00 00
                             // = 0F 00 00 00

In both cases, the result is the same value, 15; only the byte layout of its representation differs.


One could now argue about the fact that 0x0000000F stands for 15, which implies big-endianness. This is at least implicitly defined in the definition of the lexical structure, in JLS Section 3.10.1, Integer Literals:

The largest positive hexadecimal, octal, and binary literals of type int - each of which represents the decimal value 2147483647 (2^31-1) - are respectively:

  • 0x7fff_ffff,
  • 0177_7777_7777, and
  • 0b0111_1111_1111_1111_1111_1111_1111_1111

Apart from that, the endianness is mainly relevant for storage and communication. These are not language aspects; they are handled by classes such as ByteOrder, or at the API level, as in the DataOutputStream::writeInt method:

Writes an int to the underlying output stream as four bytes, high byte first.
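The documented behaviour can be checked with a short sketch that writes into an in-memory stream. The class name WriteIntDemo is made up for this example; the IOException is wrapped only to keep the helper easy to call.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class WriteIntDemo {
    // Returns the four bytes that DataOutputStream.writeInt emits for value.
    static byte[] writeInt(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // 4111 is 0x0000100F, so the high bytes (0, 0) come first.
        System.out.println(Arrays.toString(writeInt(4111))); // [0, 0, 16, 15]
    }
}
```

The output is the same on any platform, because the byte order is fixed by the API and not by the CPU.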


The only place where endianness could be considered to influence the semantics of the language is the shift operators. But even there, it's mainly a matter of how the language is interpreted. JLS Section 15.19 on Shift Operators states:

The value of n << s is n left-shifted s bit positions; this is equivalent (even if overflow occurs) to multiplication by two to the power s.

The value of n >> s is n right-shifted s bit positions with sign-extension. The resulting value is ⌊ n / 2^s ⌋. For non-negative values of n, this is equivalent to truncating integer division, as computed by the integer division operator /, by two to the power s.

The specification here states that the existing bits are shifted "left", and at the same time, that "left" is "the more significant position". (However, one could also say that << means "shifting right" in a Little-Endian world...)
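The arithmetic definitions quoted from the JLS can be checked without any reference to byte layout. The following is a small sketch with invented names (ShiftSemantics and its two helpers); it assumes 0 <= s <= 30 so that 1 << s is a positive int.

```java
final class ShiftSemantics {
    // n * 2^s, computed with ordinary (overflowing) int multiplication.
    static int timesPowerOfTwo(int n, int s) {
        return n * (1 << s);
    }

    // floor(n / 2^s), which rounds toward negative infinity.
    static int floorDivByPowerOfTwo(int n, int s) {
        return Math.floorDiv(n, 1 << s);
    }

    public static void main(String[] args) {
        System.out.println((3 << 4) == timesPowerOfTwo(3, 4));        // true
        System.out.println((-7 >> 1) == floorDivByPowerOfTwo(-7, 1)); // true
        System.out.println(-7 >> 1);                                  // -4
        System.out.println(-7 / 2);                                   // -3
    }
}
```

The last two lines show why the spec limits the equivalence with the / operator to non-negative n: >> floors, while / truncates toward zero.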

Java and endianness

If you are mapping a buffer of a larger type over a ByteBuffer then you can specify the endianness using the ByteOrder values. Older core libraries assume network order (big-endian).

From ByteBuffer:

Access to binary data

This class defines methods for reading and writing values of all other primitive types, except boolean. Primitive values are translated to (or from) sequences of bytes according to the buffer's current byte order, which may be retrieved and modified via the order methods. Specific byte orders are represented by instances of the ByteOrder class. The initial order of a byte buffer is always BIG_ENDIAN.

and ByteOrder provides access to the native order for the platform you're working on.
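As a sketch of mapping a larger type over a ByteBuffer, the example below views the same four bytes as an int under both byte orders. The class name BufferOrderDemo and the sample bytes are assumptions for illustration; order(), asIntBuffer() and ByteOrder.nativeOrder() are the standard java.nio calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BufferOrderDemo {
    // Hypothetical sample data: 4111 (0x0000100F) laid out little-endian.
    static final byte[] BYTES = {0x0F, 0x10, 0x00, 0x00};

    // Views the bytes as ints; the view takes the buffer's order at creation time.
    static int firstInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).asIntBuffer().get(0);
    }

    public static void main(String[] args) {
        System.out.println(firstInt(BYTES, ByteOrder.LITTLE_ENDIAN)); // 4111
        System.out.println(firstInt(BYTES, ByteOrder.BIG_ENDIAN));    // 252706816
        // Result depends on the platform, so no expected output is given here.
        System.out.println(firstInt(BYTES, ByteOrder.nativeOrder()));
    }
}
```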

Compare that to the older DataInput, whose readInt always reads big-endian and is therefore not useful for interop with local native services on little-endian platforms:

Reads four input bytes and returns an int value. Let a-d be the first through fourth bytes read. The value returned is:

(((a & 0xff) << 24) | ((b & 0xff) << 16) |
((c & 0xff) << 8) | (d & 0xff))
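The formula above can be compared against DataInputStream.readInt directly. This is only a sketch: the class ReadIntFormula and its helper names are invented, and the IOException is wrapped so the helper can be called without a throws clause.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ReadIntFormula {
    // The documented formula: a is the most significant byte.
    static int combine(byte a, byte b, byte c, byte d) {
        return ((a & 0xff) << 24) | ((b & 0xff) << 16)
             | ((c & 0xff) << 8) | (d & 0xff);
    }

    // Reads the same four bytes through DataInputStream.readInt.
    static int viaDataInput(byte a, byte b, byte c, byte d) {
        byte[] bytes = {a, b, c, d};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte a = 0x00, b = 0x00, c = 0x10, d = 0x0F;
        System.out.println(combine(a, b, c, d));      // 4111
        System.out.println(viaDataInput(a, b, c, d)); // 4111
    }
}
```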

Does the result of Integer.toHexString depend on the system endianness?

No, the representation will always be the same. See Integer.toHexString(int) and Java's Virtual Machine's Endianness.
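A short sketch for trying this out: the hex string is derived from the int value alone, so the lines with expected output should print the same thing on any platform. The class name HexStringDemo is made up for this example.

```java
import java.nio.ByteOrder;

final class HexStringDemo {
    // Thin wrapper so the behaviour can be checked in one place.
    static String hex(int value) {
        return Integer.toHexString(value);
    }

    public static void main(String[] args) {
        // Platform-dependent line, shown only for comparison.
        System.out.println("native order: " + ByteOrder.nativeOrder());
        System.out.println(hex(4111)); // 100f
        System.out.println(hex(-1));   // ffffffff
    }
}
```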


