ByteBuffer.allocate() vs. ByteBuffer.allocateDirect()

Ron Hitchens, in his excellent book Java NIO, offers what I thought could be a good answer to your question:

Operating systems perform I/O operations on memory areas. These memory areas, as far as the operating system is concerned, are contiguous sequences of bytes. It's no surprise then that only byte buffers are eligible to participate in I/O operations. Also recall that the operating system will directly access the address space of the process, in this case the JVM process, to transfer the data. This means that memory areas that are targets of I/O operations must be contiguous sequences of bytes. In the JVM, an array of bytes may not be stored contiguously in memory, or the garbage collector could move it at any time. Arrays are objects in Java, and the way data is stored inside that object could vary from one JVM implementation to another.

For this reason, the notion of a direct buffer was introduced. Direct buffers are intended for interaction with channels and native I/O routines. They make a best effort to store the byte elements in a memory area that a channel can use for direct, or raw, access by using native code to tell the operating system to drain or fill the memory area directly.

Direct byte buffers are usually the best choice for I/O operations. By design, they support the most efficient I/O mechanism available to the JVM. Nondirect byte buffers can be passed to channels, but doing so may incur a performance penalty. It's usually not possible for a nondirect buffer to be the target of a native I/O operation. If you pass a nondirect ByteBuffer object to a channel for write, the channel may implicitly do the following on each call:

  1. Create a temporary direct ByteBuffer object.
  2. Copy the content of the nondirect buffer to the temporary buffer.
  3. Perform the low-level I/O operation using the temporary buffer.
  4. The temporary buffer object goes out of scope and is eventually garbage collected.

This can potentially result in buffer copying and object churn on every I/O, which are exactly the sorts of things we'd like to avoid. However, depending on the implementation, things may not be this bad. The runtime will likely cache and reuse direct buffers or perform other clever tricks to boost throughput. If you're simply creating a buffer for one-time use, the difference is not significant. On the other hand, if you will be using the buffer repeatedly in a high-performance scenario, you're better off allocating direct buffers and reusing them.

Direct buffers are optimal for I/O, but they may be more expensive to create than nondirect byte buffers. The memory used by direct buffers is allocated by calling through to native, operating-system-specific code, bypassing the standard JVM heap. Setting up and tearing down direct buffers could be significantly more expensive than heap-resident buffers, depending on the host operating system and JVM implementation. The memory-storage areas of direct buffers are not subject to garbage collection because they are outside the standard JVM heap.

The performance tradeoffs of using direct versus nondirect buffers can vary widely by JVM, operating system, and code design. By allocating memory outside the heap, you may subject your application to additional forces of which the JVM is unaware. When bringing additional moving parts into play, make sure that you're achieving the desired effect. I recommend the old software maxim: first make it work, then make it fast. Don't worry too much about optimization up front; concentrate first on correctness. The JVM implementation may be able to perform buffer caching or other optimizations that will give you the performance you need without a lot of unnecessary effort on your part.
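To make the tradeoff concrete, here is a minimal sketch (the file name "data.bin" and the buffer size are placeholder assumptions) of allocating a direct buffer once and reusing it across reads, versus a heap buffer that the channel may have to copy through a temporary direct buffer:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ReuseDirectBuffer {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("data.bin"),
                StandardOpenOption.READ)) {
            // Nondirect buffer: the channel may copy its contents through a
            // temporary direct buffer on every read, as described above.
            ByteBuffer heap = ByteBuffer.allocate(8192);
            ch.read(heap);

            ch.position(0);

            // Direct buffer: allocated once, reused for every read, so the
            // channel can hand its native address straight to the OS.
            ByteBuffer direct = ByteBuffer.allocateDirect(8192);
            while (ch.read(direct) != -1) {
                direct.flip();
                // ... consume the bytes here ...
                direct.clear();
            }
        }
    }
}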

`ByteBuffer.allocateDirect` and Xmx

I had to go on a scavenger hunt to find the reason, but here you go!

First, I looked at ByteBuffer#allocateDirect and found the following:

public static ByteBuffer allocateDirect(int capacity) {
    return new DirectByteBuffer(capacity);
}

I then navigated to the constructor of DirectByteBuffer and found the following method call:

Bits.reserveMemory(size, cap);

Looking in this method, we see:

while (true) {
    if (tryReserveMemory(size, cap)) {
        return;
    }

    if (sleeps >= MAX_SLEEPS) {
        break;
    }

    try {
        if (!jlra.waitForReferenceProcessing()) {
            Thread.sleep(sleepTime);
            sleepTime <<= 1;
            sleeps++;
        }
    } catch (InterruptedException e) {
        interrupted = true;
    }
}

// no luck
throw new OutOfMemoryError("Direct buffer memory");

This seems to be where you received this error, but now we need to figure out what causes it. For that, I looked into the call to tryReserveMemory and found the following:

private static boolean tryReserveMemory(long size, int cap) {
    long totalCap;

    while (cap <= maxMemory - (totalCap = totalCapacity.get())) {
        if (totalCapacity.compareAndSet(totalCap, totalCap + cap)) {
            reservedMemory.addAndGet(size);
            count.incrementAndGet();
            return true;
        }
    }

    return false;
}

I was curious about the maxMemory field, so I looked at where it was declared:

private static volatile long maxMemory = VM.maxDirectMemory();

Now I had to look at the maxDirectMemory method within VM.java:

public static long maxDirectMemory() {
    return directMemory;
}

Finally, let's look at the declaration of directMemory:

// A user-settable upper limit on the maximum amount of allocatable direct
// buffer memory. This value may be changed during VM initialization if
// "java" is launched with "-XX:MaxDirectMemorySize=<size>".
//
// The initial value of this field is arbitrary; during JRE initialization
// it will be reset to the value specified on the command line, if any,
// otherwise to Runtime.getRuntime().maxMemory().
//
private static long directMemory = 64 * 1024 * 1024;

Hey, look at that! If you don't manually specify this using "-XX:MaxDirectMemorySize=<size>", then it defaults to Runtime.getRuntime().maxMemory(), which is effectively the maximum heap size you set with -Xmx.

Seeing as -Xmx1G is smaller than Integer.MAX_VALUE bytes, the call to tryReserveMemory will never return true, which means sleeps eventually reaches MAX_SLEEPS, breaking out of the while-loop and throwing your OutOfMemoryError.
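You can see this limit in action with a small, hypothetical demo class:

// Run with: java -Xmx64m DirectLimitDemo
//   -> java.lang.OutOfMemoryError: Direct buffer memory
// Run with: java -Xmx64m -XX:MaxDirectMemorySize=256m DirectLimitDemo
//   -> succeeds, since the direct limit no longer tracks the heap size
import java.nio.ByteBuffer;

public class DirectLimitDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(128 * 1024 * 1024); // 128 MB
        System.out.println("allocated " + buf.capacity() + " bytes");
    }
}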

If we look at Runtime.getRuntime().maxMemory(), then we see why it works if you don't specify the max heap size:

/**
* Returns the maximum amount of memory that the Java virtual machine
* will attempt to use. If there is no inherent limit then the value
* {@link java.lang.Long#MAX_VALUE} will be returned.
*
* @return the maximum amount of memory that the virtual machine will
* attempt to use, measured in bytes
* @since 1.4
*/
public native long maxMemory();

Difference between ByteBuffer.allocateDirect() and MappedByteBuffer.load()

Direct ByteBuffers (those allocated using ByteBuffer.allocateDirect) differ from MappedByteBuffers in that they represent different sections of memory and are allocated differently. A direct ByteBuffer is a way to access a block of memory allocated outside of the JVM, generally with a malloc call (although most implementations probably use an efficient slab allocator). In other words, it's just a pointer to a block of memory.

A MappedByteBuffer represents a section of memory allocated using an mmap call, which is used to perform memory-mapped I/O. Therefore, MappedByteBuffers won't register their use of memory in the same way a direct ByteBuffer will.

So while both are "direct" in that they represent memory outside of the JVM, their purposes are different.
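Here is a minimal sketch contrasting the two allocation paths (the file name "data.bin" is a placeholder):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DirectVsMapped {
    public static void main(String[] args) throws IOException {
        // Direct buffer: native memory from the JVM's own allocator
        // (malloc-like), counted against -XX:MaxDirectMemorySize.
        ByteBuffer direct = ByteBuffer.allocateDirect(4096);

        // Mapped buffer: a file region mapped into memory via mmap; it is
        // not reserved through the same accounting as allocateDirect.
        try (FileChannel ch = FileChannel.open(Paths.get("data.bin"),
                StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            mapped.load(); // ask the OS to page the region into physical memory
        }
    }
}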

As an aside, in order to get the reservedMemory value, you are reflectively calling an internal method of the JVM whose implementation is not covered by any specification, so there are no guarantees as to what that value returns. Direct ByteBuffers can also be allocated from within JNI using the NewDirectByteBuffer call from C/C++ (MappedByteBuffers likely use this), and this probably doesn't affect the reservedMemory value, which may only change when using the Java ByteBuffer.allocateDirect.

Direct java.nio.ByteBuffer vs Java Array Performance Test

It won't be considered for GC

Of course it will be considered for GC.

It is the Garbage Collector that determines that the buffer is no longer in use, and then deallocates the memory.
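For what it's worth, a quick sketch (hypothetical class name) shows this: with a small -XX:MaxDirectMemorySize, the loop below only survives because collected buffers give their native memory back.

import java.nio.ByteBuffer;

public class DirectGcDemo {
    public static void main(String[] args) {
        // e.g. run with -XX:MaxDirectMemorySize=16m
        for (int i = 0; i < 1_000; i++) {
            ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024); // 1 MB
            buf.put(0, (byte) 1);
            // buf becomes unreachable here; when the limit is hit,
            // Bits.reserveMemory triggers reference processing / GC,
            // which frees the native memory behind collected buffers.
        }
        System.out.println("done");
    }
}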

Should I not expect my benchmark to show [that] off-heap buffer [is] faster than heap buffer?

Being off-heap doesn't make the buffer faster for memory access.

A direct buffer will be faster when Java exchanges the bytes in the buffer with the operating system. Since your code is not doing I/O, there is no performance benefit to using a direct buffer.

As the javadoc says:

Given a direct byte buffer, the Java virtual machine will make a best effort to perform native I/O operations directly upon it. That is, it will attempt to avoid copying the buffer's content to (or from) an intermediate buffer before (or after) each invocation of one of the underlying operating system's native I/O operations.
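A rough, unscientific timing sketch (no proper benchmark harness, so treat the numbers with suspicion) illustrates that pure in-memory access does not favor the direct buffer:

import java.nio.ByteBuffer;

public class BufferAccessTiming {
    static long fill(ByteBuffer buf) {
        long start = System.nanoTime();
        buf.clear();
        while (buf.hasRemaining()) {
            buf.put((byte) 1);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int size = 64 * 1024 * 1024; // 64 MB
        ByteBuffer heap = ByteBuffer.allocate(size);
        ByteBuffer direct = ByteBuffer.allocateDirect(size);
        for (int i = 0; i < 5; i++) { // crude warmup by repetition
            System.out.printf("heap: %d ns, direct: %d ns%n",
                    fill(heap), fill(direct));
        }
    }
}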

ByteBuffer allocate direct example

You can't explicitly free the allocated memory, but you can clear the buffer and then write zeros (or random bytes) to the buffer when you are done. This will destroy any data that was previously stored in the buffer, reducing the window of attack.

// overwrite the sensitive contents with zeros
pass.clear();
while (pass.hasRemaining()) {
    pass.put((byte) 0);
}

Difference between ByteBuffer.allocateDirect() and glGenBuffers()

But why does every basic OpenGL tutorial for rendering a triangle use an OpenGL object, while every Android OpenGL ES tutorial for rendering a triangle uses a Java object instead of an OpenGL object?

Because they can.

ByteBuffer represents a CPU memory allocation. An OpenGL buffer object represents a GPU-accessible memory allocation.

OpenGL ES 2.0 allows users to specify vertex data from either CPU memory or buffer objects. Desktop core OpenGL removed this ability back in 2009, so modern desktop OpenGL tutorials use buffer objects for vertex data.

Note that ES 3.0 explicitly says that "client-side vertex arrays":

are present to maintain backward compatibility with OpenGL ES 2.0, but their use is not recommended as it is likely for these features to be removed in a future version.

They haven't been removed yet. But they do interfere with a number of features. For example, they make gl_VertexID undefined.

So you should never use ByteBuffer directly for vertex arrays. And you should avoid learning materials that do so, as they may teach you other bad practices.
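For reference, here is a sketch of the buffer-object route on Android (assuming the android.opengl.GLES20 bindings): the ByteBuffer is only a staging area, and the vertex data ends up in a GPU-accessible buffer object.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import android.opengl.GLES20;

public class TriangleVbo {
    public static int createVbo(float[] vertices) {
        // The GL bindings require a direct, native-order buffer.
        FloatBuffer data = ByteBuffer.allocateDirect(vertices.length * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer()
                .put(vertices);
        data.position(0);

        int[] vbo = new int[1];
        GLES20.glGenBuffers(1, vbo, 0);
        GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, vbo[0]);
        // Copy the vertex data into the buffer object; after this call the
        // ByteBuffer is no longer needed at draw time.
        GLES20.glBufferData(GLES20.GL_ARRAY_BUFFER, vertices.length * 4,
                data, GLES20.GL_STATIC_DRAW);
        GLES20.glBindBuffer(GLES20.GL_ARRAY_BUFFER, 0);
        return vbo[0];
    }
}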


