GC Overhead of Optional&lt;T&gt; in Java

We all know that every object allocated in Java adds weight to future garbage collection cycles,…

That sounds like a statement nobody could deny, but let’s look at the actual work of a garbage collector in common implementations of modern JVMs, and at the impact an allocated object has on it, especially objects like Optional instances, which are typically of a temporary nature.

The first task of the garbage collector is to identify objects which are still alive. The name “garbage collector” puts the focus on identifying garbage, but garbage is defined as unreachable objects, and the only way to find out which objects are unreachable is by process of elimination. So the first task is solved by traversing and marking all reachable objects. The costs of this process therefore depend not on the total number of allocated objects, but only on those which are still reachable.
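
The marking described above can be sketched as a toy graph traversal; the point is that the cost is proportional to the reachable objects only, so the two “garbage” entries below are never even touched (names and structure are illustrative, not an actual GC implementation):

```java
import java.util.*;

public class Mark {
    // Toy mark phase: traversal cost is proportional to reachable objects only.
    public static Set<String> mark(Map<String, List<String>> refs, String root) {
        Set<String> marked = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>(List.of(root));
        while (!stack.isEmpty()) {
            String obj = stack.pop();
            if (marked.add(obj)) stack.addAll(refs.get(obj)); // follow outgoing references
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
                "root", List.of("a"),
                "a", List.of("b"),
                "b", List.of(),
                "garbage1", List.of("garbage2"), // unreachable: never visited
                "garbage2", List.of());
        System.out.println(mark(refs, "root")); // root, a, b in some order
    }
}
```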

The second task is to make the memory of the garbage available to new allocations. Instead of puzzling over the memory gaps between still-reachable objects, all modern garbage collectors work by evacuating a complete region, transferring all live objects within that memory to a new location and adapting the references to them. Afterwards, the memory is available to new allocations as a whole block. So this is again a process whose costs depend not on the total number of allocated objects, but only on (a part of) the still-living objects.

Therefore, an object like a temporary Optional may impose no costs on the actual garbage collection process at all, if it is allocated and abandoned between two garbage collection cycles.

With one catch, of course. Each allocation reduces the memory available to subsequent allocations, until there is no space left and a garbage collection has to take place. So we could say that each allocation shortens the time between two garbage collection runs by a fraction equal to the object size divided by the size of the allocation space. Not only is this a rather tiny fraction, it also only applies to a single-threaded scenario.
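
A back-of-envelope calculation shows just how tiny this fraction is; the numbers below (a 16-byte object, a 256 MiB Eden) are hypothetical but plausible:

```java
public class AllocFraction {
    public static void main(String[] args) {
        // Hypothetical numbers: one 16-byte object in a 256 MiB allocation space.
        long edenBytes = 256L * 1024 * 1024;
        long objectBytes = 16;
        // Fraction of the inter-GC interval consumed by this single allocation:
        double fraction = (double) objectBytes / edenBytes;
        System.out.println(fraction); // ≈ 6e-8, i.e. less than a ten-millionth
    }
}
```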

In implementations like the HotSpot JVM, each thread uses a thread-local allocation buffer (TLAB) for new objects. Once its TLAB is full, it will fetch a new one from the allocation space (aka Eden space). If there is none available, a garbage collection will be triggered. Now, it’s rather unlikely that all threads hit the end of their TLAB at exactly the same time. So for the other threads, which still have some space left in their TLAB at this point, it would not make any difference if they had allocated some more objects that still fit into that remaining space.

The perhaps surprising conclusion is that not every allocated object has an impact on garbage collection; a purely local object, allocated by a thread that does not trigger the next GC, can be entirely free.

Of course, this does not apply to allocating a large number of objects. Allocating lots of them causes the thread to acquire more TLABs and eventually trigger the garbage collection earlier than it would otherwise happen. That’s why we have classes like IntStream, which allow processing a large number of elements without allocating an object per element, as would happen with a Stream&lt;Integer&gt;, while there is no problem in providing the result as a single OptionalInt instance. As we know now, a single temporary object has only a tiny impact on the GC, if any.
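
To illustrate the pattern: the pipeline below processes a million elements as primitives, without per-element boxing, and only the single final result is wrapped in an OptionalInt:

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

public class SumDemo {
    public static void main(String[] args) {
        // IntStream keeps the elements as primitive ints: no Integer per element...
        OptionalInt max = IntStream.rangeClosed(1, 1_000_000).max();
        // ...and only the end result is one (cheap, temporary) OptionalInt object.
        System.out.println(max.getAsInt()); // 1000000
    }
}
```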

This did not even touch the JVM’s optimizer, which may eliminate object allocations in hot spots entirely, if Escape Analysis proves that the object is purely local.
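
A sketch of the kind of code where this can happen (whether the allocation is actually eliminated depends on inlining and the JIT; this is an illustration, not a guarantee): the Optional below never escapes the method, so after inlining, Escape Analysis may prove it purely local and scalar-replace it.

```java
import java.util.Optional;

public class EscapeDemo {
    // The Optional never leaves this method; a JIT compiler may
    // eliminate the allocation entirely via Escape Analysis.
    public static String orDash(String s) {
        return Optional.ofNullable(s).orElse("-");
    }

    public static void main(String[] args) {
        System.out.println(orDash(null));  // -
        System.out.println(orDash("hi")); // hi
    }
}
```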

Performance of Java Optional

Optional&lt;T&gt; is just an ordinary generic class holding a reference of type T; thus, it adds a single layer of indirection. The method calls themselves won't be very expensive either, since the class is final and so dynamic dispatch can be avoided.

The only place where you could have performance problems is when working with very large numbers of such instances, but even then the performance of something like a Stream&lt;Optional&lt;String&gt;&gt; is not bad at all. However, when working with large amounts of primitive values, you'll find a performance hit using Stream&lt;Integer&gt; (or Integer[]) versus the primitive specialization IntStream (or int[]), due to this layer of indirection requiring very frequent instantiation of Integer objects. However, this is a penalty that we already know and pay when using things like ArrayList&lt;Integer&gt;.

You would obviously experience the same hit with Stream&lt;OptionalInt&gt; / OptionalInt[], since an OptionalInt is basically a class with an int field and a boolean flag for presence (unlike Optional&lt;T&gt;, which can make do with only the T field) and thus quite similar to Integer, although bigger in size. And of course, a Stream&lt;Optional&lt;Integer&gt;&gt; would add two levels of indirection, with the corresponding double performance penalty.
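
The difference in layout can be made concrete (the sizes are implementation details, but the indirection is visible in the API itself): an Optional&lt;Integer&gt; holds a reference to a separately allocated Integer, while an OptionalInt keeps the value inline next to its presence flag.

```java
import java.util.Optional;
import java.util.OptionalInt;

public class Wrappers {
    public static void main(String[] args) {
        // One level of indirection: a reference to a boxed Integer object.
        Optional<Integer> boxed = Optional.of(42);
        // No extra object: the int lives inside the OptionalInt itself,
        // alongside a boolean presence flag.
        OptionalInt flat = OptionalInt.of(42);
        System.out.println(boxed.get() + ", " + flat.getAsInt()); // 42, 42
    }
}
```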

Adjusting GC Overhead Exceeded parameters

By default, the OOME occurs when more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered.

These two values are configured via the -XX:GCHeapFreeLimit and -XX:GCTimeLimit options.

GCTimeRatio only defines a soft goal for which the GC heuristics will optimize.
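
For instance, the limits could be adjusted, or the check disabled entirely, on the command line (these flags apply to the Parallel GC; MyApp is a hypothetical main class, and the values shown are the defaults):

```shell
# OOME "GC overhead limit exceeded" fires when >98% time in GC and <2% heap freed
java -XX:+UseParallelGC -XX:GCTimeLimit=98 -XX:GCHeapFreeLimit=2 MyApp

# Or disable the overhead check altogether
java -XX:+UseParallelGC -XX:-UseGCOverheadLimit MyApp
```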

Why does Java Optional performance increase with the number of chained calls?

Even such a powerful tool as JMH is not able to save you from all benchmarking pitfalls.
I've found two different issues with this benchmark.

1.

The HotSpot JIT compiler speculatively optimizes code based on its runtime profile. In the given "full" scenario, Optional never sees null values. That's why the Optional.ofNullable method (also called by Optional.map) happens to be compiled exclusively for the non-null path, which constructs a new non-empty Optional. In this case the JIT is able to eliminate all short-lived allocations and perform all map operations without intermediate objects.

    public static <T> Optional<T> ofNullable(T value) {
        return value == null ? empty() : of(value);
    }

In the "small" and "large" scenarios, the mapping sequence finally ends with Optional.empty(). That is, both branches of the ofNullable method are compiled, and the JIT is no longer able to eliminate the allocations of the intermediate Optional objects - the data flow graph appears to be too complex for Escape Analysis to succeed.

Check it by running JMH with -prof gc, and you'll see that "small" allocates 48 bytes (3 Optionals) per iteration, "large" allocates 96 bytes (6 Optionals), and "full" allocates nothing.

Benchmark                                                       (filling)  Mode  Cnt   Score     Error   Units
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm       empty  avgt    5  ≈ 10⁻⁶             B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm       small  avgt    5  48,000 ±   0,001   B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm       large  avgt    5  96,000 ±   0,001   B/op
OptionalBenchmark.optionalsWithMethodRefs:·gc.alloc.rate.norm        full  avgt    5  ≈ 10⁻⁵             B/op

If you replace new Country("France") with new Country(null), the optimization will also break, and the "full" scenario will become, as expected, slower than "small" and "large".

Alternatively, the following dummy loop added to setUp will also prevent the over-optimization of ofNullable, making the benchmark results more realistic.

    for (int i = 0; i < 1000; i++) {
        Optional.ofNullable(null);
    }

2.

Surprisingly, the nullChecks benchmark also appears faster in the "full" scenario. The reason here is class initialization barriers. Note that only the "full" case initializes all the related classes. In the "small" and "large" cases, the nullChecks method refers to some classes that are not yet initialized. This prevents nullChecks from being compiled efficiently.

If you explicitly initialize all the classes in setUp, e.g. by creating a dummy object, then "empty", "small" and "large" scenarios of nullChecks will become faster.

    Room dummy = new Room(new Flat(new Floor(new Building(new Block(new District(new City(new Country("France"))))))));

Does garbage collection change the object addresses in Java?

I read that garbage collection can lead to memory fragmentation problem at run-time.

This is not a problem exclusive to garbage-collected heaps. When you have a manually managed heap and free memory in a different order than the preceding allocations, you may get a fragmented heap as well. And the ability to have lifetimes different from the last-in-first-out order of automatic storage, aka stack memory, is one of the main motivations for using heap memory in the first place.

To solve this problem, compacting is done by the JVM where it takes all the active objects and assigns them contiguous memory.

Not necessarily all objects. Typical implementation strategies will divide the memory into logical regions and only move objects from a specific region to another, but not all existing objects at a time. These strategies may incorporate the age of the objects, like generational collectors moving objects of the young generation from the Eden space to a Survivor space, or the distribution of the remaining objects, like the “Garbage First” collector, which will, as the name suggests, evacuate the fragment with the highest garbage ratio first, as that implies the smallest amount of work to get a free contiguous memory block.

This means that the object addresses must change from time to time?

Of course, yes.

Also, if this happens,

  1. Are the references to these objects also re-assigned?

The specification does not mandate how object references are implemented. An indirect pointer may eliminate the need to adapt all references, see also this Q&A. However, for JVMs using direct pointers, this does indeed imply that these pointers need to get adapted.


  2. Won't this cause significant performance issues? How does Java cope with it?

First, we have to consider what we gain from that. To “eliminate fragmentation” is not an end in itself. If we don’t do it, we have to scan the reachable objects for the gaps between them and create a data structure maintaining this information, which we would then call “free memory”. We would also need to implement memory allocation as a search for a matching chunk in this data structure, or split a chunk if no exact match is found. This is a rather expensive operation compared to an allocation from a contiguous free memory block, where we only have to bump the pointer to the next free byte by the required size.
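
The bump-pointer idea can be sketched in a few lines (a deliberately simplified model, not how a real JVM allocator is written; the class and method names are made up for illustration):

```java
// Hypothetical sketch: bump-pointer allocation from a contiguous free block,
// as opposed to searching a free-list for a matching chunk.
public class BumpAllocator {
    private final byte[] space;
    private int top; // offset of the next free byte

    BumpAllocator(int size) { space = new byte[size]; }

    /** Returns the start offset of the allocation, or -1 when a GC would be needed. */
    int allocate(int size) {
        if (top + size > space.length) return -1; // exhausted: time to collect
        int address = top;
        top += size; // the entire allocation is just this pointer bump
        return address;
    }

    public static void main(String[] args) {
        BumpAllocator eden = new BumpAllocator(100);
        System.out.println(eden.allocate(16));  // 0
        System.out.println(eden.allocate(16));  // 16
        System.out.println(eden.allocate(100)); // -1: no contiguous space left
    }
}
```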

Given that allocations happen much more often than garbage collection, which only runs when the memory is full (or a threshold has been crossed), this already justifies the more expensive copy operations. It also implies that simply using a larger heap can solve performance issues, as it reduces the number of required garbage collector runs, whereas the number of surviving objects does not scale with the memory (unreachable objects stay unreachable, regardless of how long you defer the collection). In fact, deferring the collection raises the chances that more objects have become unreachable in the meantime. Compare also with this answer.

The costs of adapting references are not much higher than the costs of traversing references in the marking phase. In fact, non-concurrent collectors could even combine these two steps, transferring an object on first encounter and adapting subsequently encountered references, instead of marking the object. The actual copying is the more expensive aspect, but as explained above, it is reduced by not copying all objects but using certain strategies based on typical application behavior, like generational approaches or the “garbage first” strategy, to minimize the required work.

How to assert some Java method is garbage free?

Each thread in the JVM maintains an allocation counter, counting the bytes allocated since the thread's creation.
You can use this counter to measure the amount of allocation done by a piece of code (though the measured code has to run in the measuring thread itself).

Below is the snippet of code I usually use for that purpose.

    import java.lang.management.ManagementFactory;

    public class MemMeter {

        private static final long OFFSET = measure(new Runnable() {
            @Override
            public void run() {
            }
        });

        /**
         * @return amount of memory allocated while executing provided {@link Runnable}
         */
        public static long measure(Runnable x) {
            long now = getCurrentThreadAllocatedBytes();
            x.run();
            long diff = getCurrentThreadAllocatedBytes() - now;
            return diff - OFFSET;
        }

        @SuppressWarnings("restriction")
        private static long getCurrentThreadAllocatedBytes() {
            return ((com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean())
                    .getThreadAllocatedBytes(Thread.currentThread().getId());
        }
    }
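
As a self-contained usage sketch (assuming a HotSpot-based JVM where the com.sun.management extension is available; the class name MemMeterDemo is made up), one can verify that a deliberate allocation is picked up by the per-thread counter:

```java
import java.lang.management.ManagementFactory;

public class MemMeterDemo {
    public static void main(String[] args) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long before = bean.getThreadAllocatedBytes(Thread.currentThread().getId());
        byte[] payload = new byte[1024 * 1024]; // deliberately allocate ~1 MiB
        long after = bean.getThreadAllocatedBytes(Thread.currentThread().getId());
        // The counter must have advanced by at least the payload size
        System.out.println(after - before >= payload.length);
    }
}
```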


Impact of Java streams from GC perspective or handling short-lived objects by the GC

To be fair, it's very complicated to give an answer when Holger has already linked to the main idea in his answer; still, I will try.

Extra pressure on the GC? Maybe. Extra time for a GC cycle to execute? Most probably not. Ignorable? I'd say totally. In the end, what you care about from a GC is that it takes little time to reclaim lots of space, preferably with super tiny stop-the-world events.

Let's talk about the potential overhead in the GC's two main phases: mark and evacuation/relocation (Shenandoah/ZGC). First, the mark phase, where the GC finds out what is garbage (by actually identifying what is alive).

If the objects that were created by the Stream internals are not reachable, they will never be scanned (zero overhead here), and if they are reachable, scanning them will be extremely fast. The other side of the story is: when you create an object that the GC might touch while it is running the mark phase, the slow path of a load barrier (in the case of Shenandoah) will be taken. This will add, I assume, some tens of nanoseconds to the total time of that particular GC phase, as well as some space in the SATB queues. Aleksey Shipilev said in one talk that he tried to measure the overhead of executing a single barrier and could not, so he measured three, and the time was in the region of tens of nanoseconds. I don't know the exact details of ZGC, but a load barrier is in place there too.

The main point is that this mark phase is done concurrently, while the application is running, so your application will still run perfectly fine. And even if some GC code is triggered to do some specific work (the load barrier), it will be extremely fast and completely transparent to you.

The second phase is "compaction", or making space for future allocations. What a GC does is move live objects from the regions with the most garbage (in Shenandoah, for sure) to regions that are empty. But only live objects. So if a certain region has 100 objects and only 1 is alive, only that 1 will be moved, and then the entire region is marked as free. So if the Stream implementation generated only garbage (i.e. objects no longer alive), it is a "free lunch" for the GC; it will not even know they existed.
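
The 1-live-object-out-of-100 case can be sketched as follows (a toy model using strings for objects; real collectors move raw memory and fix up references, which this deliberately ignores):

```java
import java.util.*;

public class Evacuate {
    // Toy evacuation: only live objects are copied; the dead ones are
    // reclaimed implicitly when the whole source region is freed.
    public static List<String> evacuate(List<String> region, Set<String> live) {
        List<String> target = new ArrayList<>();
        for (String obj : region)
            if (live.contains(obj)) target.add(obj); // copying cost ∝ live objects only
        region.clear(); // the entire source region becomes free at once
        return target;
    }

    public static void main(String[] args) {
        List<String> region = new ArrayList<>(List.of("live", "dead1", "dead2", "dead3"));
        List<String> target = evacuate(region, Set.of("live"));
        System.out.println(target.size() + " moved, " + region.size() + " left"); // 1 moved, 0 left
    }
}
```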

The bigger picture here is that this phase, too, is done concurrently. To keep that concurrency going, you need to know how much is allocated between the start and the end of a GC cycle: this amount is the minimum "extra" headroom you need on top of the Java process for the GC to be happy.

So overall, you are looking at a super tiny impact; if any at all.

Is using Optional.ofNullable as a replacement for the ternary operator a good practice?

Whenever I come to think of using the Optional API for a specific purpose, I always remind myself of what it was intended to do and why it was brought into the JDK:

Optional is intended to provide a limited mechanism for library method
return types where there is a clear need to represent “no result” and
where using null for this is overwhelmingly likely to cause errors - Stuart Marks

Optional is primarily focused on a return type that might or might not have a return value.

Overusing this construct, as in this specific example of yours, just causes extra memory allocation and GC overhead.

I’d keep things simple and instead do:

    String hi = sayHi();
    if (hi == null) hi = "-";
    ...
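
For comparison, here is the Optional-based variant side by side with the plain null check (sayHi is a stand-in for the method from the question, hard-wired to return null here for illustration); both produce the same result, but the first allocates a wrapper object just to pick a default:

```java
import java.util.Optional;

public class HiDemo {
    static String sayHi() { return null; } // hypothetical method from the question

    public static void main(String[] args) {
        // Allocates an (empty) Optional just to select a default value:
        String viaOptional = Optional.ofNullable(sayHi()).orElse("-");

        // The plain null check does the same without any allocation:
        String hi = sayHi();
        if (hi == null) hi = "-";

        System.out.println(viaOptional.equals(hi)); // true
    }
}
```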

