Relation Between Host Memory and the Java Memory Arguments Xms and Xmx

Memory used by a Java process (as seen by the OS) is not limited to the Java Heap. There are many more memory areas that should also be counted:

  • Metaspace (where class metadata resides);
  • Code Cache (storage for JIT-compiled methods and all the generated code);
  • Direct ByteBuffers;
  • Memory-mapped files, including files mapped by the JVM, e.g. all JAR files on the classpath;
  • Thread stacks;
  • JVM code itself and all the dynamic libraries loaded by the Java Runtime;
  • Other internal JVM structures.

Use the Native Memory Tracking JDK feature to get a detailed breakdown of the memory areas used by the JVM:

java -XX:NativeMemoryTracking=detail -XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics
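
For a running process, the same breakdown can also be requested at any time with jcmd (a minimal sketch; replace <pid> with the id of the target Java process, and note that -XX:NativeMemoryTracking must have been enabled at startup):

jcmd <pid> VM.native_memory summary
jcmd <pid> VM.native_memory detail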

Do the -Xms and -Xmx flags reserve the machine's resources?

Xmx merely reserves virtual address space.
Xms actually allocates (commits) it but does not necessarily prefault it.

How operating systems respond to allocations varies.

Windows does allow you to reserve very large chunks of address space (Xmx) but will not allow overcommit (Xms). The limit is defined by swap + physical RAM. The exception is large pages (which need to be enabled with a group policy setting), which limit it to physical RAM.

Linux behavior is more complicated: it depends on vm.overcommit_memory and related sysctls, and on various flags passed to the mmap syscall, which to some extent can be controlled by JVM configuration flags. The behavior can range from (a) Xms being allowed to exceed total RAM + swap, to (b) Xmx being capped by available physical RAM.
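
A quick way to see which regime a particular Linux host is in - a sketch of the relevant sysctls, not a recommendation to change them:

sysctl vm.overcommit_memory   # 0 = heuristic overcommit, 1 = always allow, 2 = strict accounting
sysctl vm.overcommit_ratio    # only used in mode 2 (limit = swap + ratio% of RAM)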

Equal -Xms and -Xmx in JVM

Does this mean the garbage collector won't activate until the JVM reaches its heap limit?

No, the garbage collector will activate its collection cycles whenever the GC algorithm tells it to - which will usually mean frequent incremental collections and occasional full collections. Running out of heap can force it to perform a full collection more often and be more aggressive, but it's certainly not true (for any GC implementation I'm aware of, anyway) to say it won't do anything until it's out of, or nearly out of, heap space.

What could be happening (depending on the GC algorithm) is that the incremental collections are running regularly & as intended, but are unable to clear everything properly - and the slower full collection won't be performed until you're lower on space. That's perfectly valid behaviour - you've given the VM a 4GB heap, and it'll use it in the way it sees fit. If that's the case, you'd see a sudden drop when it gets closer to the 4GB limit and performs a full collection.

Alternatively, it's possible you have a memory leak in the application somewhere and it's going to run out of memory eventually, be unable to free any space, and then throw an OutOfMemoryError.

If it were me, I'd profile & stress test the application in a testing environment with the same heap & GC options set to check the behaviour - that's really the only way to find out for sure.
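
One practical way to observe the collector's behaviour during such a test is to enable GC logging. A minimal sketch (the unified -Xlog syntax requires JDK 9+; on older JVMs use -verbose:gc instead; yourapp.jar is a placeholder):

java -Xms4g -Xmx4g -Xlog:gc*:file=gc.log:time,uptime -jar yourapp.jar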

Java heap Xms and Linux free memory differ

Even when -Xmx and -Xms are set to the same value, the space reserved for the Java Heap is not immediately allocated in RAM.

The operating system typically allocates physical memory lazily, on the first access to a virtual page. So, as long as the unused part of the Java Heap is not touched, it won't really consume memory.

You may use the -XX:+AlwaysPreTouch option to forcibly touch all heap pages on JVM start.
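
For example, a launch command combining equal heap bounds with pre-touch might look like this (a sketch; app.jar is a placeholder, and pre-touching a large heap slows down JVM startup):

java -Xms4g -Xmx4g -XX:+AlwaysPreTouch -jar app.jar

With this combination, the RSS reported by the OS should reflect the full heap size right after startup.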

JVM Option: Xmx Allocation

The JVM actually does allocate all memory requested by -Xmx at startup time, along with additional memory to hold the JVM executable and internal work areas. And when you create threads, it also allocates memory for the stacks used for those threads.

This works because (1) the JVM doesn't immediately use all of this memory, and (2) the OS provides a paging file. When the JVM requests an allocation, the OS commits to providing the space but does not have to back all of it with physical RAM right away. As the process actually uses the RAM, the OS swaps pages to/from the pagefile. If all processes were actively using their RAM, the OS would have to swap pages in and out constantly; this is called "thrashing."

The -Xms parameter specifies the initial heap size, within the overall heap size. The JVM will attempt to keep memory within these bounds, but is permitted to expand the bounds if it cannot reclaim enough garbage. However, expanding the heap does not mean incrementally requesting more memory from the OS: the entire -Xmx range was already reserved at startup. If it did, the heap could become fragmented, and a large array allocation might fail because it wouldn't fit in contiguous memory.
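
The committed-versus-maximum distinction is also visible from inside the process via the Runtime API. A minimal sketch (the class name is illustrative):

public class HeapReport {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // totalMemory(): heap currently committed by the JVM (starts near -Xms)
        System.out.println("committed heap: " + rt.totalMemory() / mb + " MB");
        // maxMemory(): upper bound the heap may grow to (roughly -Xmx)
        System.out.println("max heap:       " + rt.maxMemory() / mb + " MB");
        // freeMemory(): unused portion of the committed heap
        System.out.println("free in heap:   " + rt.freeMemory() / mb + " MB");
    }
}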

Why are JVM memory parameters usually in multiples of 256?

It is not necessary to use a multiple of 256 (or a power of two) for the JVM memory parameters. It is just common practice to double the value if the old one isn't enough.

If you increase your assigned memory value in 1 MB steps, you will have to adjust the value several (hundred) times before the configuration matches your requirements. So it is just more convenient to double the old value.

This relies on the fact that memory is a cheap resource these days.

EDIT:

As you already mentioned, it is possible to assign values like 1000 MB or 381 MB. The JVM can handle any memory size that is big enough to host the PermGen space, the stack and the heap.

Java using much more memory than heap size (or size correctly Docker memory limit)

Virtual memory used by a Java process extends far beyond just the Java Heap. The JVM includes many subsystems: the Garbage Collector, Class Loading, JIT compilers etc., and all these subsystems require a certain amount of RAM to function.

The JVM is not the only consumer of RAM. Native libraries (including the standard Java Class Library) may also allocate native memory, and this won't even be visible to Native Memory Tracking. The Java application itself can also use off-heap memory by means of direct ByteBuffers.

So what takes memory in a Java process?

JVM parts (mostly shown by Native Memory Tracking)

  1. Java Heap

The most obvious part. This is where Java objects live. Heap takes up to -Xmx amount of memory.


  2. Garbage Collector

GC structures and algorithms require additional memory for heap management. These structures are the Mark Bitmap, the Mark Stack (for traversing the object graph), Remembered Sets (for recording inter-region references) and others. Some of them are directly tunable, e.g. -XX:MarkStackSizeMax; others depend on the heap layout, e.g. the larger the G1 regions (-XX:G1HeapRegionSize), the smaller the remembered sets.

GC memory overhead varies between GC algorithms. -XX:+UseSerialGC and -XX:+UseShenandoahGC have the smallest overhead. G1 or CMS may easily use around 10% of total heap size.
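
As an illustration only, the collector and its region size can be chosen explicitly on the command line (a sketch; app.jar and the values are placeholders, not recommendations):

java -Xmx4g -XX:+UseG1GC -XX:G1HeapRegionSize=16m -jar app.jar
java -Xmx4g -XX:+UseSerialGC -jar app.jar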


  3. Code Cache

Contains dynamically generated code: JIT-compiled methods, the interpreter and run-time stubs. Its size is limited by -XX:ReservedCodeCacheSize (240M by default). Disable Tiered Compilation with -XX:-TieredCompilation to reduce the amount of compiled code and thus the Code Cache usage.
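
A hedged example of trimming Code Cache related memory (the values and app.jar are illustrative):

java -XX:ReservedCodeCacheSize=100m -XX:-TieredCompilation -jar app.jar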


  4. Compiler

The JIT compiler itself also requires memory to do its job. This can be reduced again by switching off Tiered Compilation or by reducing the number of compiler threads: -XX:CICompilerCount.


  5. Class loading

Class metadata (method bytecodes, symbols, constant pools, annotations etc.) is stored in an off-heap area called Metaspace. The more classes are loaded, the more Metaspace is used. Total usage can be limited by -XX:MaxMetaspaceSize (unlimited by default) and -XX:CompressedClassSpaceSize (1G by default).
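
For example, these areas can be capped explicitly (a sketch; the values are illustrative, and setting them too low leads to OutOfMemoryError: Metaspace):

java -XX:MaxMetaspaceSize=256m -XX:CompressedClassSpaceSize=64m -jar app.jar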


  6. Symbol tables

Two main hashtables of the JVM: the Symbol table contains names, signatures, identifiers etc., and the String table contains references to interned strings. If Native Memory Tracking indicates significant memory usage by the String table, it probably means the application excessively calls String.intern.
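
The pattern that inflates the String table looks like this - a deliberately bad sketch (the class name is made up):

public class InternDemo {
    public static void main(String[] args) {
        // Every distinct interned string adds an entry to the JVM's native String table;
        // the table itself is off-heap memory reported by NMT.
        for (int i = 0; i < 1_000_000; i++) {
            ("key-" + i).intern();
        }
    }
}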


  7. Threads

Thread stacks are also responsible for taking RAM. The stack size is controlled by -Xss. The default is 1M per thread, but fortunately things are not so bad. The OS allocates memory pages lazily, i.e. on first use, so the actual memory usage will be much lower (typically 80-200 KB per thread stack). I wrote a script to estimate how much of the RSS belongs to Java thread stacks.
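
Besides the global -Xss setting, a stack size can also be requested per thread in code; the JVM treats it only as a hint. A sketch (the names are illustrative):

public class StackSizeDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println("hello from " + Thread.currentThread().getName());
        // Ask for a 256 KB stack instead of the -Xss default; the stackSize argument is only a hint.
        Thread t = new Thread(null, task, "small-stack-worker", 256 * 1024);
        t.start();
        t.join();
    }
}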

There are other JVM parts that allocate native memory, but they do not usually play a big role in total memory consumption.

Direct buffers

An application may explicitly request off-heap memory by calling ByteBuffer.allocateDirect. The default off-heap limit is equal to -Xmx, but it can be overridden with -XX:MaxDirectMemorySize. Direct ByteBuffers are included in the Other section of NMT output (or Internal before JDK 11).
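
A minimal sketch of explicit off-heap allocation (the class name is illustrative); run it with -XX:MaxDirectMemorySize to see the limit enforced:

import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        // 64 MB allocated outside the Java Heap, counted against MaxDirectMemorySize
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        System.out.println("direct buffer capacity: " + buf.capacity() + " bytes");
    }
}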

The amount of used direct memory is visible through JMX, e.g. in JConsole or Java Mission Control:

[Screenshot: BufferPool MBean in JConsole]

Besides direct ByteBuffers there can be MappedByteBuffers - files mapped to the virtual memory of a process. NMT does not track them; however, MappedByteBuffers can also take physical memory. And there is no simple way to limit how much they can take. You can see the actual usage by looking at the process memory map: pmap -x <pid>

Address           Kbytes     RSS   Dirty Mode  Mapping
...
00007f2b3e557000   39592   32956       0 r--s- some-file-17405-Index.db
00007f2b40c01000   39600   33092       0 r--s- some-file-17404-Index.db
                           ^^^^^             ^^^^^^^^^^^^^^^^^^^^^^^^
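
For reference, such mappings typically come from code like the following (a sketch; the file name is a placeholder, and Path.of requires JDK 11+):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFileDemo {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of("some-file-Index.db"), StandardOpenOption.READ)) {
            // The mapping consumes virtual address space immediately and physical memory as pages are touched
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            System.out.println("mapped " + map.capacity() + " bytes");
        }
    }
}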

Native libraries

JNI code loaded by System.loadLibrary can allocate as much off-heap memory as it wants, with no control from the JVM side. This also concerns the standard Java Class Library. In particular, unclosed Java resources may become a source of a native memory leak. Typical examples are ZipInputStream or DirectoryStream.
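
Closing such resources deterministically (rather than waiting for the garbage collector to run finalizers) releases their native allocations promptly. A sketch (archive.zip is a placeholder):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipScan {
    public static void main(String[] args) throws IOException {
        // try-with-resources guarantees the native inflater and file handle are released
        try (ZipInputStream zin = new ZipInputStream(new FileInputStream("archive.zip"))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                System.out.println(entry.getName());
            }
        }
    }
}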

JVMTI agents, in particular the jdwp debugging agent, can also cause excessive memory consumption.

This answer describes how to profile native memory allocations with async-profiler.

Allocator issues

A process typically requests native memory either directly from the OS (with the mmap system call) or by using malloc, the standard libc allocator. In turn, malloc requests big chunks of memory from the OS using mmap, and then manages these chunks according to its own allocation algorithm. The problem is that this algorithm can lead to fragmentation and excessive virtual memory usage.

jemalloc, an alternative allocator, often appears smarter than regular libc malloc, so switching to jemalloc may result in a smaller footprint for free.
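
A common way to try this without recompiling anything is to preload jemalloc for the java process; a sketch (the library path is an assumption and varies by distribution):

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 java -jar app.jar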

Conclusion

There is no guaranteed way to estimate full memory usage of a Java process, because there are too many factors to consider.

Total memory = Heap + Code Cache + Metaspace + Symbol tables +
               Other JVM structures + Thread stacks +
               Direct buffers + Mapped files +
               Native Libraries + Malloc overhead + ...

It is possible to shrink or limit certain memory areas (like the Code Cache) with JVM flags, but many others are out of the JVM's control altogether.

One possible approach to setting Docker limits would be to watch the actual memory usage in a "normal" state of the process. There are tools and techniques for investigating issues with Java memory consumption: Native Memory Tracking, pmap, jemalloc, async-profiler.
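
For example (a sketch; the container and image names are placeholders): observe steady-state usage with docker stats, then set the container limit with some headroom above it:

docker stats my-java-container
docker run -m 1g my-java-image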

Update

Here is a recording of my presentation Memory Footprint of a Java Process.

In this video, I discuss what may consume memory in a Java process, how to monitor and restrain the size of certain memory areas, and how to profile native memory leaks in a Java application.


