What is in Java object header?
For HotSpot:
The object header consists of a mark word and a klass pointer.
The mark word has word size (4 bytes on 32-bit architectures, 8 bytes on 64-bit architectures), and the klass pointer has word size on 32-bit architectures. On 64-bit architectures the klass pointer either has word size or, if the heap addresses can be encoded in 4 bytes, only 4 bytes.
This optimization is called "compressed oops" and you can also control it with the option UseCompressedOops. (Since JDK 8, compression of the klass pointer itself is controlled by the separate option UseCompressedClassPointers.)
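The header arithmetic above can be sketched in a few lines of Java. This is a minimal illustration, not HotSpot code: `headerBytes` is a hypothetical helper that adds a one-word mark word to a klass pointer that is either one word or 4 bytes. It ignores the extra length field that array headers carry.

```java
// Hypothetical helper: header size = mark word + klass pointer, as described above.
final class HeaderSize {
    static int headerBytes(boolean is64Bit, boolean compressedKlassPointer) {
        int wordSize = is64Bit ? 8 : 4;
        int markWord = wordSize; // the mark word is always one machine word
        // The klass pointer is one word, except on 64-bit with compression (4 bytes).
        int klassPointer = (is64Bit && compressedKlassPointer) ? 4 : wordSize;
        return markWord + klassPointer;
    }

    public static void main(String[] args) {
        System.out.println("32-bit:                   " + headerBytes(false, false)); // 8
        System.out.println("64-bit:                   " + headerBytes(true, false));  // 16
        System.out.println("64-bit, compressed klass: " + headerBytes(true, true));   // 12
    }
}
```

The 12-byte case is the one you see by default on current 64-bit HotSpot builds.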
You can also find a wiki entry about this [1].
The mark word is actually used for many things:
- One is Biased Locking [2], through which HotSpot can implement efficient locking.
- It is also used during GC to set forward pointers and to store the age of objects.
- The identity hash code of an object can be stored inside the mark word (the System.identityHashCode/Object.hashCode one).
There is a comment in the source code of markOop.hpp that describes the layout depending on the architecture:
// 32 bits:
// --------
// hash:25 ------------>| age:4 biased_lock:1 lock:2 (normal object)
// JavaThread*:23 epoch:2 age:4 biased_lock:1 lock:2 (biased object)
// size:32 ------------------------------------------>| (CMS free block)
// PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
//
// 64 bits:
// --------
// unused:25 hash:31 -->| unused:1 age:4 biased_lock:1 lock:2 (normal object)
// JavaThread*:54 epoch:2 unused:1 age:4 biased_lock:1 lock:2 (biased object)
// PromotedObject*:61 --------------------->| promo_bits:3 ----->| (CMS promoted object)
// size:64 ----------------------------------------------------->| (CMS free block)
//
// unused:25 hash:31 -->| cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && normal object)
// JavaThread*:54 epoch:2 cms_free:1 age:4 biased_lock:1 lock:2 (COOPs && biased object)
// narrowOop:32 unused:24 cms_free:1 unused:4 promo_bits:3 ----->| (COOPs && CMS promoted object)
// unused:21 size:35 -->| cms_free:1 unused:7 ------------------>| (COOPs && CMS free block)
You can also find the oop header file here.
[1] https://wiki.openjdk.java.net/display/HotSpot/CompressedOops
[2] https://wiki.openjdk.java.net/display/HotSpot/Synchronization
Details about the mark word of the Java object header
When using flat (uncompressed) pointers, the lowest bits of an address are always zero due to alignment, which allows marking a special state by writing ones into these bits. So CMS sets the lowest bit of the klass pointer to one when it wants to denote that a particular chunk of memory is not an object (anymore) but free memory.
But the compressed pointer feature uses the same property to address more memory via a 32-bit pointer: it shifts the address right, leaving no unused lower bits. Therefore, CMS has to store this bit somewhere else, i.e. the cms_free_bit in question.
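The conflict between low-bit tagging and compressed pointers can be shown with plain longs. This is a hypothetical sketch with made-up addresses, assuming 8-byte alignment and a zero-based compressed encoding (a pure shift, no heap base); none of the names are HotSpot's.

```java
// Hypothetical illustration with made-up addresses; not HotSpot code.
final class LowBitTagging {
    static final int ALIGNMENT_SHIFT = 3; // assumed 8-byte object alignment
    static final long FREE_TAG = 1L;      // lowest bit used as a "free block" marker

    // An 8-byte-aligned address always has its lowest 3 bits clear.
    static boolean lowBitsClear(long address) {
        return (address & ((1L << ALIGNMENT_SHIFT) - 1)) == 0;
    }

    static long tagAsFree(long address) { return address | FREE_TAG; }
    static boolean isTaggedFree(long word) { return (word & FREE_TAG) != 0; }
    static long untag(long word) { return word & ~FREE_TAG; }

    // Zero-based compressed encoding: shift right by the alignment, so every bit
    // of the 32-bit value carries address information and none is left for a tag.
    static int compress(long address) { return (int) (address >>> ALIGNMENT_SHIFT); }
    static long decompress(int narrow) { return Integer.toUnsignedLong(narrow) << ALIGNMENT_SHIFT; }

    public static void main(String[] args) {
        long address = 0x7_0000_1230L; // an 8-byte-aligned address above 4 GB
        System.out.println("low bits clear: " + lowBitsClear(address));
        long tagged = tagAsFree(address);
        System.out.println("tagged is free: " + isTaggedFree(tagged)
                + ", untagged back: " + (untag(tagged) == address));

        int narrow = compress(address);
        // Setting bit 0 of the narrow value does not mark anything: it decodes to a
        // different, equally valid address 8 bytes further on.
        long decoded = decompress(narrow | 1);
        System.out.println("decoded - address = " + (decoded - address)); // 8
    }
}
```

With flat pointers the tag is recoverable; with the shifted encoding the same bit is just part of another address, which is why the marker has to move elsewhere.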
Source: concurrentMarkSweepGeneration.cpp:
// A block of storage in the CMS generation is always in
// one of three states. A free block (FREE), an allocated
// object (OBJECT) whose size() method reports the correct size,
// and an intermediate state (TRANSIENT) in which its size cannot
// be accurately determined.
// STATE IDENTIFICATION: (32 bit and 64 bit w/o COOPS)
// -----------------------------------------------------
// FREE: klass_word & 1 == 1; mark_word holds block size
//
// OBJECT: klass_word installed; klass_word != 0 && klass_word & 1 == 0;
// obj->size() computes correct size
//
// TRANSIENT: klass_word == 0; size is indeterminate until we become an OBJECT
//
// STATE IDENTIFICATION: (64 bit+COOPS)
// ------------------------------------
// FREE: mark_word & CMS_FREE_BIT == 1; mark_word & ~CMS_FREE_BIT gives block_size
//
// OBJECT: klass_word installed; klass_word != 0;
// obj->size() computes correct size
//
// TRANSIENT: klass_word == 0; size is indeterminate until we become an OBJECT
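The state-identification rules above can be modeled as two small classifiers. This is a hypothetical sketch, not HotSpot code: the position of the cms_free bit (bit 7) is an assumption read off the COOPs layouts quoted from markOop.hpp earlier (lock:2, biased_lock:1, age:4, then cms_free:1), and block-size extraction is left out. Note that CMS itself was removed in JDK 14.

```java
// Hypothetical model of the CMS state-identification comment above; not HotSpot code.
final class CmsBlockState {
    enum State { FREE, OBJECT, TRANSIENT }

    // Assumption: cms_free is bit 7 of the 64-bit mark word, read off the COOPs layouts
    // in markOop.hpp (lock:2, biased_lock:1, age:4, then cms_free:1).
    static final long CMS_FREE_BIT = 1L << 7;

    // 32-bit, and 64-bit without compressed oops: the marker is bit 0 of the klass word.
    static State classifyUncompressed(long klassWord) {
        if ((klassWord & 1) == 1) return State.FREE; // mark word holds the block size
        if (klassWord != 0) return State.OBJECT;     // a real klass pointer is installed
        return State.TRANSIENT;                      // klass not installed yet
    }

    // 64-bit with compressed oops: the marker moves into the mark word.
    static State classifyCompressed(long markWord, int narrowKlassWord) {
        if ((markWord & CMS_FREE_BIT) != 0) return State.FREE;
        if (narrowKlassWord != 0) return State.OBJECT;
        return State.TRANSIENT;
    }

    public static void main(String[] args) {
        System.out.println(classifyUncompressed(0x1000L | 1));   // FREE
        System.out.println(classifyUncompressed(0x1000L));       // OBJECT
        System.out.println(classifyUncompressed(0L));            // TRANSIENT
        System.out.println(classifyCompressed(CMS_FREE_BIT, 0)); // FREE
        System.out.println(classifyCompressed(0x1L, 0x1234));    // OBJECT
        System.out.println(classifyCompressed(0x1L, 0));         // TRANSIENT
    }
}
```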
Creating a custom JVM with a larger object header
It is definitely possible to enlarge the object header (I've seen such experiments before), though this won't be as easy as just adding a new field to class oopDesc. I believe there are multiple places in JVM code that rely on the size of the object header, but there should not be too many. The size of the object header already differs depending on the platform and the UseCompressedOops option, so most places in the code already use relative offsets and won't suffer from a new field.
The other option is not to expand the header, but rather to add a new fake field to the java.lang.Object class. HotSpot already has the machinery for adding such fields; look for InjectedField in the sources. However, this won't be trivial either. There are some hardcoded offsets for system classes (see JavaClasses::check_offsets), and these need to be fixed, too.
Both approaches are roughly equal in terms of implementation effort. In both cases I suggest starting with debug (not fastdebug) builds of the JVM, as they include many helpful assertions that will catch possible offset problems early.
Having heard of your project, I think you also have a third option: give up the "JVMTI only" requirement and rewrite some parts of the agent in Java, leveraging the power of bytecode instrumentation and JIT compilation. Yes, this may slightly change the Java code being executed and probably result in more classes being loaded, but does this really matter if, from the user's perspective, the impact is even smaller than with a JVMTI-only agent? I mean, the performance impact could be significantly lower when there are no Java<->native switches, no JVMTI overhead and so on. If the agent has low overhead and works with a stock JVM, I guess it's not a big problem to turn it ON in production in order to get its cool features, is it?
What is the overhead of creating Java objects from lines of a CSV file
Tom Hawtin made good points; I just want to expand on them and provide a bit more detail.
Java Strings take at least 40 bytes of memory (that's for an empty string) due to the Java object header overhead (see below) and an internal byte array.
That means the minimal size for a non-empty string (1 or more characters) is 48 bytes.
Nowadays (since JDK 9), the JVM uses Compact Strings, which means that strings containing only Latin-1 characters (this includes ASCII) occupy 1 byte per character; before that it was 2 bytes per char minimum.
That means if your file contains characters beyond the Latin-1 set, the strings holding them use 2 bytes per character and memory usage can grow significantly.
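The 40/48-byte figures can be reproduced with a rough estimate. This is a sketch under stated assumptions, not a measurement: 64-bit HotSpot with compressed oops, 8-byte alignment, Compact Strings enabled, and a String whose fields are byte[] value, int hash, byte coder and (JDK 13+) boolean hashIsZero. `estimate` is a hypothetical helper.

```java
// Rough String footprint estimate under stated assumptions; a sketch, not a measurement.
final class StringSizeEstimate {
    static final int OBJECT_HEADER = 12; // 8-byte mark word + 4-byte compressed klass pointer
    static final int ARRAY_HEADER = 16;  // object header + 4-byte array length
    // Assumed String fields: byte[] value (4-byte compressed ref), int hash (4),
    // byte coder (1), boolean hashIsZero (1, JDK 13+). Without hashIsZero it still pads to 24.
    static final int STRING_FIELDS = 4 + 4 + 1 + 1;

    static long alignTo8(long size) { return (size + 7) & ~7L; }

    static long estimate(String s) {
        // Compact Strings: Latin-1 content takes 1 byte per char, anything else 2 (UTF-16).
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        long payload = (long) s.length() * (latin1 ? 1 : 2);
        long stringObject = alignTo8(OBJECT_HEADER + STRING_FIELDS); // 24
        long valueArray = alignTo8(ARRAY_HEADER + payload);
        return stringObject + valueArray;
    }

    public static void main(String[] args) {
        System.out.println(estimate(""));      // 40
        System.out.println(estimate("a"));     // 48
        System.out.println(estimate("hello")); // 48 (Latin-1: 16 + 5 -> 24)
        System.out.println(estimate("\u043f\u0440\u0438\u0432\u0435\u0442")); // 56 (UTF-16: 16 + 12 -> 32)
    }
}
```

Running with -XX:-CompactStrings would invalidate the Latin-1 branch, since every string would then use 2 bytes per char.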
Streams also have more overhead compared to plain iteration over arrays/lists (see "Java 8 stream objects significant memory usage").
I guess your UserModel object adds at least 32 bytes of overhead on top of each line, because:
- the minimum size of a Java object is 16 bytes, where the first 12 bytes are the JVM "overhead": the object's class reference (4 bytes when Compressed Oops are used) + the 8-byte mark word (used for the identity hash code, biased locking, garbage collectors)
- the next 4 bytes are used by the reference to the first "token"
- the next 12 bytes are used by 3 references to the second, third and fourth "token"
- and the last 4 bytes are required due to Java object alignment at 8-byte boundaries (on 64-bit architectures)
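The breakdown above (12 + 4 + 12 = 28, padded to 32) can be written as a small calculation. UserModel's real fields aren't shown in the question, so this sketch assumes, as the answer does, exactly four reference fields and nothing else. `shallowSize` is a hypothetical helper, and the result excludes the Strings the references point to.

```java
// Shallow-size arithmetic for a hypothetical UserModel holding four references.
final class UserModelSize {
    static long alignTo8(long size) { return (size + 7) & ~7L; }

    // headerBytes: 12 with compressed class pointers; referenceBytes: 4 with compressed oops.
    static long shallowSize(int headerBytes, int referenceFields, int referenceBytes) {
        return alignTo8(headerBytes + (long) referenceFields * referenceBytes);
    }

    public static void main(String[] args) {
        // 12 (header) + 4 * 4 (references) = 28, padded to 32
        System.out.println(shallowSize(12, 4, 4));
        // e.g. 16-byte header and 8-byte references: 16 + 4 * 8 = 48 (already aligned)
        System.out.println(shallowSize(16, 4, 8));
    }
}
```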
That being said, it's not clear whether you even use all the data that you read from the file - you parse 4 tokens from a line but maybe there are more?
Moreover, you didn't mention how exactly the heap size "grew": was it the committed size or the used size of the heap? The used portion is the memory currently occupied by objects (including garbage that has not been collected yet); the committed portion is the memory the JVM has obtained for the heap and that is guaranteed to be available to it. used <= committed always holds, and in most cases used is noticeably smaller.
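Both numbers can be read from inside the JVM with the standard java.lang.management API. This is a minimal sketch; the values fluctuate with allocation and GC, so attributing memory to the UserModel instances still needs a heap snapshot.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapUsage {
    // Snapshot of heap memory as reported by the platform MemoryMXBean.
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        // used: bytes currently occupied by objects, including garbage not yet collected.
        // committed: bytes guaranteed to be available to the JVM; used <= committed.
        // max may be -1 if undefined.
        System.out.printf("used=%d MB, committed=%d MB, max=%d MB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}
```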
You'd have to take a heap snapshot to find out how much memory the resulting set of UserModel objects actually occupies, and it would be interesting to compare that to the size of the file.
When does the JVM assign the hashCode value in the object header
- The JVM does not need to call the hashCode method to initialize an object's identity hashCode. It works the other way round: Object.hashCode and System.identityHashCode call into the JVM to compute, or to extract a previously computed, identity hashCode.
- It is not specified how the JVM generates and stores the identity hashCode. Different JVM implementations may do it differently.
- HotSpot JVM computes the identity hashCode on the first call to Object.hashCode or System.identityHashCode and stores it in the object header. Subsequent calls simply extract the previously computed value from the header.
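The observable side of this contract can be checked from plain Java, although where the value is stored (the header, in HotSpot) is not visible without a tool such as jol. A minimal sketch; `Point` and `identityHashIsStable` are hypothetical names for illustration.

```java
final class IdentityHashDemo {
    // A value-like class that overrides hashCode(); its identity hash is still available.
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return p.x == x && p.y == y;
        }
        @Override public int hashCode() { return 31 * x + y; }
    }

    // True if repeated identity-hash requests for the same object agree.
    static boolean identityHashIsStable(Object o, int calls) {
        int first = System.identityHashCode(o);
        for (int i = 1; i < calls; i++) {
            if (System.identityHashCode(o) != first) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Object plain = new Object();
        // Object does not override hashCode(), so both calls ask the JVM for the identity hash.
        System.out.println(plain.hashCode() == System.identityHashCode(plain)); // true
        System.out.println(identityHashIsStable(plain, 1_000));                // true

        Point p = new Point(1, 2);
        System.out.println(p.hashCode());                   // 33, from the override
        System.out.println(identityHashIsStable(p, 1_000)); // true, independent of the override
    }
}
```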
Java explain how objects are stored in memory
Objects are stored in memory as a block of fields preceded by an object header. You can take a look at jol to see how objects are actually laid out in memory. Each object has a header, fields might have padding, fields might take more space than you think (a boolean field, for example, takes a whole byte), etc.
You can take a look at this example where I tried to explain things a little more, but the GitHub page of jol has extensive examples.
At the bytecode level, accessing an object field is boring, to be honest, but you can certainly look at what javac produces (with javap). When the code is executed on the CPU, you will see fixed offsets used to load a certain field, like:
mov 0x10(%rsi),%r10
This accesses "something" at offset 16 (0x10) from the address held in %rsi. Think of an object as a contiguous block of memory: to access a field you need to know how big each field is and where it sits (the VM tracks that) and the start address of the object; the rest is easy.
If you really want to know the details, the MUST read starts from this page.
Why compressed Oops gives 12 bytes for Object Header
As far as I know, that happens because, unlike the klass word, the mark word is not compressed by CompressedOops.
So 4 bytes (compressed klass word on 64-bit) + 8 bytes (mark word) = 12 bytes (header).
java.lang.Integer object layout and its overhead
... but what about the remaining 4 bytes?
The missing 4 bytes would be padding to the next 8-byte boundary, since Java heap objects are a multiple of 8 bytes in size.
But since this is a 64-bit JVM, the mark word is 8 bytes rather than 4 bytes, making 12 bytes of header overhead in total.
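Both explanations fall out of the same padding arithmetic. This is a minimal sketch, assuming 8-byte alignment and an Integer instance that consists of the header plus its single int field; `integerSize` is a hypothetical helper.

```java
final class IntegerLayout {
    static long alignTo8(long size) { return (size + 7) & ~7L; }

    // Instance size = mark word + klass pointer + one int field, padded to 8 bytes.
    static long integerSize(int markWordBytes, int klassPointerBytes) {
        return alignTo8(markWordBytes + klassPointerBytes + Integer.BYTES);
    }

    public static void main(String[] args) {
        System.out.println("32-bit JVM:                 " + integerSize(4, 4)); // 4+4+4 = 12 -> 16
        System.out.println("64-bit, compressed klass:   " + integerSize(8, 4)); // 8+4+4 = 16
        System.out.println("64-bit, uncompressed klass: " + integerSize(8, 8)); // 8+8+4 = 20 -> 24
    }
}
```

On a 32-bit JVM the extra 4 bytes are padding; on a 64-bit JVM with a compressed klass pointer they are the larger mark word, and no padding is needed.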