How Do hashCode() and identityHashCode() Work at the Back End

How does the JVM ensure that System.identityHashCode() will never change?

Modern JVMs save the value in the object header. I believe the value is typically calculated only on first use in order to keep time spent in object allocation to a minimum (sometimes down to as low as a dozen cycles). The common Sun JVM can be compiled so that the identity hash code is always 1 for all objects.

Multiple objects can have the same identity hash code. That is the nature of hash codes.
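To see that two distinct live objects can share an identity hash code while still being different objects, the sketch below keeps allocating plain objects until two of them report the same value. It assumes a JVM whose identity hashes fall in a limited range, which on a typical HotSpot JVM means a collision usually turns up well before the cap. A collision is likely, not guaranteed. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class IdentityHashCollision {

    // Allocates plain Objects, keeping every one reachable through the map, until
    // two distinct live objects report the same identity hash code.
    // Returns that pair, or null if no collision is seen within maxObjects allocations.
    static Object[] findCollision(int maxObjects) {
        Map<Integer, Object> seen = new HashMap<>();
        for (int i = 0; i < maxObjects; i++) {
            Object o = new Object();
            Object previous = seen.put(System.identityHashCode(o), o);
            if (previous != null) {
                return new Object[] {previous, o};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Object[] pair = findCollision(1_000_000);
        if (pair == null) {
            System.out.println("no collision found");
            return;
        }
        // Same identity hash code...
        System.out.println(System.identityHashCode(pair[0]) == System.identityHashCode(pair[1]));
        // ...but still two different objects.
        System.out.println(pair[0] == pair[1]);
    }
}
```

The map is keyed by the hash, so the first time `put` returns an existing value, two different objects have produced the same identity hash.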

System.identityHashCode(): can the same hashCode be returned after an object is GC'ed?

Is it possible that a newly created object2 can have the same identity hash code that object1 had before it was GC'ed?

Yes it is.

The identity hashcode might be derived from the address of the object when the method is first called for the object. (Or it might be generated some other way. The spec allows a lot of different mechanisms to be used.)

So, if the GC collects object1 and a new object, object2, is allocated at the same address as the original one, the new one may have the same hashcode as the original.

Furthermore, if the GC moves object1 after its hashcode has been generated, a new object (object2) could be allocated at object1's original address. Then you could end up with two live objects having the same hashcode.

But neither of these things should be a problem. Hashcodes are not designed to be identifiers for objects. (So don't try to use them as such.)
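If you genuinely need a per-object identifier, one option (not from the original answers) is to hand out IDs yourself from a counter and look objects up by reference identity. The sketch assumes a hypothetical `ObjectIds` helper backed by `java.util.IdentityHashMap`. That map compares keys with `==`, so two objects whose identity hash codes collide still get distinct IDs. One caveat: the map holds strong references, so registered objects are never collected.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper: assigns a unique, never-reused long ID to each object it sees.
public class ObjectIds {
    private final Map<Object, Long> ids = new IdentityHashMap<>();
    private long next = 1;

    // Returns the existing ID for obj, or assigns the next one.
    // IdentityHashMap compares keys with ==, so distinct objects always get
    // distinct IDs even if their identity hash codes happen to be equal.
    public synchronized long idOf(Object obj) {
        return ids.computeIfAbsent(obj, k -> next++);
    }

    public static void main(String[] args) {
        ObjectIds registry = new ObjectIds();
        Object a = new Object();
        Object b = new Object();
        System.out.println(registry.idOf(a)); // prints 1
        System.out.println(registry.idOf(b)); // prints 2
        System.out.println(registry.idOf(a)); // prints 1 again
    }
}
```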



My understanding of identity is that objects are unique inside a JVM at a given point in time.

The identity is unique. The identity hashcodes are not. As the Object javadoc says:

"As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects."

That is nowhere near a guarantee of uniqueness.

And then:

"(This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)"

What it is referring to is the address of the object when hashCode() is first called. The contract for the hashCode() method states that the hashcode value cannot change. The identity hashcode is ... in effect ... remembered as part of the object so that the object can be moved by the GC.

And note that the latest javadocs have dropped all mention of how an identity hashcode is or might be generated. This change happened in Java 12.
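You can observe the "remembered as part of the object" behaviour from ordinary Java code. Read the identity hash, churn the heap and request a collection (which may move the object), then read it again. This is a minimal sketch. `System.gc()` is only a request, so the JVM may not collect or relocate anything, and `IdentityHashStability` is a made-up name. Either way, the contract says the two reads must agree.

```java
import java.util.ArrayList;
import java.util.List;

public class IdentityHashStability {

    // Returns true if the identity hash read before some heap churn and a GC
    // request equals the one read afterwards.
    static boolean hashSurvivesGc() {
        Object target = new Object();

        // First read of the identity hash.
        int before = System.identityHashCode(target);

        // Allocate short-lived garbage and request a collection; either may move target.
        List<byte[]> garbage = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            garbage.add(new byte[1024]);
        }
        garbage.clear();
        System.gc(); // only a hint; the JVM is free to ignore it

        // The value must be the same, wherever the object now lives.
        int after = System.identityHashCode(target);
        return before == after;
    }

    public static void main(String[] args) {
        System.out.println(hashSurvivesGc()); // prints true
    }
}
```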

Does hashcode return the memory address?

Not necessarily. From the documentation:

"As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)"
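The hex string in output like `java.lang.Object@1b6d3586` often gets mistaken for a memory address. The `Object.toString()` javadoc specifies it as `getClass().getName() + '@' + Integer.toHexString(hashCode())`, so that part is just the hash code printed in hex. The sketch below rebuilds that string and compares it. The class and helper names are illustrative only.

```java
public class ToStringHash {

    // Rebuilds the string that Object.toString() is specified to return:
    // class name, '@', then hashCode() in hexadecimal.
    static String expectedToString(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(o); // java.lang.Object@ followed by some hex digits
        System.out.println(expectedToString(o).equals(o.toString())); // prints true
    }
}
```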

When does the JVM assign the hashcode value in the object header?

  • The JVM does not need to call the hashCode method to initialize an object's identity hashCode. It works the other way round: Object.hashCode and System.identityHashCode call into the JVM to compute the identity hashCode, or to extract a previously computed one.
  • It is not specified how the JVM generates and stores the identity hashCode. Different JVM implementations may do it differently.
  • The HotSpot JVM computes the identity hashCode on the first call to Object.hashCode or System.identityHashCode and stores it in the object header. Subsequent calls simply extract the previously computed value from the header.
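Where the value is generated and stored is not visible from Java code. What you can check is the observable consequence: for a class that does not override hashCode, both entry points return the same identity hash, whichever one is called first, and repeated calls keep returning it. The class name below is made up for the sketch.

```java
public class SameIdentityHash {

    // For a plain Object, Object.hashCode and System.identityHashCode must
    // return the same identity hash regardless of which is called first.
    static boolean entryPointsAgree() {
        Object first = new Object();
        int a1 = System.identityHashCode(first); // first request goes through System
        int a2 = first.hashCode();               // second request goes through Object

        Object second = new Object();
        int b1 = second.hashCode();               // first request goes through Object
        int b2 = System.identityHashCode(second); // second request goes through System

        // Repeated calls keep returning the same value.
        return a1 == a2 && b1 == b2 && a1 == first.hashCode() && b1 == second.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(entryPointsAgree()); // prints true
    }
}
```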

How are objects different from one another?

What you're seeing is because you're using String, which has a very special (and nearly unique) behavior: your two strings are actually one String object, because string literals are automatically interned. The JDK and JVM work together to put string literals into a pool of String instances which are reused, rather than creating separate String instances for the same sequence of characters.

Try your experiment with new Object() instead:

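public class HashDemo {
    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        // Object does not override hashCode, so the 1st and 3rd lines match, as do the 2nd and 4th.
        System.out.println(a.hashCode());
        System.out.println(b.hashCode());
        System.out.println(System.identityHashCode(a));
        System.out.println(System.identityHashCode(b));
    }
}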
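The string-literal pooling from the answer above can be checked directly. Equal literals are the same instance, `new String(...)` always makes a distinct object, and `intern()` hands back the pooled instance. All of this is guaranteed by the language and `String.intern()` specifications. The helper methods are hypothetical, added so the values can be checked from elsewhere.

```java
public class InternDemo {

    // The literal "hello" written in two places, plus a freshly allocated copy.
    static String literalA() { return "hello"; }
    static String literalB() { return "hello"; }
    static String freshCopy() { return new String("hello"); }

    public static void main(String[] args) {
        String a = literalA();
        String b = literalB();
        String copy = freshCopy();

        // Equal string literals refer to one pooled String instance.
        System.out.println(a == b);                 // prints true
        // new String(...) always creates a distinct object with equal contents.
        System.out.println(a == copy);              // prints false
        System.out.println(a.equals(copy));         // prints true
        // intern() returns the pooled instance for those contents.
        System.out.println(a == copy.intern());     // prints true
        // a and b are one object, so they share one identity hash.
        System.out.println(System.identityHashCode(a) == System.identityHashCode(b)); // prints true
    }
}
```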

Also, what is the difference between hashCode() and System.identityHashCode()?

The hashCode function can be overridden by a class to return something appropriate for that class. System.identityHashCode returns the same hashCode that Object#hashCode would have returned if the subclass hadn't overridden it.

So for Object, you'd get the same return value from each of them. But for any class that overrides hashCode to return something more appropriate for that class (which includes String), you'd get different values.
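This sketch shows the difference for a class that overrides hashCode. It uses a hypothetical `Point` value class. Two equal points get the same overridden hashCode. `super.hashCode()` still gives the Object version, which the `System.identityHashCode` javadoc says is the value it returns. String behaves the same way, because it overrides hashCode based on its characters.

```java
import java.util.Objects;

public class HashCodeVsIdentity {

    // Hypothetical value class whose equals/hashCode are based on its fields.
    static final class Point {
        final int x, y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }

        @Override public int hashCode() { return Objects.hash(x, y); }

        // What Object.hashCode would have returned without the override.
        int objectHashCode() { return super.hashCode(); }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);

        // Overridden hashCode: equal points, equal hash codes.
        System.out.println(p.hashCode() == q.hashCode()); // prints true
        // identityHashCode ignores the override and matches Object.hashCode.
        System.out.println(p.objectHashCode() == System.identityHashCode(p)); // prints true
        // String overrides hashCode too, so equal contents give equal values.
        System.out.println("abc".hashCode() == new String("abc").hashCode()); // prints true
        // Identity hashes of p and q are usually different, but that is not guaranteed.
        System.out.println(System.identityHashCode(p) == System.identityHashCode(q));
    }
}
```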

Will .hashcode() return a different int due to compaction of tenure space?

@erickson is more or less correct. The hashcode returned by java.lang.Object.hashCode() does not change for the lifetime of the object.

The way this is (typically) implemented is rather clever. When an object is relocated by the garbage collector, its original hashcode has to be stored somewhere in case it is used again. The obvious way to implement this would be to add a 32-bit field to the object header to hold the hashcode. But that would add a one-word overhead to every object, and would waste space in the most common case ... where an Object's hashCode method is not called.

The solution is to add two flag bits to the object's flag word, and use them (roughly) as follows. The first flag is set when the hashCode method is called. A second flag tells the hashCode method whether to use the object's current address as the hashcode, or to use a stored value. When the GC runs and relocates an object, it tests these flags. If the first flag is set and the second one is unset, the GC allocates one extra word at the end of the object and stores the original object location in that word. Then it sets the two flags. From then on, the hashCode method gets the hashcode value from the word at the end of the object.


In fact, an identityHashCode implementation has to behave this way to satisfy the following part of the general hashCode contract:

"Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application."

A hypothetical implementation of identityHashCode() that simply returned the current machine address of an object would violate the "must consistently return the same integer" requirement if/when the GC moved the object to a different address. The only way around this would be for the (hypothetical) JVM to guarantee that an object never moves once hashCode has been called on it. And that would lead to serious and intractable problems with heap fragmentation.
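The two-flag scheme described above, and the address-only implementation it is contrasted with, fit in a small model. This is a toy simulation under stated assumptions, not JVM source. Addresses are plain `long` values, the "extra word" is an ordinary field, and the hash is taken to be the truncated address, which real JVMs need not do. All names are hypothetical.

```java
// Toy model (not real JVM code) of the two-flag identity hash scheme.
public class TwoFlagHashModel {

    static final class SimObject {
        long address;       // current simulated heap address
        boolean hashed;     // flag 1: hashCode has been called
        boolean hashStored; // flag 2: use the stored word instead of the address
        long extraWord;     // stands in for the word the GC appends to the object

        SimObject(long address) { this.address = address; }

        // Flag-based identity hash: current address until a move, then the stored word.
        int identityHash() {
            hashed = true;
            long source = hashStored ? extraWord : address;
            return (int) source; // assumption: hash is just the truncated address
        }

        // Naive alternative that always uses the current address.
        int naiveHash() { return (int) address; }
    }

    // Simulated GC relocation following the flag rules described above.
    static void relocate(SimObject obj, long newAddress) {
        if (obj.hashed && !obj.hashStored) {
            obj.extraWord = obj.address; // remember the original location
            obj.hashStored = true;
        }
        obj.address = newAddress;
    }

    public static void main(String[] args) {
        SimObject obj = new SimObject(0x1000);
        int before = obj.identityHash();
        relocate(obj, 0x2000);
        relocate(obj, 0x3000);
        System.out.println(before == obj.identityHash()); // prints true
        System.out.println(before == obj.naiveHash());    // prints false
    }
}
```

Only the first relocation after hashing writes the extra word. Later moves leave it alone, so the hash stays tied to the address the object had when it was first hashed.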


