How Does the Jvm Ensure That System.Identityhashcode() Will Never Change

How does the JVM ensure that System.identityHashCode() will never change?

Modern JVMs save the value in the object header. I believe the value is typically calculated only on first use in order to keep time spent in object allocation to a minimum (sometimes down to as low as a dozen cycles). The common Sun JVM can be compiled so that the identity hash code is always 1 for all objects.

Multiple objects can have the same identity hash code. That is the nature of hash codes.

System.identityHashCode() behavior on primitives

However, one user case just showed me that this is valid only for values in [0, 125]

System.identityHashCode(Object) takes an Object not a primitive which means that your i++ is auto-boxed to be an Integer (or Long or ...). The only time that the objects will have the same identity hashcode will most likely (but not always) be when they are the same object. As it happens, the JVM caches a small number of Integers for optimization purposes meaning that the identity hashcodes of the 0 to 127 values are be the same.

When autoboxing happens the compiler generates a call to Integer.valueOf(int). If we look at the code for `valueOf(...) we see:

if (i >= IntegerCache.low && i <= IntegerCache.high)
return IntegerCache.cache[i + (-IntegerCache.low)];
return new Integer(i);

So there is a low and high cache range from which you will get a cached constant object that are generated when the JVM starts up. There are JVM arguments that affect the size of the number caches. So 127 must be within the cache range for your JVM while 128 and above are not.

// both of the 100 values gets auto-boxed to the same Integer object
System.identityHashCode(100) == System.identityHashCode(100)

This is basically equivalent to comparing the hashcodes of the same object:

Integer i = new Integer(100);
System.identityHashCode(i) == System.identityHashCode(i)

Once you are past the initial cached set of Integer values and larger ints are specified, new Integer objects will be created when they are auto-boxed and so the identity hashcodes are (most likely) no longer equal.

// by default, each of these 1000000 values gets auto-boxed to a different object
System.identityHashCode(1000000) != System.identityHashCode(1000000)

This is basically equivalent to instantiating 2 different integers and expecting their hashcodes to be the same:

Integer i1 = new Integer(1000000);
Integer i2 = new Integer(1000000);
System.identityHashCode(i1) != System.identityHashCode(i2)

Note that System.identityHashCode(...) does not return unique values so it is possible (albeit unlikely) that they will generate the same value if when looking at different objects.

Does Object class default implementation of hashcode uses identityhashcode

From what I understand, some Java implementations implement identityHashCode by reserving two bits the an object header to divide objects into three categories:

  1. Those for which identityHashCode() has never been called.

  2. Those for which the first-ever call to identityHashCode() occurred after the most recent time the object moved.

  3. Those for which the first-ever call to identityHashCode() occurred before the most recent time the object moved.

For an objects of the first category, a call to identityHashCode() will return a value related to the object's address and set a flag to change the object into the second category. For an object in the second category, the identityHashCode() will compute the same value from the address as the first call. Any time the GC is going to move an object, it checks whether it is in the second category above. If so, the GC reserves an extra four bytes at the destination which will be used to hold the identity hash value the object had before it moved.

If an object has its hash code taken, the effective size of the object will increase by four bytes the next time the GC has to move it. Most objects never have their identity hash code taken, however, and some of them have their identity hash code taken but are immediately collected. Having an object's first call to identityHashCode() allocate an extra four bytes for an object would be difficult, but allocating an extra four bytes when the object is moved is not a problem. Using the object's address as its hash value between the first call to identityHashCode() and the next time the object moves avoids the need to allocate the storage when the space following the object is in use.

Note that while an identity hash method could "legally" use the bit pattern of the address directly, doing so could cause excessive number of hash duplicates (e.g. the first object created following one GC cycle could easily have the same address as the first object created after another). A simple way to avoid that problem is to have the system either add, xor, or otherwise combine the address with a value which gets changed after every GC cycle. Such an approach would make it easy to spread hash values out over a wider range.

Are the hashcodes returned by the System.identityHashCode method uniquely assigned to each object?

The pigeonhole principle applies.

If you have 4 pigeon holes, and 5 pigeons which must all find a pigeon hole to roost in, then at least one pigeon hole is going to have more than one pigeon in it.

Obvious, right?

Same applies here. There are only 2^32 different pigeon holes hash codes (because the value is an int, int in java is defined as a 32-bit number, thus, only 2^32 different possible values exit). That is a big, big number. about 4 billion.

However, there is nothing in the java spec that decrees that no more than 4 billion pigeons objects can ever exist. If ever more than 4 billion objects exist, no algorithm one could possibly design could ever promise uniqueness, because of this principle. QED.

NB: You can also use the pigeonhole principle to prove that a universal compressor (a tool that can compress anything, guaranteeing that the compressed result is always smaller or equal) cannot exist, as long as it actually compresses anything, then there must as a consequence be some stream of bits for which the compressor actually produces a larger file. You can use it to prove that (int) (Math.random() * 10) is not quite uniform random, and why you should therefore use random.nextInt(10) instead (which is). It's a surprisingly useful principle to prove things in computer science!

Now, one could imagine an int based coding system which promises unique codes until you hit 4 billion unique objects, but making such a promise is incredibly complicated and itself a memory hog, if it would have to work for any and all objects.

Java makes no such promises, and System.iHC is as a consequence not guaranteed to have either unique numbers (completely impossible to make that promise) nor that System.iHC has 'perfect' distribution (hashcodes as distributed as they could be, i.e. no reuse until 4 billion objects exist simultaneously). Note that 'exists' is already complicated: When does an object truly 'disappear'?

In practice, iHC is based on memory positions; it's distribution is very very good. But there is a difference between it is highly unlikely any 2 objects would ever have the same identity hashcode and we guarantee no two objects share an identity hash code.

How do JVMs implement IdentityHashMap?

Hope this question helps: How does the JVM ensure that System.identityHashCode() will never change?

Also http://xiao-feng.blogspot.com/2007/04/object-hashcode-implementation.html

Does default hashcode change on object mutation

The contract for hashcode states:

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.

So there is no guarantee that the hashcode won't change if you mutate your object. It happens that some JDK implementations might use some sort of internal address but they don't have to so you should not rely on that.

System.identityHashCode() can the same hashCode be returned after an Object is GC'ed

Is it possible that a new created object2 can have the same identity hash code as object1 got before it was GC'ed ?

Yes it is.

The identity hashcode might be derived from the address of the object when the method is first called for the object. (Or it might be generated some other way. The spec allows a lot of different mechanisms to be used.)

So, if the GC collects object1, and a new object object2 is allocated at the same address as the original one, and the new one may have the same hashcode as the original.

Furthermore, if the GC moves object1 after the hashcode has been generated, the new object (object2) at object1's original. Then, you could then end up with two existing objects having the same hashcode.

But neither of these things should be a problem. Hashcodes are not not designed to be identifiers for objects. (So don't try and use them as such.)



My understanding of identity is that objects are unique inside a JVM at a given point in time.

The identity is unique. The identity hashcodes are not. As the Object javadoc says:

"As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects."

That is nowhere near a guarantee of uniqueness.

And then:

"(This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)"

What it is referring to is the address of the object when hashCode() is first called. The contract for the hashCode() method states that the hashcode value cannot change. The identity hashcode is ... in effect ... remembered as part of the object so that the object can be moved by the GC.

And note that the latest javadocs have dropped all mention of how an identity hashcode is or might be generated. This change happened in Java 12.

HashMap order changes when using Thread but is constant without Thread

It's because HashMap order internally will depend on hashcode implementation.

Your Book class does not implement hashCode so it will use the default one

As much as is reasonably practical, the hashCode method defined by
class Object does return distinct integers for distinct objects. (This
is typically implemented by converting the internal address of the
object into an integer, but this implementation technique is not
required by the JavaTM programming language.)

That means it will use memory address.

In your case, it happens that for a single thread, allocated memory addresses are the same on re-run, which is not the case in threaded version.

But this is only 'by accident' and you cannot rely on it even in single threaded (someone else will run it and get a different result,
or even when you run it later you can get a different result as objects will have different memory addresses)

Please ALWAYS overwrite hashCode (&equals) when using objects in hashmap.

Will .hashcode() return a different int due to compaction of tenure space?

@erickson is more or less correct. The hashcode returned by java.lang.Object.hashCode() does not change for the lifetime of the object.

The way this is (typically) implemented is rather clever. When an object is relocated by the garbage collector, its original hashcode has to be stored somewhere in case it is used again. The obvious way to implement this would be to add a 32 bit field to the object header to hold the hashcode. But that would add a 1 word overhead to every object, and would waste space in the most common case ... where an Object's hashCode method is not called.

The solution is to add two flag bits to the object's flag word, and use them (roughly) as follows. The first flag is set when the hashCode method is called. A second flag tells the hashCode method whether to use the object's current address as the hashcode, or to use a stored value. When the GC runs and relocates an object, it tests these flags. If the first flag is set and second one is unset, the GC allocates one extra word at the end of the object and stores the original object location in that word. Then it sets the two flags. From then on, the hashCode method gets the hashcode value from the word at the end of the object.


In fact, an identityHashCode implementation has to behave this way to satisfy the following part of the general hashCode contract:

"Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application."

A hypothetical implementation of identityHashCode() that simply returned the current machine address of an object would violate the highlighted part if/when the GC moved the object to a different address. The only way around this would be for the (hypothetical) JVM to guarantee that an object never moves once hashCode has been called on it. And that would lead to serious and intractable problems with heap fragmentation.



Related Topics



Leave a reply



Submit