How does a Java HashMap handle different objects with the same hash code?
A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):
It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).
When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals()
.
Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.
Looking at the above mechanism, you can also see what requirements are necessary on the hashCode()
and equals()
methods of keys:
If two keys are the same (
equals()
returnstrue
when you compare them), theirhashCode()
method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use
equals()
to tell them apart.
When do you need to override hashcode() and equals() when using a hashmap
The answer is yes.
In Java you can add objects in collections. Let us say you wanted to find an object called A that you added to a list called L. Let us say this is an object that you defined with your own class and you override the method Object#equals(). When you are looping through list L you are testing if any of these objects are equal to object A. If the equals method returns true you have found your object.
When you add objects to any HashTable, HashMap or HashSet the hashcode method is used to generate a number. That number should as unique as possible. It is possible that objects of the same class have different values in their instance fields but their hashcode method produces the same value. If you have two objects X and Y of some class and they have the same hashcode and you put both of them in a HashMap Q they end up in the same bucket P. Let us say that P has two objects. Let say that you pass Q into a method with X and Y. The method wants to check if X exists in Q. Q will take X and get the hashcode. Q will use the hashcode to find a bucket. The bucket will be P. Bucket P has two objects. This is when the equals method is used to determine if the Bucket contains X by comparing each object in the bucket with X. If one of the object in the bucket matches X then X exists in Q.
How HashMap retrieves different values if Key's hashcode is same but equals method return false
A HashMap
/HashSet
implements per bucket a list of keys (on the case of a Map
together with the values). If multiple key's share the same hashCode
value, they are placed in this list.
So the search method first extracts the hashCode
of the queried key and then iterates over the corresponding list until an equals
method succeeds. In case of a HashSet
it means the key
is found, in case of a HashMap
, it returns the other side of the tuple: the value.
The memory of a HashMap
thus works like:
+--------+--------+--------+--------+
| 00 | 01 | 10 | 11 |
+--------+--------+--------+--------+
| | | |
k00/v00 _ k06/v06 _
| |
k08/v08 k14/v14
| |
k04/v04 _
|
_
What you see is on top the four buckets. Each bucket has a list (the items underneath), that stores tuples of keys (k
) and values (v
). Since there are here only four buckets, the hash algorithm uses a modulo 4 operation, so a key k06
with value v06
would be placed in bucket 06 mod 4 = 02
thus 10
. In case a second key k14
is added with 14 mod 4 = 02
thus 10
, it is simply added to the list.
Since the values are stored with it as well, one can perform a fast lookup operation. The key is thus stored together with the value.
You noticed, that iterating over the (linked) list is an expensive operation. But the point of a HashMap
is that one hopes, that the number of hash-collisions to use the correct term (number of keys sharing the same bucket) is very low. In general one might expect two or three elements per bucket. The performance boost is thus achieved by selecting the correct bucket in constant time, searching the bucket requires however linear time (or in case there is a complete ordering on the key's, one might implement a (balanced) binary tree to search in logarithmic time). Worst-case however, a HashMap
can achieve, the same performance as an ArrayList
/LinkedList
of entries, but given the hash-function has been designed decently, the odds are very low.
How HashMap really works?
The implementation is open source, so I would encourage you to just read the code for any specific questions. But here's the general idea:
- The primary responsibility for good hashCode distribution lies with the keys' class, not with the HashMap. If the key have a
hashCode()
method with bad distribution (for instance,return 0;
) then the HashMap will perform badly. - HashMap does do a bit of "re-hashing" to ensure slightly better distribution, but not much (see
HashMap::hash
) - On the
get
side of things, a couple checks are made on each element in the bucket (which, yes, is implemented as a linked list)- First, the HashMap checks the element's hashCode with the incoming key's hashCode. This is because this operation is quick, and the element's hashCode was cached at
put
time. This guards against elements that have different hashCodes (and are thus unequal, by the contract of hashCode and equals established byObject
) but happen to fall into the same bucket (remember, bucket indexes are basicallyhashCode % buckets.length
) - If that succeeds, then, the HashMap checks
equals
explicitly to ensure they're really equal. Remember that equality implies same hashCode, but same hashCode does not require equality (and can't, since some classes have potentially infinite number of different values -- likeString
-- but there are only a finite number of possible hashCode values)
- First, the HashMap checks the element's hashCode with the incoming key's hashCode. This is because this operation is quick, and the element's hashCode was cached at
The reason for the double-checking of both hashCode and equals is to be both fast and correct. Consider two keys that have a different hashCode, but end up in the same HashMap bucket. For instance, if key A has hashCode=7 and B has hashCode=14, and there are 7 buckets, then they'll both end up in bucket 0 (7 % 7 == 0
, and 14 % 7 == 0
). Checking the hashCodes there is a quick way of seeing that A and B are unequal. If you find that the hashCodes are equal, then you make sure it's not just a hashCode collision by calling equals
. This is just an optimization, really; it's not required by the general hash map algorithm.
HashMap with incorrect equals and HashCode implementation
HashMap
has a little trick where it compares object references before using equals
. Since you're using the same object references for adding the elements and for retrieving them, HashMap
will return them correctly.
See Java 7 source here (Java 8 did a pretty big revamp of HashMap
but it does something similar)
final Entry<K,V> getEntry(Object key) {
if (size == 0) {
return null;
}
int hash = (key == null) ? 0 : hash(key);
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
// HERE. Uses == with the key
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
}
return null;
}
Note that this isn't part of the docs, so don't depend on it.
Why do I need to override the equals and hashCode methods in Java?
Joshua Bloch says on Effective Java
You must override hashCode() in every class that overrides equals(). Failure to do so will result in a violation of the general contract for Object.hashCode(), which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.
Let's try to understand it with an example of what would happen if we override equals()
without overriding hashCode()
and attempt to use a Map
.
Say we have a class like this and that two objects of MyClass
are equal if their importantField
is equal (with hashCode()
and equals()
generated by eclipse)
public class MyClass {
private final String importantField;
private final String anotherField;
public MyClass(final String equalField, final String anotherField) {
this.importantField = equalField;
this.anotherField = anotherField;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result
+ ((importantField == null) ? 0 : importantField.hashCode());
return result;
}
@Override
public boolean equals(final Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
final MyClass other = (MyClass) obj;
if (importantField == null) {
if (other.importantField != null)
return false;
} else if (!importantField.equals(other.importantField))
return false;
return true;
}
}
Imagine you have this
MyClass first = new MyClass("a","first");
MyClass second = new MyClass("a","second");
Override only equals
If only equals
is overriden, then when you call myMap.put(first,someValue)
first will hash to some bucket and when you call myMap.put(second,someOtherValue)
it will hash to some other bucket (as they have a different hashCode
). So, although they are equal, as they don't hash to the same bucket, the map can't realize it and both of them stay in the map.
Although it is not necessary to override equals()
if we override hashCode()
, let's see what would happen in this particular case where we know that two objects of MyClass
are equal if their importantField
is equal but we do not override equals()
.
Override only hashCode
If you only override hashCode
then when you call myMap.put(first,someValue)
it takes first, calculates its hashCode
and stores it in a given bucket. Then when you call myMap.put(second,someOtherValue)
it should replace first with second as per the Map Documentation because they are equal (according to the business requirement).
But the problem is that equals was not redefined, so when the map hashes second
and iterates through the bucket looking if there is an object k
such that second.equals(k)
is true it won't find any as second.equals(first)
will be false
.
Hope it was clear
Adding objects to hashmap as keys which only equals method implemented
You must override hashCode
as well. if a.equals(b)
, a.hashCode()
must be equal to b.hashCode()
. The implementation of hashCode
determines in which bucket the entry will be stored. If you don't override hashCode
, you might have two equal keys stored in two different buckets.
Add this to your S
class :
@Override
public int hashCode() {
if (txt == null)
return 0;
return txt.hashCode();
}
Related Topics
@Requestbody and @Responsebody Annotations in Spring
How to Get the Size of a Java.Sql.Resultset
Controlling Maven Final Name of Jar Artifact
How to Sort List of Objects by Some Property
Why Does Changing the Returned Variable in a Finally Block Not Change the Return Value
How to Load Classes at Runtime from a Folder or Jar
How to Get Screen Resolution in Java
Jackson - Deserialize Using Generic Class
Using "Like" Wildcard in Prepared Statement
Split String on Spaces in Java, Except If Between Quotes (I.E. Treat \"Hello World\" as One Token)
How to Add Javafx Runtime to Eclipse in Java 11
"Integer Number Too Large" Error Message for 600851475143
Converting Symbols, Accent Letters to English Alphabet
Using Heapdumponoutofmemoryerror Parameter for Heap Dump for Jboss
When to Use Abstract Class or Interface
How Can a String Be Initialized Using " "
By Default, Jsf Generates Unusable Ids, Which Are Incompatible with the CSS Part of Web Standards