Understanding the Workings of Equals and Hashcode in a Hashmap

How does a Java HashMap handle different objects with the same hash code?

A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):

It has a number of "buckets" which it uses to store key-value pairs in. Each bucket has a unique number - that's what identifies the bucket. When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key. For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more then one key-value pair).

When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals().

Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.

Looking at the above mechanism, you can also see what requirements are necessary on the hashCode() and equals() methods of keys:

  • If two keys are the same (equals() returns true when you compare them), their hashCode() method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look in the same bucket).

  • If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use equals() to tell them apart.

When do you need to override hashcode() and equals() when using a hashmap

The answer is yes.

In Java you can add objects in collections. Let us say you wanted to find an object called A that you added to a list called L. Let us say this is an object that you defined with your own class and you override the method Object#equals(). When you are looping through list L you are testing if any of these objects are equal to object A. If the equals method returns true you have found your object.

When you add objects to any HashTable, HashMap or HashSet the hashcode method is used to generate a number. That number should as unique as possible. It is possible that objects of the same class have different values in their instance fields but their hashcode method produces the same value. If you have two objects X and Y of some class and they have the same hashcode and you put both of them in a HashMap Q they end up in the same bucket P. Let us say that P has two objects. Let say that you pass Q into a method with X and Y. The method wants to check if X exists in Q. Q will take X and get the hashcode. Q will use the hashcode to find a bucket. The bucket will be P. Bucket P has two objects. This is when the equals method is used to determine if the Bucket contains X by comparing each object in the bucket with X. If one of the object in the bucket matches X then X exists in Q.

How HashMap retrieves different values if Key's hashcode is same but equals method return false

A HashMap/HashSet implements per bucket a list of keys (on the case of a Map together with the values). If multiple key's share the same hashCode value, they are placed in this list.

So the search method first extracts the hashCode of the queried key and then iterates over the corresponding list until an equals method succeeds. In case of a HashSet it means the key is found, in case of a HashMap, it returns the other side of the tuple: the value.

The memory of a HashMap thus works like:

+--------+--------+--------+--------+
| 00 | 01 | 10 | 11 |
+--------+--------+--------+--------+
| | | |
k00/v00 _ k06/v06 _
| |
k08/v08 k14/v14
| |
k04/v04 _
|
_

What you see is on top the four buckets. Each bucket has a list (the items underneath), that stores tuples of keys (k) and values (v). Since there are here only four buckets, the hash algorithm uses a modulo 4 operation, so a key k06 with value v06 would be placed in bucket 06 mod 4 = 02 thus 10. In case a second key k14 is added with 14 mod 4 = 02 thus 10, it is simply added to the list.

Since the values are stored with it as well, one can perform a fast lookup operation. The key is thus stored together with the value.

You noticed, that iterating over the (linked) list is an expensive operation. But the point of a HashMap is that one hopes, that the number of hash-collisions to use the correct term (number of keys sharing the same bucket) is very low. In general one might expect two or three elements per bucket. The performance boost is thus achieved by selecting the correct bucket in constant time, searching the bucket requires however linear time (or in case there is a complete ordering on the key's, one might implement a (balanced) binary tree to search in logarithmic time). Worst-case however, a HashMap can achieve, the same performance as an ArrayList/LinkedList of entries, but given the hash-function has been designed decently, the odds are very low.

How HashMap really works?

The implementation is open source, so I would encourage you to just read the code for any specific questions. But here's the general idea:

  • The primary responsibility for good hashCode distribution lies with the keys' class, not with the HashMap. If the key have a hashCode() method with bad distribution (for instance, return 0;) then the HashMap will perform badly.
  • HashMap does do a bit of "re-hashing" to ensure slightly better distribution, but not much (see HashMap::hash)
  • On the get side of things, a couple checks are made on each element in the bucket (which, yes, is implemented as a linked list)

    • First, the HashMap checks the element's hashCode with the incoming key's hashCode. This is because this operation is quick, and the element's hashCode was cached at put time. This guards against elements that have different hashCodes (and are thus unequal, by the contract of hashCode and equals established by Object) but happen to fall into the same bucket (remember, bucket indexes are basically hashCode % buckets.length)
    • If that succeeds, then, the HashMap checks equals explicitly to ensure they're really equal. Remember that equality implies same hashCode, but same hashCode does not require equality (and can't, since some classes have potentially infinite number of different values -- like String -- but there are only a finite number of possible hashCode values)

The reason for the double-checking of both hashCode and equals is to be both fast and correct. Consider two keys that have a different hashCode, but end up in the same HashMap bucket. For instance, if key A has hashCode=7 and B has hashCode=14, and there are 7 buckets, then they'll both end up in bucket 0 (7 % 7 == 0, and 14 % 7 == 0). Checking the hashCodes there is a quick way of seeing that A and B are unequal. If you find that the hashCodes are equal, then you make sure it's not just a hashCode collision by calling equals. This is just an optimization, really; it's not required by the general hash map algorithm.

HashMap with incorrect equals and HashCode implementation

HashMap has a little trick where it compares object references before using equals. Since you're using the same object references for adding the elements and for retrieving them, HashMap will return them correctly.

See Java 7 source here (Java 8 did a pretty big revamp of HashMap but it does something similar)

final Entry<K,V> getEntry(Object key) {
if (size == 0) {
return null;
}

int hash = (key == null) ? 0 : hash(key);
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
// HERE. Uses == with the key
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
}
return null;
}

Note that this isn't part of the docs, so don't depend on it.

Why do I need to override the equals and hashCode methods in Java?

Joshua Bloch says on Effective Java

You must override hashCode() in every class that overrides equals(). Failure to do so will result in a violation of the general contract for Object.hashCode(), which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.

Let's try to understand it with an example of what would happen if we override equals() without overriding hashCode() and attempt to use a Map.

Say we have a class like this and that two objects of MyClass are equal if their importantField is equal (with hashCode() and equals() generated by eclipse)

public class MyClass {
private final String importantField;
private final String anotherField;

public MyClass(final String equalField, final String anotherField) {
this.importantField = equalField;
this.anotherField = anotherField;
}

@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result
+ ((importantField == null) ? 0 : importantField.hashCode());
return result;
}

@Override
public boolean equals(final Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
final MyClass other = (MyClass) obj;
if (importantField == null) {
if (other.importantField != null)
return false;
} else if (!importantField.equals(other.importantField))
return false;
return true;
}
}

Imagine you have this

MyClass first = new MyClass("a","first");
MyClass second = new MyClass("a","second");

Override only equals

If only equals is overriden, then when you call myMap.put(first,someValue) first will hash to some bucket and when you call myMap.put(second,someOtherValue) it will hash to some other bucket (as they have a different hashCode). So, although they are equal, as they don't hash to the same bucket, the map can't realize it and both of them stay in the map.


Although it is not necessary to override equals() if we override hashCode(), let's see what would happen in this particular case where we know that two objects of MyClass are equal if their importantField is equal but we do not override equals().

Override only hashCode

If you only override hashCode then when you call myMap.put(first,someValue) it takes first, calculates its hashCode and stores it in a given bucket. Then when you call myMap.put(second,someOtherValue) it should replace first with second as per the Map Documentation because they are equal (according to the business requirement).

But the problem is that equals was not redefined, so when the map hashes second and iterates through the bucket looking if there is an object k such that second.equals(k) is true it won't find any as second.equals(first) will be false.

Hope it was clear

Adding objects to hashmap as keys which only equals method implemented

You must override hashCode as well. if a.equals(b), a.hashCode() must be equal to b.hashCode(). The implementation of hashCode determines in which bucket the entry will be stored. If you don't override hashCode, you might have two equal keys stored in two different buckets.

Add this to your S class :

@Override
public int hashCode() {
if (txt == null)
return 0;
return txt.hashCode();
}


Related Topics



Leave a reply



Submit