Hashcode and Equals for Hashset

Hashcode and Equals for Hashset

  1. There's no need to call equals if hashCode differs.
  2. There's no need to call hashCode if (obj1 == obj2).
  3. There's no need for hashCode and/or equals just to iterate - you're not comparing objects
  4. When needed to distinguish in between objects.

How to override equals(), hashcode() and compareTo() for a HashSet

The basic question here is "How can you determine if two objects are equal to each other?"

This is a simple question for simple objects. However, it becomes increasingly difficult with even slightly more complex objects.

As stated in the original question:

The only unique thing about the object is the the map myMap which gets populated later in the lifecycle.

Given two instances of the type MyObject, the member variables myMap must be compared with each other. This map is of type Map<String, String>. A few questions immediately come to mind:

  • How do the keys & values define equality?

    • (does a key=value pair need to be compared as a unit?)
    • (or should only the values be compared to each other?)
  • How does the order of the keys in the map affect equality?

    • (should keys in the list be sorted, so that A-B-C is equivalent to B-C-A?)
    • (or does 1-2-3 mean something different than 3-2-1?)
  • Does upper/lower case make any different to the equality of the values?
  • Will these objects ever be stored in some kind of Java HashSet or Java TreeSet?

    • (do you need to store the same object several times in the same collection?)
    • (or should objects with equal hashcodes only be stored once?)
  • Will these objects ever require sorting as part of a list or Java Collection?
  • How should the comparison function arrange non-equal objects in a list?

    • (how should key order determine if an object will come earlier or later in a list?)
    • (how should values determine order, especially if several values are different?)

Answers to each of these questions will vary between applications. In order to keep this applicable to a general audience, the following assumptions are being made:

  • To maintain a deterministic comparison, keys will be sorted
  • Values will be considered to be case-sensitive
  • Keys and values are inseparable, and will be compared as a unit
  • The Map will be flattened into a single String, so results can be compared easily

The beauty of using equals(), hashCode(), and compareTo() is that once hashCode() is implemented properly, the other functions can be defined based on hashCode().

Considering all of that, we have the following implementation:

@Override
public boolean equals(final Object o)
{
if (o instanceof MyObject)
{
return (0 == this.compareTo(((MyObject) o)));
}
return false;
}

@Override
public int hashCode()
{
return getKeyValuePairs(this.myMap).hashCode();
}

// Return a negative integer, zero, or a positive integer
// if this object is less than, equal to, or greater than the other object
public int compareTo(final MyObject o)
{
return this.hashCode() - o.hashCode();
}

// The Map is flattened into a single String for comparison
private static String getKeyValuePairs(final Map<String, String> m)
{
final StringBuilder kvPairs = new StringBuilder();

final String kvSeparator = "=";
final String liSeparator = "^";

if (null != m)
{
final List<String> keys = new ArrayList<>(m.keySet());
Collections.sort(keys);

for (final String key : keys)
{
final String value = m.get(key);
kvPairs.append(liSeparator);
kvPairs.append(key);
kvPairs.append(kvSeparator);
kvPairs.append(null == value ? "" : value);
}
}

return 0 == kvPairs.length() ? "" : kvPairs.substring(liSeparator.length());
}

All the critical work is being done inside of hashCode(). For sorting, the compareTo() function only needs to return a negative/zero/positive number -- a simple hashCode() diff. And the equals() function only needs to return true/false -- a simple check that compareTo() equals zero.


For further reading, there is a famous dialogue by Lewis Carroll on the foundations of logic, which touches on the basic question of equality:

https://en.wikipedia.org/wiki/What_the_Tortoise_Said_to_Achilles

And, in regard to even simple grammatical constructs, there is a fine example of two "equal" sentences at the start of chapter 6, "Pig and Pepper", from Alice in Wonderland:

The Fish-Footman began by producing from under his arm a great letter, and this he handed over to the other, saying, in a solemn tone, "For the Duchess. An invitation from the Queen to play croquet." The Frog-Footman repeated, in the same solemn tone, "From the Queen. An invitation for the Duchess to play croquet." Then they both bowed low and their curls got entangled together.

Why does HashSet allow equal items if hashcodes are different?

I think what you're really trying to ask is:

"Why does a HashSet add objects with inequal hash codes even if they claim to be equal?"

The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question.

I will try to do so here; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing.


The equals() documentations states (emphasis added):

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode() we can do some basic equivalence tests without needing to call equals() directly. In particular, the invariant above can be turned around - a.hashCode() != b.hashCode() implies a.equals(b) will be false.

If you look at the code for HashMap (which HashSet uses internally), you'll notice an inner static class Entry, defined like so:

static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
int hash;
...
}

HashMap stores the key's hash code along with the key and value. Because a hash code is expected to not change over the time a key is stored in the map (see Map's documentation, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.") it is safe for HashMap to cache this value. By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected.

Now lets look at the implementation of put(), where we see these cached hashes being taken advantage of, along with the invariant above:

public V put(K key, V value) {
...
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
// Replace existing element and return
}
}
// Insert new element
}

In particular, notice that the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation. By the contract of these methods, it should be safe for HashMap to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set.


Note that your claim "HashSet ... has an add(Object o) method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet, does not implement this method, the parent interface, Set, does specify the method's contract. The Set interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2)). As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules however, HashSet cannot be expected to behave in any useful way.

Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator, you would similarly see nonsensical results. I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?

Why is equals() not called while adding to HashSet and hashCode matches?

The hash set checks reference equality first, and if that passes, it skips the .equals call. This is an optimization and works because the contract of equals specifies that if a == b then a.equals(b).

I attached the source code below, with this check highlighted.

If you instead add two equal elements that are not the same reference, you get the effect you were expecting:

    HashSet st1=new HashSet();
st1.add(new Student(89));
st1.add(new Student(89));
System.out.println("Ho size="+st1.size());

results in

$ java Test1
Hello-hashcode
Hello-hashcode
Hello-equals
Ho size=1

Here's the source code from OpenJDK 7, with equality optimization indicated (from HashMap, the underlying implementation of HashSet):

public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
// v-- HERE
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}

modCount++;
addEntry(hash, key, value, i);
return null;
}

hashcode() and equals() method

Equals is always called after the hashCode method in a java hashed collection while adding and removing elements. The reason being, if there is an element already at the specified bucket, then JVM checks whether it is the same element which it is trying to put. In case if the equals returns false then the element is added to the same bucket but at the end of list at the bucket. So now you just dont have a single element at the same bucket but a list of elements.

Now while retrieving the element, first hashCode will be called to reach the desired bucket and then the list will be scanned using the equals to fetch the desired element.

The ideal implemenation of hashCode will make sure the size of list at each bucket is 1. And hence the retrieval of elements is done using O(1) complexity. But if there are mulitple elements stored in the list at a bucket, then the retreival of element will be done by O(n) complexiy, where n is the size of the list.

Btw in case of HashSet there is no list created at the bucket, rather the object is simply replaced if hashcode and equals are same. The ist creation behavior is in hashmap.

Overriding hashCode when using HashMap, HashSet etc?

Such hashCode would work. You'll also have to override equals in a consistent manner:

@Override
public boolean equals(Object other)
{
if (!(other instanceof Player))
return false;
Player op = (Player) other;
return id == op.id;
}

Why do I need to override the equals and hashCode methods in Java?

Joshua Bloch says on Effective Java

You must override hashCode() in every class that overrides equals(). Failure to do so will result in a violation of the general contract for Object.hashCode(), which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.

Let's try to understand it with an example of what would happen if we override equals() without overriding hashCode() and attempt to use a Map.

Say we have a class like this and that two objects of MyClass are equal if their importantField is equal (with hashCode() and equals() generated by eclipse)

public class MyClass {
private final String importantField;
private final String anotherField;

public MyClass(final String equalField, final String anotherField) {
this.importantField = equalField;
this.anotherField = anotherField;
}

@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result
+ ((importantField == null) ? 0 : importantField.hashCode());
return result;
}

@Override
public boolean equals(final Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
final MyClass other = (MyClass) obj;
if (importantField == null) {
if (other.importantField != null)
return false;
} else if (!importantField.equals(other.importantField))
return false;
return true;
}
}

Imagine you have this

MyClass first = new MyClass("a","first");
MyClass second = new MyClass("a","second");

Override only equals

If only equals is overriden, then when you call myMap.put(first,someValue) first will hash to some bucket and when you call myMap.put(second,someOtherValue) it will hash to some other bucket (as they have a different hashCode). So, although they are equal, as they don't hash to the same bucket, the map can't realize it and both of them stay in the map.


Although it is not necessary to override equals() if we override hashCode(), let's see what would happen in this particular case where we know that two objects of MyClass are equal if their importantField is equal but we do not override equals().

Override only hashCode

If you only override hashCode then when you call myMap.put(first,someValue) it takes first, calculates its hashCode and stores it in a given bucket. Then when you call myMap.put(second,someOtherValue) it should replace first with second as per the Map Documentation because they are equal (according to the business requirement).

But the problem is that equals was not redefined, so when the map hashes second and iterates through the bucket looking if there is an object k such that second.equals(k) is true it won't find any as second.equals(first) will be false.

Hope it was clear

Scala - Behaviour of hashCode & equals when using mutable HashSet

About question #2:

When two objects have the same hashCode, the equals method will be called to settle the 'tie'. Because in this case the hash is different, equals will not be called

About question #1:

In reality what is happening is that when the HashSet is defined, the hash method for bear1 and bear2 are called, because they evaluate to the same value (namely 100), the equals will also be called.

The extra call to the hashCode method is because of the println(setBears). It shows the hash for each object in the hashset.

So, when creating the HashSet, it adds up to two hashCode calls (one for each bear) and one equals call to settle the tie



Related Topics



Leave a reply



Submit