When Does Hashset 'Add' Method Calls Equals

When does HashSet 'add' method calls equals?

If the hash codes differ, there is no need to call equals() since it is guaranteed to return false.

This follows from the general contract on equals() and hashCode():

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

Right now your class is breaking that contract. You need to fix that.

Why is equals() not called while adding to HashSet and hashCode matches?

The hash set checks reference equality first, and if that passes, it skips the .equals call. This is an optimization and works because the contract of equals specifies that if a == b then a.equals(b).

I attached the source code below, with this check highlighted.

If you instead add two equal elements that are not the same reference, you get the effect you were expecting:

    HashSet st1=new HashSet();
st1.add(new Student(89));
st1.add(new Student(89));
System.out.println("Ho size="+st1.size());

results in

$ java Test1
Hello-hashcode
Hello-hashcode
Hello-equals
Ho size=1

Here's the source code from OpenJDK 7, with equality optimization indicated (from HashMap, the underlying implementation of HashSet):

public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
// v-- HERE
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}

modCount++;
addEntry(hash, key, value, i);
return null;
}

When HashSet call equal method?

Your code will call the equals() of HashTest only once. The other time it calls the equals() method would be the equals() of the String class.

hs.add(h1); // Nothing is called
hs.add(h2); // Calls the equals() method of HashTest, thus the log
hs.add(s1); // Nothing is called
hs.add(s2); // Calls the equals() method of String

This answer explains when the equals() method is called by the HashSet and when it isn't. An excerpt from it:

The HashSet takes advantage of hashcodes to speed things up. It assumes that two objects that equal eachother will have the same hash code. However it does not assume that two objects with the same hash code mean they are equal. This is why when it detects a colliding hash code, it only compares with other objects (in your case one) in the set with the same hash code.

How to make HashSetT invoking T.equals(Object ) when HashSet.add() is called

HashSet uses both equals() and hashCode() method to check if there already exists the element you are inserting. You should also override hashCode() method. Oh, and notice that you're not really overriding equals(Object) method, but overloading it. That's not the same thing. Make sure the parameter type is Object. Also always add @Override annotation over the method that you intend to override.

Why won't my HashSet allow me to add two of the same instance, if their equals() says they're false?

HashSet (really HashMap under the hood) has an "optimization" in that it checks for object reference equality before calling the equals() method. since you are putting the same instance in twice, they are considered equal even though the equals() method does not agree.

Relevant line from HashMap.put():

        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {

Why does HashSet allow equal items if hashcodes are different?

I think what you're really trying to ask is:

"Why does a HashSet add objects with inequal hash codes even if they claim to be equal?"

The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question.

I will try to do so here; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing.


The equals() documentations states (emphasis added):

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode() we can do some basic equivalence tests without needing to call equals() directly. In particular, the invariant above can be turned around - a.hashCode() != b.hashCode() implies a.equals(b) will be false.

If you look at the code for HashMap (which HashSet uses internally), you'll notice an inner static class Entry, defined like so:

static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
int hash;
...
}

HashMap stores the key's hash code along with the key and value. Because a hash code is expected to not change over the time a key is stored in the map (see Map's documentation, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.") it is safe for HashMap to cache this value. By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected.

Now lets look at the implementation of put(), where we see these cached hashes being taken advantage of, along with the invariant above:

public V put(K key, V value) {
...
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
// Replace existing element and return
}
}
// Insert new element
}

In particular, notice that the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation. By the contract of these methods, it should be safe for HashMap to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set.


Note that your claim "HashSet ... has an add(Object o) method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet, does not implement this method, the parent interface, Set, does specify the method's contract. The Set interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2)). As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules however, HashSet cannot be expected to behave in any useful way.

Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator, you would similarly see nonsensical results. I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?

Why HashSet sometimes doesn't add object when relying on default hash and equals?

Your SomeService class from your test code will generate in the end the following objects:

new MyClass("id2", "id1");
new MyClass("id3", "id1");

Due to inheritance the following base constructor will be called for these two objects:

new MyHelperClass("id1");
new MyHelperClass("id1");

As you see, you call your base constructor with the same argument. And based on your equals() implementation they are the same (and have the same hash code). Check the following example code to show the issue:

MyClass o1 = new MyClass("id2", "id1");
MyClass o2 = new MyClass("id3", "id1");
System.out.println(o1.hashCode());
System.out.println(o2.hashCode());
System.out.println(o1.equals(o2));

This will generate the following output:

104085
104085
true

And as they are equal only once instance will be saved in your HashSet().



Related Topics



Leave a reply



Submit