How does HashSet compare elements for equality?
It uses an IEqualityComparer<T>
(EqualityComparer<T>.Default
unless you specify a different one on construction).
When you add an element to the set, it will find the hash code using IEqualityComparer<T>.GetHashCode
, and store both the hash code and the element (after checking whether the element is already in the set, of course).
To look an element up, it will first use the IEqualityComparer<T>.GetHashCode
to find the hash code, then for all elements with the same hash code, it will use IEqualityComparer<T>.Equals
to compare for actual equality.
That means you have two options:
- Pass a custom
IEqualityComparer<T>
into the constructor. This is the best option if you can't modify theT
itself, or if you want a non-default equality relation (e.g. "all users with a negative user ID are considered equal"). This is almost never implemented on the type itself (i.e.Foo
doesn't implementIEqualityComparer<Foo>
) but in a separate type which is only used for comparisons. - Implement equality in the type itself, by overriding
GetHashCode
andEquals(object)
. Ideally, implementIEquatable<T>
in the type as well, particularly if it's a value type. These methods will be called by the default equality comparer.
Note how none of this is in terms of an ordered comparison - which makes sense, as there are certainly situations where you can easily specify equality but not a total ordering. This is all the same as Dictionary<TKey, TValue>
, basically.
If you want a set which uses ordering instead of just equality comparisons, you should use SortedSet<T>
from .NET 4 - which allows you to specify an IComparer<T>
instead of an IEqualityComparer<T>
. This will use IComparer<T>.Compare
- which will delegate to IComparable<T>.CompareTo
or IComparable.CompareTo
if you're using Comparer<T>.Default
.
Checking equality with a HashSet of objects
You need to create a comparer for the sets:
var setComparer = HashSet<Location>.CreateSetComparer();
return other.Variable.Equals(this.Variable) && setComparer.Equals(this.LocationList, other.LocationList);
Comparing objects in HashSet
Overriding the hashCode()
method alone in your Star
class will not work you will have to override the equals()
method.
See the following code where we don't override the equals()
method:
class Star {
int x, y;
public Star(int x, int y) {
this.x = x;
this.y = y;
}
@Override
public int hashCode() {
return Objects.hash(x, y);
}
}
public class Main {
public static void main(String[] args) {
Star s1 = new Star(0, 0);
Star s3 = new Star(0, 0);
Star s2 = new Star(31, -31*31);
Set<Star> set = new HashSet<>();
set.add(s1);
set.add(s2);
System.out.println(set.size());
}
}
This will print 3 (instead of 2 what you might expect).
The reason for this is that the add method of java.util.Set
compares 2 objects based on equals()
method and not based on hashCode()
method.
In the above code for the Star
class, if you add the equals()
method the output will be 2 now. For your reference the you can override the equals()
method as follows:
@Override
public boolean equals(Object startObject) {
if (this == startObject) return true;
if (startObject == null || getClass() != startObject.getClass()) return false;
Star star = (Star) startObject;
return x == star.x &&
y == star.y;
}
So why do you need to add hashCode()
?
- As you are using
HashSet
the add method behind the scene will call theequals()
method and will also callhashCode()
to decide the bucket in which the new object should be put. To maintain the contract ofhashCode()
andequals()
Both should be overridden. - When ever you override
equals()
, it is recommended to overridehashCode()
also. (Vice-versa is also true). See this link for details.
Contract for hashCode()
and equals()
: If for two objects say o1
and o2
, o1.equals(o2)
is true
then hash of o1
and o2
should
be same.
Make sure that you understand this properly, from the above statement it is not implied that if the hash of 2 objects are same, then o1.equals(o2)
should return true
. It can be possible that for 2 objects, their hash is same but the o1.equals(o2)
returns false
See here, what Object
's hashCode()
method guarantees.
See this link to get more detailed information about this topic.
How do I compare two HashSets on one property that both share?
You could just hash the names from the second array to use in a Linq filter to create the final HashSet
var excludeName = new HashSet<string>(people2.Select(x => x.Name));
var hash3 = new HasSet<Person>(people1.Where(x => !exludeName.Contains(x.Name));
This can be especially useful if that list of values to exclude is very large as it will make the entire process run in linear time.
Or here's how you can set up the HashSet
s with IEqualityComparer<T>
.
public class PersonByNameComparer : IEqualityComparer<Peron>
{
public bool Equals(Person p1, Persion p2)
{
return p1.Name == p2.Name;
}
public int GetHashCode(Person p)
{
return p.Name.GetHashCode();
}
}
Note: This means that the HashSet
s cannot contain two items with the same Name
even if the Id
is different. But it also means it cannot contain to different objects with the same values like your current setup.
And then uses it like this.
var comparer = new PersonByNameComparer();
var hash1 = new HashSet<Person>(people1, comparer);
var hash2 = new HashSet<Person>(people2, comparer);
// Note that ExceptWith will mutate the existing hash.
hash1.ExceptWith(hash2);
// Alternatively you can create the new hash like this
var hash3 = new HashSet<Persion>(hash1.Where(p => !hash2.Contains(p)));
How do you determine if two HashSets are equal (by value, not by reference)?
Look at the method SetEquals.
my_hashset.SetEquals(other);
Why does HashSet allow equal items if hashcodes are different?
I think what you're really trying to ask is:
"Why does a HashSet add objects with inequal hash codes even if they claim to be equal?"
The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question.
I will try to do so here; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing.
The equals()
documentations states (emphasis added):
Note that it is generally necessary to override the
hashCode
method whenever this method is overridden, so as to maintain the general contract for thehashCode
method, which states that equal objects must have equal hash codes.
The contract between equals()
and hashCode()
isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b)
implies a.hashCode() == b.hashCode()
we can do some basic equivalence tests without needing to call equals()
directly. In particular, the invariant above can be turned around - a.hashCode() != b.hashCode()
implies a.equals(b)
will be false.
If you look at the code for HashMap
(which HashSet
uses internally), you'll notice an inner static class Entry
, defined like so:
static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
int hash;
...
}
HashMap
stores the key's hash code along with the key and value. Because a hash code is expected to not change over the time a key is stored in the map (see Map
's documentation, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.") it is safe for HashMap
to cache this value. By doing so, it only needs to call hashCode()
once for each key in the map, as opposed to every time the key is inspected.
Now lets look at the implementation of put()
, where we see these cached hashes being taken advantage of, along with the invariant above:
public V put(K key, V value) {
...
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
// Replace existing element and return
}
}
// Insert new element
}
In particular, notice that the conditional only ever calls key.equals(k)
if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation. By the contract of these methods, it should be safe for HashMap
to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap
are no longer true, and you will get back unusable results, including "duplicates" in your set.
Note that your claim "HashSet ... has an add(Object o)
method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet
, does not implement this method, the parent interface, Set
, does specify the method's contract. The Set
interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2))
. As long as you follow the contracts, HashSet
works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules however, HashSet
cannot be expected to behave in any useful way.
Consider also that if you attempted to store objects in a TreeSet
with an incorrectly implemented Comparator
, you would similarly see nonsensical results. I documented some examples of how a TreeSet
behaves when using an untrustworthy Comparator
in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?
Why does not Java HashSet.equals() check for object equality?
HashSet
includes a number of optimizations that can explain all of this: first, if two objects are put into different buckets by their hash codes, or if they have different hash codes at all, they may skip the equals
call. This is allowed by the contract of Object.hashCode
; if two objects have different hash codes then they are not allowed to .equals
to each other.
For the other case, HashSet
takes advantage of the contract of .equals
that specifies that if two objects are ==
to each other, then they must be .equals
to each other. Part of HashSet
's implementation here checks if the elements are ==
, and if they are, it skips calling .equals
.
If every method implements its contract correctly, this cannot change the semantics; HashSet
will always behave exactly as if .equals
were being called.
Set of sets not checking for equality
I have been pointed to this question. Although the accepted answer is helpful for checking equality of sets by hand (by means of HashSet<T>.SetEquals(HashSet<T>)
), it doesn't quite help with applying this equality logic to a set of sets.
However the non-accepted answer (by Gregory Adam) gives the crucial hint as to how this can be accomplished, namely with HashSet<string>.CreateSetComparer()
. Because HashSet has a constructor that accepts an equality comparer, this is the way to go:
HashSet<HashSet<string>> setOfSets =
new HashSet<HashSet<string>>(HashSet<string>.CreateSetComparer());
This tells the outer HashSet how to "properly" (in the mathematical sense) compare objects the type of the inner HashSet (in my case HashSet<string>
).
As Hans Passant and Eric Lippert have kindly pointed out, equality comparison of HashSets is comparatively expensive, even more so if it is applied to a nested HashSet, which might be one of the reasons why this hasn't been chosen as the default behavior.
The main reason however, according to Eric Lippert, for choosing reference equality is the mutability of the HashSet objects. Applied to my example: if I add two set-wise different HashSets (that is !set1.SetEquals(set2)
) to my setOfSets and afterwards change set1's contents to become equal to set2, there are still two sets in setOfSets, although there should be only one set then. So this has led to a state which is inconsistent with the original requirement ("there shall not be two equal objects in a set").
The same cannot be done with strings because strings are immutable. If I add two different string objects "Foo" and "Bar" to a HashSet, there is no legal way to change them to become equal.
Related Topics
How to Get My C# Program to Sleep for 50 Msec
Environment.Tickcount VS Datetime.Now
Grid Lines Are Not Displaying in Grid View
How to Install Android APK from Code in Unity
Printing to a Client Printer from a Web App
Is There an Equivalent to the Scanner Class in C# for Strings
How to Indefinitely Pause a Thread
How to Put a Task to Sleep (Or Delay) in C# 4.0
Writing Recursive Cte Using Entity Framework Fluent Syntax or Inline Syntax
Why Do Bcl Collections Use Struct Enumerators, Not Classes
How to Style Chat Window Using CSS When Using Microsoft Bot Framework
How to Manage Files on an Mtp Portable Device
C# Webbrowser Control -- Get Document Elements After Ajax
How to Read a .Net Guid into a Java Uuid