What's the Role of Gethashcode in the Iequalitycomparer<T> in .Net

What's the role of GetHashCode in the IEqualityComparerT in .NET?

A bit of background first...

Every object in .NET has an Equals method and a GetHashCode method.

The Equals method is used to compare one object with another object - to see if the two objects are equivalent.

The GetHashCode method generates a 32-bit integer representation of the object. Since there is no limit to how much information an object can contain, certain hash codes are shared by multiple objects - so the hash code is not necessarily unique.

A dictionary is a really cool data structure that trades a higher memory footprint in return for (more or less) constant costs for Add/Remove/Get operations. It is a poor choice for iterating over though. Internally, a dictionary contains an array of buckets, where values can be stored. When you add a Key and Value to a dictionary, the GetHashCode method is called on the Key. The hashcode returned is used to determine the index of the bucket in which the Key/Value pair should be stored.

When you want to access the Value, you pass in the Key again. The GetHashCode method is called on the Key, and the bucket containing the Value is located.

When an IEqualityComparer is passed into the constructor of a dictionary, the IEqualityComparer.Equals and IEqualityComparer.GetHashCode methods are used instead of the methods on the Key objects.

Now to explain why both methods are necessary, consider this example:

BoxEqualityComparer boxEqC = new BoxEqualityComparer(); 

Dictionary<Box, String> boxes = new Dictionary<Box, string>(boxEqC);

Box redBox = new Box(100, 100, 25);
Box blueBox = new Box(1000, 1000, 25);

boxes.Add(redBox, "red");
boxes.Add(blueBox, "blue");

Using the BoxEqualityComparer.GetHashCode method in your example, both of these boxes have the same hashcode - 100^100^25 = 1000^1000^25 = 25 - even though they are clearly not the same object. The reason that they are the same hashcode in this case is because you are using the ^ (bitwise exclusive-OR) operator so 100^100 cancels out leaving zero, as does 1000^1000. When two different objects have the same key, we call that a collision.

When we add two Key/Value pairs with the same hashcode to a dictionary, they are both stored in the same bucket. So when we want to retrieve a Value, the GetHashCode method is called on our Key to locate the bucket. Since there is more than one value in the bucket, the dictionary iterates over all of the Key/Value pairs in the bucket calling the Equals method on the Keys to find the correct one.

In the example that you posted, the two boxes are equivalent, so the Equals method returns true. In this case the dictionary has two identical Keys, so it throws an exception.

TLDR

So in summary, the GetHashCode method is used to generate an address where the object is stored. So a dictionary doesn't have to search for it. It just computes the hashcode and jumps to that location. The Equals method is a better test of equality, but cannot be used to map an object into an address space.

IEqualityComparer GetHashCode being called but Equals not

Your Equals and GetHashCode implementations should involve the exact same set of properties; they do not.

In more formal terms, GetHashCode must always return the same value for two objects that compare equal. With your current code, two objects that differ only in the Ret_USD value will always compare equal but are not guaranteed to have the same hash code.

So what happens is that LINQ calls GetHashCode on two objects you consider equal, gets back different values, concludes that since the values were different the objects cannot be equal so there's no point at all in calling Equals and moves on.

To fix the problem, either remove the Ret_USD factor from GetHashCode or introduce it also inside Equals (whatever makes sense for your semantics of equality).

Why does IEqualityComparerT require GetHashCode?

The reason the IEqualityComparer<T> has been added to the .NET framework is to allow you to customize the logic to hash-based containers in cases when you cannot modify the class with a different override of Equals/GetHashCode (say, it's in someone else's library) or you do not want to change the default implementation for any other reason - for example, for backward compatibility. This is the primary purpose of having this interface: you give it to hash containers, and they use it instead of the Equals/GetHashCode logic that comes with the object.

It appears that the platform designers did not have a use case for an IEqualityComparer<T> interface outside customizing hash containers. It is hard to come up with a scenario when an externally-supplied Equals would be useful by itself, without GetHashCode. The .NET platform already provides a concise mechanism for users who need to externalize a two-variable (or N-variable) checks through Predicate<T1,T2>, so if you want to write code that takes an equality checker from outside, you can do this:

void MyFunction(IEnumerable<T> one, IEnumerable<T> two, Predicate<T,T> equalityCheker) {
foreach (var a in one) {
foreach (var b in two) {
if (equalityCheker(a, b)) {
Console.WriteLine("Equal: {0} {1}", a, b);
}
}
}
}

Using IEqualityComparer GetHashCode with a tolerance

The problem is impossible, conceptually. You're trying to compare objects in a way that doesn't have a form of equality that is necessary for the operations you're trying to perform with it. For example, GroupJoin is dependant on the assumption that if A is equal to B, and B is equal to C, then A is equal to C, but in your situation, that's not true. A and B may be "close enough" together for you to want to group them, but A and C may not be.

You're going to need to not implement IEqualityComparer at all, because you cannot fulfill the contract that it requires. If you want to create a mapping of items in one collection to all of the items in another collection that are "close enough" to it then you're going to need to write that algorithm yourself (doing so efficiently is likely to be hard, but doing so inefficiently isn't shouldn't' be that difficult), rather than using GroupJoin, because it's not capable of performing that operation.

When Implementing IEqualityComparer Should GetHashCode check for null?

ReSharper is wrong.

Obviously code you write can call that particular GetHashCode method and pass in a null value. All known methods might ensure this will never happen, but obviously ReSharper can only take existing code (patterns) into account.

So in this case, check for null and do the "right thing".


Corollary: If the method in question was private, then ReSharper might analyze (though I'm not sure it does) the public code and verify that there is indeed no way that this particular private method will be called with a null reference, but since it is a public method, and one available through an interface, then

ReSharper is wrong.

What is the difference between using IEqualityComparer and Equals/GethashCode Override?

When you override Equals and GetHashCode you are changing the way the object will determine if it is equals to another. And a note, if you compare objects using == operator it will not have the same behavior as Equals unless you override the operator as well.

Doing that you changed the behavior for a single class, what if you need the same logic for other classes? If you need a "generic comparison". That is why you have IEqualityComparer.

Look at this example:

interface ICustom
{
int Key { get; set; }
}
class Custom : ICustom
{
public int Key { get; set; }
public int Value { get; set; }
}
class Another : ICustom
{
public int Key { get; set; }
}

class DicEqualityComparer : IEqualityComparer<ICustom>
{
public bool Equals(ICustom x, ICustom y)
{
return x.Key == y.Key;
}

public int GetHashCode(ICustom obj)
{
return obj.Key;
}
}

I have two different classes, both can use the same comparer.

var a = new Custom { Key = 1, Value = 2 };
var b = new Custom { Key = 1, Value = 2 };
var c = new Custom { Key = 2, Value = 2 };
var another = new Another { Key = 2 };

var d = new Dictionary<ICustom, string>(new DicEqualityComparer());

d.Add(a, "X");
// d.Add(b, "X"); // same key exception
d.Add(c, "X");
// d.Add(another, "X"); // same key exception

Notice that I didn't have to override Equals, GetHashCode in neither of the classes. I can use this comparer in any object that implements ICustom without having to rewrite the comparison logic. I can also make an IEqualityComparer for a "parent class" and use on classes that inherit. I can have comparer that will behave in a different way, I can make one to compare Value instead of Key.

So IEqualityComparer allows more flexibility and you can implement generic solutions.

What would be implementation GetHashCode() for IEqualityComparerdouble

This breaks the specification of IEqualityComparer<> even before you have to worry about GetHashCode.

For an equality check, you need x == y && y == z to imply x == z but this is not true for your Equals implementation. For example, with an epsilon of 1, you have 1 == 1.9 and 1.9 == 2.8 but not 1 == 2.8.

(You also require x == x, and x == y to imply y == x, but your equality check is fine with those.)

Using of IEqualityComparerT interface and EqualityComparerT class in C#

IEqualityComparer<in T> is an interface that will handle equality comparisons for your collection. Your collection will delegate equiality comparisons to this interface. You may ask, why don't just call the Equals method?

Because there can be several kinds of possible comparisons. Let's take an easy example: are "Abc" and "ABC" equal? It depends. "Abc".Equals("ABC") == false but what if you want case insensitivity?

This is why your collection should delegate the equality comparisons to a different class. By composing classes, you'll respect the single responsibility principle: Your collection knows how to store the items, and the equality comparer knows if they're equal.

An example with sets:

var caseSensitive = new HashSet<string>(StringComparer.Ordinal) // The default anyway
{
"Abc", "ABC"
};

var caseInsensitive = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
"Abc", "ABC"
};

The results will be:

caseSensitive.Count == 2
caseInsensitive.Count == 1

caseSensitive.Contains("aBc") == false
caseInsensitive.Contains("aBc") == true

Here you have two completely different set semantics using the same HashSet class.

Now, what's in an IEqualityComparer<in T>?

  • bool Equals(T x, T y);: This method does just what you'd expect it to: it returns true if x should be considered equal to y. Just like the mathematical equality, it must be:
    • reflexive: Equals(x, x) == true
    • symmetric: Equals(x, y) == Equals(y, x)
    • transitive: if Equals(x, y) && Equals(y, z) then Equals(x, z)
  • int GetHashCode(T obj); this one may be trickier to get right. For each obj, it should return a hash code wilth the following properties:
    • it should never change
    • if Equals(x, y) then GetHashCode(x) == GetHashCode(y)
    • there should be as few collisions as possible

Note that this does not imply that if GetHashCode(x) == GetHashCode(y) then Equals(x, y). Two objects can have the same hash code but be inequal (there can be at most 0xFFFFFFFF possible hash codes after all).

Collections often use the hash code to organize their items. For example, the HashSet will know that if two objects don't have the same hash code, they won't be equal and thus can organize its buckets accordingly. Hash codes are just an optimization.

Now, what's EqualityComparer<T>.Default? It's a conventient shortcut for an IEqualityComparer<T> that will use an object's own Equals and GetHashCode functions. This is a good default value as this is what you want to do most of the time: while strings can have multiple natural comparison types, this is not the case for integers for instance.

EqualityComparer<T>.Default will handle a couple special cases:

  • if T is IEquatable<T>, it will use the IEquatable<T> interface
  • if T is Nullable<U> and U is IEquatable<U> it will handle the case properly
  • it will optimize for a couple special cases: byte[] and int-basd Enum.

Why we need the IEqualityComparer,IEqualityComparerT interface?

The IComparable interface is used when you need to be able to "sort" objects, and it gives you a method (CompareTo) that tells you if two objects are <, = or > . The constructor that uses IEqualityComparer let you give a specific Equals/GetHashCode that can be different than the ones defined by your object. Normally the Hashtable will use your object overridden Equals and GetHashCode (or the base object Equals and GetHashCode).

To make an example, the standard string compares in case sensitive way ("A" != "a"), but you could make an IEqualityComparer helper class to be able to hash your strings in a case insensitive way. (technically this class is already present in multiple variants: they are called StringComparer.InvariantCultureIgnoreCase and all the other static methods of StringComparer that return a StringComparer object that implements the IComparer, IEqualityComparer, IComparer<string>, IEqualityComparer<string>)

As a note, the Hashtable uses a IEqualityComparer optional parameter, not the generic version IEqualityComparer<T>, because Hashtable is pre-generics.

What are the roles of IEqualityComparer and IEquatable in the Enumerable.SequenceEqual method

IEquatable is used by the generic collections like Dictionary to determine if two objects are equal. If the object doesn't implement IEquatable, Object.Equals method is used.

Why should you implement IEquatable? It has better performance than Object.Equals because the object doesn't need to be casted.

When to not implement IEquatable? Some developers believe that you should only implement it on sealed classes.

If IEqualityComparer is specified in SequenceEquals, it is the one used to check the equality of two objects instead of Object.Equal and it's IEquatable implementation. The example for using it in SequenceEqual is in here http://msdn.microsoft.com/en-us/library/bb342073%28v=vs.110%29.aspx. Note that the method signature accepts an IEqualityComparer.

Many collections like Dictionary also accepts IEqualityComparer in it's constructor

Answering your question

If you didn't provide an IEqualityComparer to SequenceEquals, it will use EqualityComparer.Default.

Decompiled code:

public static bool SequenceEqual<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second)
{
return Enumerable.SequenceEqual<TSource>(first, second, (IEqualityComparer<TSource>) null);
}

public static bool SequenceEqual<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
if (comparer == null)
comparer = (IEqualityComparer<TSource>) EqualityComparer<TSource>.Default;
...

EqualityComparer.Default checks whether type T implements the System.IEquatable interface and, if so, returns an EqualityComparer that uses that implementation. Otherwise, it returns an EqualityComparer that uses the overrides of Object.Equals and Object.GetHashCode provided by T. This is why your Object.Equals is called.



Related Topics



Leave a reply



Submit