What's the best strategy for Equals and GetHashCode?
Assuming that the instances are equal because the hash codes are equal is wrong.
I guess your implementation of GetHashCode is OK, but I usually use things similar to this:
public override int GetHashCode() {
return object1.GetHashCode ^ intValue1 ^ (intValue2 << 16);
}
What is the best algorithm for overriding GetHashCode?
I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + field1.GetHashCode();
hash = hash * 23 + field2.GetHashCode();
hash = hash * 23 + field3.GetHashCode();
return hash;
}
}
As noted in comments, you may find it's better to pick a large prime to multiply by instead. Apparently 486187739 is good... and although most examples I've seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for example, I've used numbers which apparently work well - but the initial value isn't a prime. (The multiplication constant is prime though. I don't know quite how important that is.)
This is better than the common practice of XOR
ing hashcodes for two main reasons. Suppose we have a type with two int
fields:
XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y
By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.
This page gives quite a few options. I think for most cases the above is "good enough" and it's incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR
instead of ADD
as a combining operation. It looks something like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modifying to perform one iteration per byte, instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we're using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn't actually work as well (in the sample case tested) as the addition approach above.
// Note: Not quite FNV!
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = (int) 2166136261;
// Suitable nullity checks etc, of course :)
hash = (hash * 16777619) ^ field1.GetHashCode();
hash = (hash * 16777619) ^ field2.GetHashCode();
hash = (hash * 16777619) ^ field3.GetHashCode();
return hash;
}
}
Note that one thing to be aware of is that ideally you should prevent your equality-sensitive (and thus hashcode-sensitive) state from changing after adding it to a collection that depends on the hash code.
As per the documentation:
You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:
- You can compute the hash code from fields that are not mutable; or
- You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
The link to the FNV article is broken but here is a copy in the Internet Archive: Eternally Confuzzled - The Art of Hashing
Why should GetHashCode implement the same logic as Equals?
When storing a value in a hash table, such as Dictionary<>
, the framework will first call GetHashCode()
and check if there's already a bucket in the hash table for that hash code. If there is, it will call .Equals()
to see if the new value is indeed equal to the existing value. If not (meaning the two objects are different, but result in the same hash code), you have what's known as a collision. In this case, the items in this bucket are stored as a linked list and retrieving a certain value becomes O(n).
If you implemented GetHashCode()
but did not implement Equals()
, the framework would resort to using reference equality to check for equality which would result in every instance creating a collision.
If you implemented Equals()
but did not implement GetHashCode()
, you might run into a situation where you had two objects that were equal, but resulted in different hash codes meaning they'd maintain their own separate values in your hash table. This would potentially confuse anyone using your class.
As far as what objects are considered equal, that's up to you. If I create a hash table based on temperature, should I be able to refer to the same item using either its Celsius or Fahrenheit value? If so, they need to result in the same hash value and Equals()
needs to return true.
Update:
Let's step back and take a look at the purpose of a hash code in the first place. Within this context, a hash code is used as a quick way to identify if two objects are most likely equal. If we have two objects that have different hash codes, we know for a fact they are not equal. If we have two objects that have the same hash code, we know they are most likely equal. I say most likely because an int can only be used to represent a few billion possible values, and strings can of course contain the complete works of Charles Dickens, or any number of possible values. Much in the .NET framework is based on these truths, and developers that use your code will assume things work in a way that is consistent with the rest of the framework.
If you were to have two instances that have different hash codes, but have an implementation of Equals()
that returns true, you're breaking this convention. A developer that compares two objects might then use one of of those objects to refer to a key in a hash table and expect to get an existing value out. If all of a sudden the hash code is different, this code might result in a runtime exception instead. Or perhaps return a reference to a completely different object.
Whether 295.15k and 22C are equal within the domain of your program is your choice (In my opinion, they are not). However, whatever you decide, objects that are equal must return the same has code.
General advice and guidelines on how to properly override object.GetHashCode()
Table of contents
When do I override
object.GetHashCode
?Why do I have to override object.GetHashCode()?
What are those magic numbers seen in GetHashCode implementations?
Things that I would like to be covered, but haven't been yet:
- How to create the integer (How to "convert" an object into an int wasn't very obvious to me anyways).
- What fields to base the hash code upon.
- If it should only be on immutable fields, what if there are only mutable ones?
- How to generate a good random distribution. (MSDN Property #3)
- Part to this, seems to choose a good magic prime number (have seen 17, 23 and 397 been used), but how do you choose it, and what is it for exactly?
- How to make sure the hash code stays the same all through the object lifetime. (MSDN Property #2)
- Especially when the equality is based upon mutable fields. (MSDN Property #1)
- How to deal with fields that are complex types (not among the built-in C# types).
- Complex objects and structs, arrays, collections, lists, dictionaries, generic types, etc.
- For example, even though the list or dictionary might be readonly, that doesn't mean the contents of it are.
- How to deal with inherited classes.
- Should you somehow incorporate
base.GetHashCode()
into your hash code?
- Should you somehow incorporate
- Could you technically just be lazy and return 0? Would heavily break MSDN guideline number #3, but would at least make sure #1 and #2 were always true :P
- Common pitfalls and gotchas.
Equals vs GetHashCode when comparing objects
Here's what you need to understand about the relationship between Equals
and GetHashCode
.
Hash codes are used by hashtables to quickly find a "bucket" in which an element is expected to exist. If elements are in two different buckets, the assumption is they cannot be equal.
The outcome of this is that you should view a hash code, for purposes of determining uniqueness, as a quick negative check: i.e., if two objects have different hash codes, they are not the same (regardless of what their Equals
methods return).
If two objects have the same hash code, they will reside in the same bucket of a hash table. Then their Equals
methods will be called to determine equality.
So GetHashCode
must return the same value for two objects that you want to be considered equal.
Correct way to override Equals() and GetHashCode()
You can override Equals() and GetHashCode() on your class like this:
public override bool Equals(object obj)
{
var item = obj as RecommendationDTO;
if (item == null)
{
return false;
}
return this.RecommendationId.Equals(item.RecommendationId);
}
public override int GetHashCode()
{
return this.RecommendationId.GetHashCode();
}
Why use GetHashCode() over Equals()?
For some types, an Equals
test may be relatively expensive. It typically has to compare every field of the class. In other words, it takes linear time in the size of the class. Bigger classes are more expensive to compare for equality.
Now, what happens if you need to need to compare one object against 1000 others? Calling Equals
1000 times might get expensive. You need to do N*2000 field accesses, if N is the size of the class
GetHashCode
instead generates a "mostly unique" integer based on the contents of the class. In other words, the class fields are accessed once. And once you have that, you can compare this integer to the 1000 integers that make up the other objects' hash codes.
Even in such a naive use case, we now only need N*1000 field accesses.
But what if we store the hash code? When we insert an object into a hash set, its hash code is computed once. Now, any time we want to do a lookup in the hash set, we just need to compute one hash code (that of the external object), and then you just have to compare simple integers.
So N class field accesses (for the new object whose hash code we need to compute), plus a number of integer comparisons, which vary, depending on the algorithm, but are 1) relatively few, and 2) cheap.
Why is it important to override GetHashCode when Equals method is overridden?
Yes, it is important if your item will be used as a key in a dictionary, or HashSet<T>
, etc - since this is used (in the absence of a custom IEqualityComparer<T>
) to group items into buckets. If the hash-code for two items does not match, they may never be considered equal (Equals will simply never be called).
The GetHashCode() method should reflect the Equals
logic; the rules are:
- if two things are equal (
Equals(...) == true
) then they must return the same value forGetHashCode()
- if the
GetHashCode()
is equal, it is not necessary for them to be the same; this is a collision, andEquals
will be called to see if it is a real equality or not.
In this case, it looks like "return FooId;
" is a suitable GetHashCode()
implementation. If you are testing multiple properties, it is common to combine them using code like below, to reduce diagonal collisions (i.e. so that new Foo(3,5)
has a different hash-code to new Foo(5,3)
):
In modern frameworks, the HashCode
type has methods to help you create a hashcode from multiple values; on older frameworks, you'd need to go without, so something like:
unchecked // only needed if you're compiling with arithmetic checks enabled
{ // (the default compiler behaviour is *disabled*, so most folks won't need this)
int hash = 13;
hash = (hash * 7) + field1.GetHashCode();
hash = (hash * 7) + field2.GetHashCode();
...
return hash;
}
Oh - for convenience, you might also consider providing ==
and !=
operators when overriding Equals
and GetHashCode
.
A demonstration of what happens when you get this wrong is here.
Using GetHashCode to test equality in Equals override
The others are right; your equality operation is broken. To illustrate:
public static void Main()
{
var c1 = new Class1() { A = "apahaa", B = null };
var c2 = new Class1() { A = "abacaz", B = null };
Console.WriteLine(c1.Equals(c2));
}
I imagine you want the output of that program to be "false" but with your definition of equality it is "true" on some implementations of the CLR.
Remember, there are only about four billion possible hash codes. There are way more than four billion possible six letter strings, and therefore at least two of them have the same hash code. I've shown you two; there are infinitely many more.
In general you can expect that if there are n possible hash codes then the odds of getting a collision rise dramatically once you have about the square root of n elements in play. This is the so-called "birthday paradox". For my article on why you shouldn't rely upon hash codes for equality, click here.
Related Topics
How to Include Quotes in a String
Equivalent of SQL Isnull in Linq
Create Dynamic Buttons in a Grid Layout - Create a Magic Square Ui
Why/When Would It Be Appropriate to Override Tostring
ASP.NET Asmx Web Service Returning Xml Instead of JSON
Razor: Declarative HTML Helpers
Displaynamefor() from List<Object> in Model
How to Get the Actual Monitor Name? as Seen in the Resolution Dialog
Server Execution Failed (Exception from Hresult: 0X80080005 (Co_E_Server_Exec_Failure))
How to Make the Value of a Variable Track the Value of Another
Programmatically Get a Screenshot of a Page
Http 404 Page Not Found in Web API Hosted in Iis 7.5
How to Use Xpath with Xdocument
How to Execute Process Commands (Or Similar) Using a Universal Windows Platform (Uwp) App