How Does Native Implementation of Valuetype.Gethashcode Work

Why is ValueType.GetHashCode() implemented like it is?

UPDATE: This answer was (in part) the basis of a blog article I wrote which goes into more details about the design characteristics of GetHashcode. Thanks for the interesting question!


I didn't implement it and I haven't talked to the people who did. But I can point out a few things.

(Before I go on, note that here I am specifically talking about hash codes for the purposes of balancing hash tables where the contents of the table are chosen by non-hostile users. The problems of hash codes for digital signing, redundancy checking, or ensuring good performance of a hash table when some of the users are mounting denial-of-service attacks against the table provider are beyond the scope of this discussion.)

First, as Jon correctly notes, the given algorithm does implement the required contract of GetHashCode. It might be sub-optimal for your purposes, but it is legal. All that is required is that things that compare equal have equal hash codes.

So what are the "nice to haves" in addition to that contract? A good hash code implementation should be:

1) Fast. Very fast! Remember, the whole point of the hash code in the first place is to rapidly find a relatively empty slot in a hash table. If the O(1) computation of the hash code is in practice slower than the O(n) time taken to do the lookup naively then the hash code solution is a net loss.

2) Well distributed across the space of 32 bit integers for the given distribution of inputs. The worse the distribution across the ints, the more like a naive linear lookup the hash table is going to be.

So, how would you make a hash algorithm for arbitrary value types given those two conflicting goals? Any time you spend on a complex hash algorithm that guarantees good distribution is time poorly spent.

A common suggestion is "hash all of the fields and then XOR together the resulting hash codes". But that is begging the question; XORing two 32 bit ints only gives good distribution when the inputs themselves are extremely well-distributed and not related to each other, and that is an unlikely scenario:

// (Updated example based on good comment!)
struct Control
{
string name;
int x;
int y;
}

What is the likelihood that x and y are well-distributed over the entire range of 32 bit integers? Very low. Odds are much better that they are both small and close to each other, in which case xoring their hash codes together makes things worse, not better. xoring together integers that are close to each other zeros out most of the bits.

Furthermore, this is O(n) in the number of fields! A value type with a lot of small fields would take a comparatively long time to compute the hash code.

Basically the situation we're in here is that the user didn't provide a hash code implementation themselves; either they don't care, or they don't expect this type to ever be used as a key in a hash table. Given that you have no semantic information whatsoever about the type, what's the best thing to do? The best thing to do is whatever is fast and gives good results most of the time.

Most of the time, two struct instances that differ will differ in most of their fields, not just one of their fields, so just picking one of them and hoping that it's the one that differs seems reasonable.

Most of the time, two struct instances that differ will have some redundancy in their fields, so combining the hash values of many fields together is likely to decrease, not increase, the entropy in the hash value, even as it consumes the time that the hash algorithm is designed to save.

Compare this with the design of anonymous types in C#. With anonymous types we do know that it is highly likely that the type is being used as a key to a table. We do know that it is highly likely that there will be redundancy across instances of anonymous types (because they are results of a cartesian product or other join). And therefore we do combine the hash codes of all of the fields into one hash code. If that gives you bad performance due to the excess number of hash codes being computed, you are free to use a custom nominal type rather than the anonymous type.

How is base.GetHashCode() implemented for a struct?

The coreclr repo has this comment:

Action: Our algorithm for returning the hashcode is a little bit complex. We look
for the first non-static field and get it's hashcode. If the type has no
non-static fields, we return the hashcode of the type. We can't take the
hashcode of a static member because if that member is of the same type as
the original type, we'll end up in an infinite loop.

However, the code isn't there, and it looks like that's not quite what happens. Sample:

using System;

struct Foo
{
public string x;
public string y;
}

class Test
{
static void Main()
{
Foo foo = new Foo();
foo.x = "x";
foo.y = "y";
Console.WriteLine(foo.GetHashCode());
Console.WriteLine("x".GetHashCode());
Console.WriteLine("y".GetHashCode());
}
}

Output on my box:

42119818
372029398
372029397

Changing the value of y doesn't appear to change the hash code of foo.

However, if we make the fields int values instead, then more than the first field affects the output.

In short: this is complex, and you probably shouldn't depend on it remaining the same across multiple versions of the runtime. Unless you're really, really confident that you don't need to use your custom struct as a key in a hash-based dictionary/set, I'd strongly recommend overriding GetHashCode and Equals (and implementing IEquatable<T> to avoid boxing).

Implement a custom hashcode on a third party class

I am just expanding upon @Ilian answer. Tried to comment as much as possible so I believe it is better for the code to do the talking :)

// Mock 3rd Party point
public class ThirdPartyPoint {

}

// Mock 3rd party line
public class ThirdPartyLine {

public ThirdPartyPoint StartPoint { get; set; }
public ThirdPartyPoint EndPoint { get; set; }

}

// This class implements IEqualityComparer<ThirdPartyLine>, which compares
// ThirdPartyLine's equality. THis class will be passed as a ctor arument to HashSet<T>
public class CompareLines : IEqualityComparer<ThirdPartyLine> {

public bool Equals(ThirdPartyLine x, ThirdPartyLine y) {
// Here check for the equality of the start and end points.
// I asuumed the following but do not know how the eaulity is implemented in your library.
return x.EndPoint == y.EndPoint && x.StartPoint == y.StartPoint;
}

public int GetHashCode(ThirdPartyLine obj) {
// Implement an algortihm which must return same hashcode for objects considered the same.
// I am not sure about the Point class hashcode but I am jsut assuming the following.
return obj.StartPoint.GetHashCode() ^ obj.EndPoint.GetHashCode();
}

}

private static void Main(string[] args) {
// Hashset to hold lines
var hashSet = new HashSet<ThirdPartyLine>(new Compare());
// start point
var starPoint = new ThirdPartyPoint();
// end point
var endPoint = new ThirdPartyPoint();

// Lines with same start and end points
var line1 = new ThirdPartyLine {
StartPoint = starPoint,
EndPoint = endPoint
};

var line2 = new ThirdPartyLine {
StartPoint = starPoint,
EndPoint = endPoint
};

// Check count first
hashSet.Add(line1);
var count = hashSet.Count;

// Check count second, still 1
hashSet.Add(line2);
count = hashSet.Count;
}

How are Equals and GetHashCode implemented on anonymous types?

The compiler generates the GetHashCode() and Equals() overrides for you. For example, from this code:

class Program
{
static void Main(string[] args)
{
var a = new { Text = "foo", Value = 17 };

Console.WriteLine(a);
}
}

You can find the generated anonymous type in the compiled .exe, where the methods look like this (this is the output from dotPeek…there's also ToString()):

  [DebuggerHidden]
public override string ToString()
{
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.Append("{ Text = ");
stringBuilder.Append((object) this.\u003CText\u003Ei__Field);
stringBuilder.Append(", Value = ");
stringBuilder.Append((object) this.\u003CValue\u003Ei__Field);
stringBuilder.Append(" }");
return ((object) stringBuilder).ToString();
}

[DebuggerHidden]
public override bool Equals(object value)
{
var fAnonymousType0 = value as \u003C\u003Ef__AnonymousType0<\u003CText\u003Ej__TPar, \u003CValue\u003Ej__TPar>;
return fAnonymousType0 != null && EqualityComparer<\u003CText\u003Ej__TPar>.Default.Equals(this.\u003CText\u003Ei__Field, fAnonymousType0.\u003CText\u003Ei__Field) && EqualityComparer<\u003CValue\u003Ej__TPar>.Default.Equals(this.\u003CValue\u003Ei__Field, fAnonymousType0.\u003CValue\u003Ei__Field);
}

[DebuggerHidden]
public override int GetHashCode()
{
return -1521134295 * (-1521134295 * 512982588 + EqualityComparer<\u003CText\u003Ej__TPar>.Default.GetHashCode(this.\u003CText\u003Ei__Field)) + EqualityComparer<\u003CValue\u003Ej__TPar>.Default.GetHashCode(this.\u003CValue\u003Ei__Field);
}

Related reading:

How does ToString on an anonymous type work?

Why anonymous types Equals implementation compares fields?

Equality for anonymous types

Why is ValueType.GetHashCode() implemented like it is?

None of those directly address your question, but they do provide some relevant insights into the specific implementations of these overrides.



Related Topics



Leave a reply



Submit