Gethashcode Guidelines in C#

GetHashCode Guidelines in C#

The answer is mostly, it is a valid guideline, but perhaps not a valid rule. It also doesn't tell the whole story.

The point being made is that for mutable types, you cannot base the hash code on the mutable data because two equal objects must return the same hash code and the hash code has to be valid for the lifetime of the object. If the hash code changes, you end up with an object that gets lost in a hashed collection because it no longer lives in the correct hash bin.

For example, object A returns hash of 1. So, it goes in bin 1 of the hash table. Then you change object A such that it returns a hash of 2. When a hash table goes looking for it, it looks in bin 2 and can't find it - the object is orphaned in bin 1. This is why the hash code must not change for the lifetime of the object, and just one reason why writing GetHashCode implementations is a pain in the butt.

Update
Eric Lippert has posted a blog that gives excellent information on GetHashCode.

Additional Update
I've made a couple of changes above:

  1. I made a distinction between guideline and rule.
  2. I struck through "for the lifetime of the object".

A guideline is just a guide, not a rule. In reality, GetHashCode only has to follow these guidelines when things expect the object to follow the guidelines, such as when it is being stored in a hash table. If you never intend to use your objects in hash tables (or anything else that relies on the rules of GetHashCode), your implementation doesn't need to follow the guidelines.

When you see "for the lifetime of the object", you should read "for the time the object needs to co-operate with hash tables" or similar. Like most things, GetHashCode is about knowing when to break the rules.

What is the best algorithm for overriding GetHashCode?

I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:

public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + field1.GetHashCode();
hash = hash * 23 + field2.GetHashCode();
hash = hash * 23 + field3.GetHashCode();
return hash;
}
}

As noted in comments, you may find it's better to pick a large prime to multiply by instead. Apparently 486187739 is good... and although most examples I've seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for example, I've used numbers which apparently work well - but the initial value isn't a prime. (The multiplication constant is prime though. I don't know quite how important that is.)

This is better than the common practice of XORing hashcodes for two main reasons. Suppose we have a type with two int fields:

XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y

By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.

This page gives quite a few options. I think for most cases the above is "good enough" and it's incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR instead of ADD as a combining operation. It looks something like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modifying to perform one iteration per byte, instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we're using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn't actually work as well (in the sample case tested) as the addition approach above.

// Note: Not quite FNV!
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = (int) 2166136261;
// Suitable nullity checks etc, of course :)
hash = (hash * 16777619) ^ field1.GetHashCode();
hash = (hash * 16777619) ^ field2.GetHashCode();
hash = (hash * 16777619) ^ field3.GetHashCode();
return hash;
}
}

Note that one thing to be aware of is that ideally you should prevent your equality-sensitive (and thus hashcode-sensitive) state from changing after adding it to a collection that depends on the hash code.

As per the documentation:

You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:

  • You can compute the hash code from fields that are not mutable; or
  • You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.

The link to the FNV article is broken but here is a copy in the Internet Archive: Eternally Confuzzled - The Art of Hashing

GetHashCode functionality

in your code hashCode1 == hashCode is true because hashing same string will always give you the same result. however insances are different so you have to override GetHashCode() in a way that fits your business logic for example

    public override int GetHashCode()
{
return Name.GetHashCode() ^ Zip.GetHashCode() ^ Sex ^ Age;
}

I sugest you take a look at this answer GetHashCode Guidelines in C#.

and http://musingmarc.blogspot.com/2007/08/vtos-rtos-and-gethashcode-oh-my.html

Is GetHashCode guaranteed to be the same for the lifetime of an object?

If you look at the MSDN documentation you will find the following remarks about the default behavior of the GetHashCode method:

If GetHashCode is not overridden, hash codes for reference types are
computed by calling the Object.GetHashCode method of the base class,
which computes a hash code based on an object's reference; for more
information, see RuntimeHelpers.GetHashCode. In other words, two
objects for which the ReferenceEquals method returns true have
identical hash codes. If value types do not override GetHashCode, the
ValueType.GetHashCode method of the base class uses reflection to
compute the hash code based on the values of the type's fields. In
other words, value types whose fields have equal values have equal
hash codes

Based on my understanding we can assume that:

  • for a reference type (which doesn't override Object.GetHashCode) the
    value of the hash code of a given instance is guaranteed to be the
    same for the entire lifetime of the instance (because the memory
    address at which the object is stored won't change during its
    lifetime)
  • for a value type (which doesn't override Object.GetHashCode) it depends: if the value type is immutable then the hash code won't
    change during its lifetime. If, otherwise, the value of its fields
    can be changed after its creation then its hash code will change too.
    Please, notice that value types are generally immutable.

IMPORTANT EDIT

As pointed out in one comment above the .NET garbage collector can decide to move the physical location of an object in memory during the object lifetime, in other words an object can be "relocated" inside the managed memory.

This makes sense because the garbage collector is in charge of managing the memory allocated when objects are created.

After some searches and according to this stackoverflow question (read the comments provided by the user @supercat) it seems that this relocation does not change the hash code of an object instance during its lifetime, because the hash code is calculated once (the first time that it's value is requested) and the computed value is saved and reused later (when
the hash code value is requested again).

To summarize, based on in my understanding, the only thing you can assume is that given two references pointing to the same object in memory the hash codes of them will always be identical. In other words if Object.ReferenceEquals(a, b) then a.GetHashCode() == b.GetHashCode(). Furthermore it seems that given an object instance its hash code will stay the same for its entire lifetime, even if the physical memory address of the object is changed by the garbage collector.

SIDENOTE ON HASH CODES USAGE

It is important to always remember that the hash code has been introduced in the .NET framework at the sole purpose of handling the hash table data structure.

In order to determine the bucket to be used for a given value, the corresponding key is taken and its hash code is computed (to be precise, the bucket index is obtained by applying some normalizations on the value returned by the GetHashCode call, but the details are not important for this discussion). Put another way, the hash function used in the .NET implementation of hash tables is based on the computation of the hash code of the key.

This means that the only safe usage for an hash code is balancing an hash table, as pointed out by Eric Lippert here, so don't write code which depends on hash codes values for any other purpose.

Is Object.GetHashCode() unique to a reference or a value?

Rules 1 & 3 are contradictory to me.

To a certain extent, they are. The reason is simple: if an object is stored in a hash table and, by changing its value, you change its hash then the hash table has lost the value and you can't find it again by querying the hash table. It is important that while objects are stored in a hash table, they retain their hash value.

To realize this it is often simplest to make hashable objects immutable, thus evading the whole problem. It is however sufficient to make only those fields immutable that determine the hash value.

Consider the following example:

struct Person {
public readonly string FirstName;
public readonly string Name;
public readonly DateTime Birthday;

public int ShoeSize;
}

People rarely change their birthday and most people never change their name (except when marrying). However, their shoe size may grow arbitrarily, or even shrink. It is therefore reasonable to identify people using their birthday and name but not their shoe size. The hash value should reflect this:

public int GetHashCode() {
return FirstName.GetHashCode() ^ Name.GetHashCode() ^ Birthday.GetHashCode();
}

What are the rules I should follow to ensure GetHashCode() method returns unique value for an object?

You shouldn't even aim for GetHashCode() returning a unique value for each object. That's not the point of GetHashCode().

Eric Lippert has a great post about hash codes which you should read thoroughly. Basically you want to end up with something which will always return the same value for two equal objects (and you need to work out what you mean by equal) and is likely to return different values for two non-equal objects.

Personally I tend to use an implementation like this:

public override int GetHashCode()
{
int hash = 17;
hash = hash * 31 + field1.GetHashCode();
hash = hash * 31 + field2.GetHashCode();
hash = hash * 31 + field3.GetHashCode();
...
return hash;
}

Things to watch out for:

  • If you have mutable objects, be careful! You shouldn't mutate an object after using it as the key in a hash map.
  • If your fields can be null, you need to check for that while calculating your hash. For example:

    hash = hash * 31 + (field2 == null ? 0 : field2.GetHashCode());

Implementing GetHashCode correctly

Let's say your class looks like this:

class Frob {
public string Foo { get; set; }
public int Bar { get; set; }
public double FooBar { get; set; }
}

Let's say you define equals so that two instances of Frob are equal if their Foo and their Bar are equal, but FooBar doesn't matter.

Then you should define GetHashCode in terms of Foo and Bar. One way is like this:

return this.Foo.GetHashCode() * 17 + this.Bar.GetHashCode();

Basically, you just want to incorporate all the fields that go into defining the equality. One way is to just keep accumulating and multiplying by 17 like I've done. It's fast, it's simple, it's correct, and it usually gives a good distribution.

GetHashCodes - how and when it is used?

I like to use this implementation:

public override int GetHashCode()
{
unchecked
{
int hash = 17;
// Check for null
hash = hash * 29 + field1.GetHashCode();
hash = hash * 29 + field2.GetHashCode();
return hash;
}
}

Same GetHashCode() for different objects

The GetHashCode of System.Int32 works like:

public override int GetHashCode()
{
return this;
}

Which of course with this being 0, it will return 0.

System.Single's (float is alias) GetHashCode is:

public unsafe override int GetHashCode()
{
float num = this;
if (num == 0f)
{
return 0;
}
return *(int*)(&num);
}

Like you see, at 0f it will return 0.

Program used is ILSpy.



Related Topics



Leave a reply



Submit