Overriding Gethashcode for Mutable Objects

Overriding GetHashCode for mutable objects?

How does that work if the fields that it's based on are mutable?

It doesn't in the sense that the hash code will change as the object changes. That is a problem for all of the reasons listed in the articles you read. Unfortunately this is the type of problem that typically only show up in corner cases. So developers tend to get away with the bad behavior.

Also what if I do want dictionary lookups etc to be based on reference equality not my overridden Equals?

As long as you implement an interface like IEquatable<T> this shouldn't be a problem. Most dictionary implementations will choose an equality comparer in a way that will use IEquatable<T> over Object.ReferenceEquals. Even without IEquatable<T>, most will default to calling Object.Equals() which will then go into your implementation.

Basically in most of the executing code I want reference equality and I always use == and I'm not overriding that.

If you expect your objects to behave with value equality you should override == and != to enforce value equality for all comparisons. Users can still use Object.ReferenceEquals if they actually want reference equality.

I used to assume that the framework always uses == and not Equals to compare things

What the BCL uses has changed a bit over time. Now most cases which use equality will take an IEqualityComparer<T> instance and use it for equality. In the cases where one is not specified they will use EqualityComparer<T>.Default to find one. At worst case this will default to calling Object.Equals

What to return when overriding Object.GetHashCode() in classes with no immutable fields?

Go back to basics. You read my article; read it again. The two ironclad rules that are relevant to your situation are:

  • if x equals y then the hash code of x must equal the hash code of y. Equivalently: if the hash code of x does not equal the hash code of y then x and y must be unequal.
  • the hash code of x must remain stable while x is in a hash table.

Those are requirements for correctness. If you can't guarantee those two simple things then your program will not be correct.

You propose two solutions.

Your first solution is that you always return a constant. That meets the requirement of both rules, but you are then reduced to linear searches in your hash table. You might as well use a list.

The other solution you propose is to somehow produce a hash code for each object and store it in the object. That is perfectly legal provided that equal items have equal hash codes. If you do that then you are restricted such that x equals y must be false if the hash codes differ. This seems to make value equality basically impossible. Since you wouldn't be overriding Equals in the first place if you wanted reference equality, this seems like a really bad idea, but it is legal provided that equals is consistent.

I propose a third solution, which is: never put your object in a hash table, because a hash table is the wrong data structure in the first place. The point of a hash table is to quickly answer the question "is this given value in this set of immutable values?" and you don't have a set of immutable values, so don't use a hash table. Use the right tool for the job. Use a list, and live with the pain of doing linear searches.

A fourth solution is: hash on the mutable fields used for equality, remove the object from all hash tables it is in just before every time you mutate it, and put it back in afterwards. This meets both requirements: the hash code agrees with equality, and hashes of objects in hash tables are stable, and you still get fast lookups.

GetHashCode() - immutable values?

GetHashCode() is there for mainly one reason: retrieval of an object from a hash table. You are right that it is desirable that the hash code should be computed only from immutable fields, but think about the reason for this. Since the hashcode is used to retrieve an object from a hashtable it will lead to errors when the hashcode changes while the object is stored in the hashtable.

To put it more generally: the value returned by GetHashCode must stay stable as long as a structure depends on that hashcode to stay stable. So for you example it means you can change the id field as long as the object is currently not used in any such structure.

Overriding GetHashCode()

If all your fields are mutable and you have to implement GetHashCode method, I am afraid this is the implementation you would need to have.

public override int GetHashCode() 
{
return 1;
}

Yes, this is inefficient but this is at least correct.

The problem is that GetHashCode is being used by Dictionary and HashSet collections to place each item in a bucket. If hashcode is calculated based on some mutable fields and the fields are really changed after the object is placed into the HashSet or Dictionary, the object can no longer be found from the HashSet or Dictionary.

Note that with all the objects returning the same HashCode 1, this basically means all the objects are being put in the same bucket in the HashSet or Dictionary. So, there is always only one single bucket in the HashSet or Dictionary. When trying to lookup the object, it will do a equality check on each of the objects inside the only bucket. This is like doing a search in a linked list.

Somebody may argue that implementing the hashcode based on mutable fields can be fine if we can make sure fields are never changed after the objects added to HashCode or Dictionary collection. My personal view is that this is error-prone. Somebody taking over your code two years later might not be aware of this and breaks the code accidentally.

Overriding GetHashCode() for value objects without fields

If there is no data associated with the class then make only one instance.

sealed class FirstRequestHeader : RequestHeader
{
public static readonly FirstRequestHeader Value = new FirstRequestHeader();

private FirstRequestHeader()
{

}
}

Should GetHashCode be implemented for IEquatableT on mutable types?

Mutable classes work quite bad with Dictionaries and other classes that relies on GetHashCode and Equals.

In the scenario you are describing, with mutable object, I suggest one of the following:

class ConstantHasCode: IEquatable<ConstantHasCode>
{
public int SomeVariable;
public virtual Equals(ConstantHasCode other)
{
return other.SomeVariable == SomeVariable;
}

public override int GetHashCode()
{
return 0;
}
}

or

class ThrowHasCode: IEquatable<ThrowHasCode>
{
public int SomeVariable;
public virtual Equals(ThrowHasCode other)
{
return other.SomeVariable == SomeVariable;
}

public override int GetHashCode()
{
throw new ApplicationException("this class does not support GetHashCode and should not be used as a key for a dictionary");
}
}

With the first, Dictionary works (almost) as expected, with performance penalty in lookup and insertion: in both cases, Equals will be called for every element already in the dictionary until a comparison return true. You are actually reverting to performance of a List

The second is a way to tell the programmers will use your class "no, you cannot use this within a dictionary".
Unfortunately, as far as I know there is no method to detect it at compile time, but this will fail the first time the code adds an element to the dictionary, very likely quite early while developping, not the kind of bug happening only in production environment with an unpredicted set of input.

Last but not least, ignore the "mutable" problem and implement GetHashCode using member variables: now you have to be aware that you are not free to modify the class when it's used withing a Dictionary. In some scenario this can be acceptable, in other it's not

Continuing confusion regarding overring Equals for mutable objects that are used in data bound collections

Can anyone tell me how can I remove the Equals and GetHashCode overrides? On the IEquatable Interface page on MSDN it says It should be implemented for any object that might be stored in a generic collection and then on the IEquatable.Equals Method page it says If you implement Equals, you should also override the base class implementations of Object.Equals(Object) and GetHashCode so that their behaviour is consistent with that of the IEquatable.

Mutable objects are comparable by their identity while immutable or value objects by their values.

If you have a mutable object you need to figure out its identity (e.g. if it is a representation of an entity stored in the database the identity is the primary key of the identity; if it is just an 'ad hoc' mutable object created in memory, then its identity is reference of this object (i.e. the default implementation of Equals and GetHashCode)).

So if your object is not an entity you simply implement IEquatable.Equals(T x) { return this.Equals(x); }, i.e. you say that, yes you can compare objects of this class with objects of class T and you compare it by reference (Equals() method inherited from System.Object).

If your object is an entity and e.g. has a primary key PersonId, then you do comparison by PersonId and return PersonId.GetHashCode() from your GetHashCode() method.

Btw, in case of entities you usually use some OR mapper and Identity map pattern which ensures that within a given unit of work you don't have more than one instance of a given entity, i.e. whenever primary keys are equal the object references are equal too.

What is the best algorithm for overriding GetHashCode?

I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:

public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + field1.GetHashCode();
hash = hash * 23 + field2.GetHashCode();
hash = hash * 23 + field3.GetHashCode();
return hash;
}
}

As noted in comments, you may find it's better to pick a large prime to multiply by instead. Apparently 486187739 is good... and although most examples I've seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for example, I've used numbers which apparently work well - but the initial value isn't a prime. (The multiplication constant is prime though. I don't know quite how important that is.)

This is better than the common practice of XORing hashcodes for two main reasons. Suppose we have a type with two int fields:

XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y

By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.

This page gives quite a few options. I think for most cases the above is "good enough" and it's incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR instead of ADD as a combining operation. It looks something like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modifying to perform one iteration per byte, instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we're using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn't actually work as well (in the sample case tested) as the addition approach above.

// Note: Not quite FNV!
public override int GetHashCode()
{
unchecked // Overflow is fine, just wrap
{
int hash = (int) 2166136261;
// Suitable nullity checks etc, of course :)
hash = (hash * 16777619) ^ field1.GetHashCode();
hash = (hash * 16777619) ^ field2.GetHashCode();
hash = (hash * 16777619) ^ field3.GetHashCode();
return hash;
}
}

Note that one thing to be aware of is that ideally you should prevent your equality-sensitive (and thus hashcode-sensitive) state from changing after adding it to a collection that depends on the hash code.

As per the documentation:

You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:

  • You can compute the hash code from fields that are not mutable; or
  • You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.

The link to the FNV article is broken but here is a copy in the Internet Archive: Eternally Confuzzled - The Art of Hashing



Related Topics



Leave a reply



Submit