Good GetHashCode() Override for List of Foo Objects Respecting the Order

I'd like to know how to write a good GetHashCode() for an ordered list.

Jon Skeet provides a great example on a similar post:

public override int GetHashCode()
{
    unchecked
    {
        int hash = 19;
        foreach (var foo in foos)
        {
            hash = hash * 31 + foo.GetHashCode();
        }
        return hash;
    }
}

Primes are generally very useful when hashing. If you're interested in why, you'll have to look up the maths, because other people are most likely better at explaining that than I am.
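To see that this scheme really respects order, here is a minimal sketch of the same multiply-and-add loop as a standalone helper. The class and method names (OrderedHashSketch, Combine) are made up for illustration, and treating null elements as 0 is an assumption that the snippet above does not make.

```csharp
using System.Collections.Generic;

public static class OrderedHashSketch
{
    // Same multiply-and-add scheme as above, applied to any sequence.
    public static int Combine<T>(IEnumerable<T> items)
    {
        unchecked // let the arithmetic wrap around instead of throwing on overflow
        {
            int hash = 19;
            foreach (var item in items)
            {
                // Assumption: a null element contributes 0 to the hash.
                hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }
}
```

Because an int is its own hash code, Combine(new[] { 1, 2 }) gives 18292 while Combine(new[] { 2, 1 }) gives 18322, so swapping two elements changes the result.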

C# – How to override GetHashCode for List<T> to calculate ETag with T being a record

A little extension method and HashCode could help with this:

internal static class EnumerableExtensions {
    public static int GetCombinedHashCode<T>(this IEnumerable<T> source) =>
        source.Aggregate(typeof(T).GetHashCode(), (hash, t) => HashCode.Combine(hash, t));
}

Seeding the hash with typeof(T).GetHashCode() is rather arbitrary, but it ensures that empty collections of different types do not all "look equal", since they would not normally compare equal either. Whether this matters or is even desirable will depend on your scenario.

Of course the result of this is only usable if T has a meaningful GetHashCode implementation, but that's true of hashes in general. For extra peace of mind a where T : IEquatable<T> constraint could be added, although that's not the standard approach for methods involving hashes. Adding the ability to use a custom IEqualityComparer<T> for the hash is left as an exercise.
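One possible take on that exercise is sketched below. It is not the original author's code: the class name ComparerHashSketch is hypothetical, and the sketch swaps the Aggregate call for the mutable System.HashCode struct, whose Add method accepts an IEqualityComparer<T>.

```csharp
using System;
using System.Collections.Generic;

internal static class ComparerHashSketch
{
    // Overload that hashes each element through a caller-supplied comparer.
    public static int GetCombinedHashCode<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        var hash = new HashCode();
        hash.Add(typeof(T)); // same arbitrary seeding idea as above
        foreach (var item in source)
        {
            hash.Add(item, comparer); // order of Add calls affects the result
        }
        return hash.ToHashCode();
    }
}
```

With StringComparer.OrdinalIgnoreCase, the lists { "A", "b" } and { "a", "B" } produce the same value within one run. Note that System.HashCode is seeded randomly per process, so values built this way are probably not a good fit for anything that is persisted or sent to clients.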

Implement GetHashCode() for objects that contain collections

I think your solution is fine. (Much later remark: LINQ's Sum method will act in checked context, so you can very easily get an OverflowException which means it is not so fine, after all.) But it is more usual to do XOR (addition without carry). So it could be something like

public override int GetHashCode()
{
    int hc = 0;
    if (Paths != null)
        foreach (var p in Paths)
            hc ^= p.GetHashCode();
    return hc;
}
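One property of XOR worth keeping in mind is that it ignores element order and that equal elements cancel each other out. The small sketch below shows this with plain integers; the helper name XorHashSketch is invented for the example.

```csharp
using System.Collections.Generic;

public static class XorHashSketch
{
    // XOR-combines the hash codes of all values, as in the loop above.
    public static int Xor(IEnumerable<int> values)
    {
        int hc = 0;
        foreach (var v in values)
        {
            hc ^= v.GetHashCode(); // for int, GetHashCode() is the value itself
        }
        return hc;
    }
}
```

Xor(new[] { 1, 2 }) and Xor(new[] { 2, 1 }) both give 3, and Xor(new[] { 5, 5 }) gives 0, the same as an empty sequence. That is harmless for correctness, but it means more collisions if the order of the collection matters for equality.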

Addendum (after answer was accepted):

Remember that if you ever use this type Routing in a Dictionary<Routing, Whatever>, a HashSet<Routing> or another situation where a hash table is used, then your instance will be lost if someone alters (mutates) the Routing after it has been added to the collection.

If you're sure that will never happen, use my code above. Dictionary<,> and so on will still work if you make sure no-one alters the Routing that is referenced.

Another choice is to just write

public override int GetHashCode()
{
    return 0;
}

if you believe the hash code will never be used. If every instance returns 0 for its hash code, you will get very bad performance with hash tables, but your object will not be lost. A third option is to throw a NotSupportedException.
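To make the trade-off between the content-based hash and the constant hash concrete, here is a small sketch. RoutingSketch is a stand-in for the Routing type (its real definition is not shown here), it uses int paths so the numbers are predictable, and the ConstantHash switch exists only for the demonstration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class RoutingSketch
{
    public List<int> Paths { get; } = new List<int>();

    // Demo-only switch: true mimics "return 0;", false mimics the XOR version.
    public bool ConstantHash { get; set; }

    public override bool Equals(object obj) =>
        obj is RoutingSketch other && Paths.SequenceEqual(other.Paths);

    public override int GetHashCode()
    {
        if (ConstantHash) return 0;
        int hc = 0;
        foreach (var p in Paths) hc ^= p.GetHashCode();
        return hc;
    }
}

public static class MutationDemo
{
    // Adds an instance to a HashSet, mutates it, then looks it up again.
    public static bool StillFoundAfterMutation(bool constantHash)
    {
        var routing = new RoutingSketch { ConstantHash = constantHash };
        routing.Paths.Add(1);
        routing.Paths.Add(2);

        var set = new HashSet<RoutingSketch> { routing };

        routing.Paths.Add(4); // the XOR hash changes from 3 to 7 here
        return set.Contains(routing);
    }
}
```

StillFoundAfterMutation(false) returns false because the set remembers the hash from insertion time, while StillFoundAfterMutation(true) returns true at the cost of every instance landing in the same bucket.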

GetHashCode not called from list elements

List<> doesn't override GetHashCode, so calling it on a list uses the base Object implementation, which is based on the reference rather than the contents.

As you've discovered, if you want to hash the contents of a list you need to iterate over it and do it manually.

If you're a fan of LINQ you can calculate the hash using the Aggregate method:

var hash = list.Aggregate(13, (agg, curr) => (agg * 7) + curr.GetHashCode());
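If the project is compiled with overflow checking enabled, the multiplication in that lambda could throw for longer lists. A hedged variant that wraps the arithmetic in unchecked and tolerates null elements might look like this; the name AggregateHashSketch is made up, and mapping null to 0 is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AggregateHashSketch
{
    // Same seed (13) and multiplier (7) as the one-liner above.
    public static int Hash<T>(IEnumerable<T> list) =>
        list.Aggregate(13, (agg, curr) =>
            unchecked(agg * 7 + (curr == null ? 0 : curr.GetHashCode())));
}
```

For example, Hash(new[] { 1, 2 }) is 646 and Hash(new[] { 2, 1 }) is 652, so the order of the elements is reflected in the result.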

Consistent HashCode for a List of String across processes/platforms

Per the comments to my previous answer:

private int GetHashCode(IEnumerable<string> value)
{
    var encoder = new UTF8Encoding();
    var hash = new SHA256CryptoServiceProvider();
    var sb = new StringBuilder();

    // Concatenate all items into one string.
    foreach (var item in value)
    {
        sb.Append(item);
    }

    // Derive four bytes (salted with the SHA-256 of the text) and read them as an Int32.
    return
        BitConverter.ToInt32(
            new Rfc2898DeriveBytes(sb.ToString(),
                hash.ComputeHash(encoder.GetBytes(sb.ToString()))).GetBytes(4), 0);
}
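The snippet above concatenates the strings without a separator, so { "ab", "c" } and { "a", "bc" } end up with the same hash. As a rough alternative sketch (not the original answer's method), each string can be length-prefixed before hashing and the result read with a fixed byte order. StableHashSketch and Compute are hypothetical names, the code assumes C# 8 and .NET Core 3.0 or later, and it treats a null item like an empty string.

```csharp
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class StableHashSketch
{
    public static int Compute(IEnumerable<string> values)
    {
        using var stream = new MemoryStream();
        var prefix = new byte[4];

        foreach (var item in values)
        {
            // Assumption: null is hashed the same way as "".
            byte[] bytes = Encoding.UTF8.GetBytes(item ?? string.Empty);

            // The length prefix keeps item boundaries part of the hashed data.
            BinaryPrimitives.WriteInt32LittleEndian(prefix, bytes.Length);
            stream.Write(prefix, 0, prefix.Length);
            stream.Write(bytes, 0, bytes.Length);
        }

        using var sha = SHA256.Create();
        byte[] digest = sha.ComputeHash(stream.ToArray());

        // Fixed little-endian read, independent of the machine's byte order.
        return BinaryPrimitives.ReadInt32LittleEndian(digest);
    }
}
```

Since nothing here depends on string.GetHashCode() or on the machine's endianness, the value should be the same across processes and platforms for the same input.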

Immutable Collection Hashcode

I have put together a few different methods of comparing arrays of bytes. I used an arbitrary array length of 10000 and assumed that both compared arrays are of the same length (because a "broad phase" length check is obviously not very interesting :) )

Perhaps you could use this as a basis for making a decision on which method to use when comparing the arrays for equality.

The results are the average of 5 iterations for three scenarios (equal, first element different and last element different) and the timings are in ms.

---------------
Identical elements
---------------
SequenceEqual: 5.98142
BasicEqual: 0.11864
UnsafeMemCmp: 0.15542
SafeMemCmp: 0.12896
---------------
First element different
---------------
SequenceEqual: 0.00056
BasicEqual: 0.00012
UnsafeMemCmp: 0.0002
SafeMemCmp: 0.00182
---------------
Last element different
---------------
SequenceEqual: 0.14942
BasicEqual: 0.03178
UnsafeMemCmp: 0.0015
SafeMemCmp: 0.00326
---------------

The 4 methods I have chosen are:

SequenceEqual

static bool SequenceEqual(byte[] arr1, byte[] arr2)
{
    return arr1.SequenceEqual(arr2);
}

BasicEqual

static bool BasicEqual(byte[] arr1, byte[] arr2)
{
    for (var i = 0; i < 10000; i++)
        if (arr1[i] != arr2[i])
            return false;
    return true;
}

UnsafeMemCmp

[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
static extern unsafe int memcmp(byte* b1, byte* b2, int count);

static unsafe bool UnsafeMemCmp(byte[] arr1, byte[] arr2)
{
    fixed (byte* b1 = arr1, b2 = arr2)
    {
        return memcmp(b1, b2, 10000) == 0;
    }
}

SafeMemCmp

[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
static extern int memcmp(IntPtr b1, IntPtr b2, int count);

static bool SafeMemCmp(byte[] arr1, byte[] arr2)
{
    var a = Marshal.AllocHGlobal(arr1.Length);
    var b = Marshal.AllocHGlobal(arr2.Length);

    try
    {
        Marshal.Copy(arr1, 0, a, arr1.Length);
        Marshal.Copy(arr2, 0, b, arr2.Length);

        return memcmp(a, b, 10000) == 0;
    }
    finally
    {
        Marshal.FreeHGlobal(a);
        Marshal.FreeHGlobal(b);
    }
}

For completeness, the tests were run using the following method:

static void RunTest(string name, Func<byte[], byte[], bool> action, byte[] a, byte[] b)
{
    TimeSpan total = TimeSpan.Zero;

    for (var i = 0; i < 5; i++)
    {
        _stopwatch.Reset();
        _stopwatch.Start();
        action(a, b);
        _stopwatch.Stop();
        total += _stopwatch.Elapsed;
    }

    Console.WriteLine(name + ": " + (total.TotalMilliseconds / 5));
}
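On newer runtimes there is a fifth candidate that was not part of the measurements above: comparing the arrays as spans. The sketch below is only an illustration (SpanCompareSketch is an invented name, and it assumes a runtime where System.Span<T> is available); no timing claims are made for it here.

```csharp
using System;

public static class SpanCompareSketch
{
    // Compares lengths first, then contents, without hard-coding 10000.
    public static bool SpanEqual(byte[] arr1, byte[] arr2) =>
        arr1.AsSpan().SequenceEqual(arr2);
}
```

It has the same signature as the other four methods, so it could be passed to RunTest in the same way to see how it compares on a given machine.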

