Java Array Hashcode Implementation

How is the Arrays.hashCode implemented?

From the book Effective Java:

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.

Does hashcode implementation of Java Arrays.hashcode() uniformly distribute

Why use the prime number 31?

This can be split in two parts?

  1. Why a prime number?

Here we need to understand that our goal is to get a unique HashCode for an object which will help us to find that object in O(1) time.

The key word here, is unique.

Primes

Primes are unique numbers. They are unique in that, the product of a
prime with any other number has the best chance of being unique (not
as unique as the prime itself of-course) due to the fact that a prime
is used to compose it. This property is used in hashing functions.

.

Why number 31?

From Effective Java

  • Because it's an odd prime, and it's "traditional" to use primes.
  • It's also one less than a power of two, which permits for bitwise
    optimization

    Here's the full quote,

from Item 9: Always override
hashCode when you override equals:

The value 31 was chosen because it's an odd prime. If it were even and
multiplication overflowed, information would be lost, as
multiplication by 2 is equivalent to shifting. The advantage of using
a prime is less clear, but it is traditional.

A nice property of 31 is that the multiplication can be replaced by a
shift (§15.19) and subtraction for better performance:

31 * i == (i << 5) - i Modern VMs do this sort of optimization
automatically.

While the recipe in this item yields reasonably good hash functions,
it does not yield state-of-the-art hash functions, nor do Java
platform libraries provide such hash functions as of release 1.6.
Writing such hash functions is a research topic, best left to
mathematicians and theoretical computer scientists.

Perhaps a later release of the platform will provide state-of-the-art
hash functions for its classes and utility methods to allow average
programmers to construct such hash functions. In the meantime, the
techniques described in this item should be adequate for most
applications.

This is a very Good source.

Do arrays have inherent hashCode()?

Arrays do have hashCode, per JLS 10.7 (emphasis added):

The members of an array type are all of the following:

  • The public final field length, which contains the number of components of the array. length may be positive or zero.

  • The public method clone, which overrides the method of the same name in class Object and throws no checked exceptions. The return type of the clone method of an array type T[] is T[].

  • A clone of a multidimensional array is shallow, which is to say that it creates only a single new array. Subarrays are shared.

  • All the members inherited from class Object; the only method of Object that is not inherited is its clone method.

This means that hashCode is inherited from Object, and thus is identity-based, and not dependent upon the values in the array.

This might be what you want, but I suggest it might not be. If you want a hash code based on values in an array, you would need to wrap the array in some class which implements a sensible equals and hash code.

hashCode() called on array instance in Java

Java arrays are directly derived from java.lang.Object and thus inherit the methods equals and hashCode from it.

The default equals method compares object identity (i.e. whether two references refer to the same object), and the default hashCode method will most probably return different values for different objects.

In most cases, this is a behavior that you do not want. You most probably want to compare arrays based on the content. Note, that your equals method seems to do that, which in turn means that your hashCode method is broken. Look at the method's documentation for the reason.

Java provides a class with helper methods for exactly that reason: java.util.Arrays. It provides the methods equals and hashCode that are both based on the content of the array.

So you should write it like that:

public class SomeObject {

private String a;
private char[] b;

@Override
public boolean equals(Object anotherObject) {
if (!(anotherObject instanceof SomeObject)) {
return false;
}
SomeObject object = (SomeObject) anotherObject;
return (this.a.equals(object.a) && Arrays.equals(this.b, object.b));
}

@Override
public int hashCode() {
return (43 * this.a.hashCode() + 11 * Arrays.hashCode(this.b));
}
}

Best implementation for hashCode method for a collection

The best implementation? That is a hard question because it depends on the usage pattern.

A for nearly all cases reasonable good implementation was proposed in Josh Bloch's Effective Java in Item 8 (second edition). The best thing is to look it up there because the author explains there why the approach is good.

A short version

  1. Create a int result and assign a non-zero value.

  2. For every field f tested in the equals() method, calculate a hash code c by:

    • If the field f is a boolean:
      calculate (f ? 0 : 1);
    • If the field f is a byte, char, short or int: calculate (int)f;
    • If the field f is a long: calculate (int)(f ^ (f >>> 32));
    • If the field f is a float: calculate Float.floatToIntBits(f);
    • If the field f is a double: calculate Double.doubleToLongBits(f) and handle the return value like every long value;
    • If the field f is an object: Use the result of the hashCode() method or 0 if f == null;
    • If the field f is an array: see every field as separate element and calculate the hash value in a recursive fashion and combine the values as described next.
  3. Combine the hash value c with result:

    result = 37 * result + c
  4. Return result

This should result in a proper distribution of hash values for most use situations.

hashCode computation with array fields in Java

Possible solution is use List<Byte> instead of byte[] and then that's correct way:

Objects.hashCode(field1, field2);

Overriding the hashCode( ) in java for a class that contains an array

To calculate a hashcode for the int array you can use java.util.Arrays.hashcode(int[]).

If you look at its implementation:

public static int hashCode(int a[]) {
if (a == null)
return 0;

int result = 1;
for (int element : a)
result = 31 * result + element;

return result;
}

you can derive an idea how to calculate a hash code for your class which should be based on the values of your two integers and the integer array:

public class MyClass {
private int a, b;
private int[] array;

public int hashCode() {
return (31 * (31 * Arrays.hashCode(array) + a)) + b;
}

To equals implementation can look like:

    public int equals(Object o) {
if (o instanceof of MyClass) {
MyClass m = (MyClass)o;
return m.a == a && m.b == b && Arrays.equals(m.array, array);
}
else
return false;
}

Collision strength of Java's Arrays.hashCode

Arrays.hashCode(double[]) is specified to return the equivalent value of a List containing Double values representing the same numeric value.

List.hashCode in turn is specified with a fairly simple algorithm:

int hashCode = 1;
for (E e : list)
hashCode = 31*hashCode + (e==null ? 0 : e.hashCode());

In general the multiplication with a prime number is a good practice for general-purpose hash functions, but it's far from a cryptographically strong hash function.

This means that while collisions are unlikely in the general (effectively random) case, they can usually be constructed quite easily if you can influence (or select) the hashCode of the items in the List.

As a constructed example consider these two statements:

System.out.println(Arrays.hashCode(new double[] {4.753E-321d}));
System.out.println(Arrays.hashCode(new double[] {4.9E-324d, 4.9E-324d}));

Both of these will output 993, despite being clearly different arrays.



Related Topics



Leave a reply



Submit