Consistency of Hashcode() on a Java String

Consistency of hashCode() on a Java string

I can see that documentation as far back as Java 1.2.

While it's true that in general you shouldn't rely on a hash code implementation remaining the same, it's now documented behaviour for java.lang.String, so changing it would count as breaking existing contracts.

Wherever possible, you shouldn't rely on hash codes staying the same across versions etc - but in my mind java.lang.String is a special case simply because the algorithm has been specified... so long as you're willing to abandon compatibility with releases before the algorithm was specified, of course.

Will the hashcode of a String will be the same for the Entire Application?

According to hashcode contract it will always the same if string1.eqauls(string2)

The java.lang.String hash function

In an attempt to provide a fast implementation, early versions of the Java String class provided a hashCode() implementation that considered at most 16 characters picked from the string. For some common data this worked very poorly, delivering unacceptably clustered results and consequently slow hashtable performance.

From Java 1.2, java.lang.String class implements its hashCode() using a product sum algorithm over the entire text of the string. Given an instance s of the java.lang.String class, for example, would have a hash code h(s) defined by

 h(s)=\sum_{i=0}^{n-1}s[i] \cdot 31^{n-1-i}

where terms are summed using Java 32-bit int addition, s[i] denotes the ith character of the string, and n is the length of s.

As with any general hashing function, collisions are possible. For example, the strings "FB" and "Ea" have the same hash value. The hashCode() implementation of String uses the prime number 31 and the difference between 'a' and 'B' is just 31, so the calculation is 70 × 31 + 66 = 69 × 31 + 97.

Check Collections Framework Enhancements in Java SE 7 as you see there are changes in it and who knows will be.

The alternative hash function is only applied to keys of type String.

Java and string.hashCode() stability across machines in cluster

The implementation of String.hashCode() is specified in the documentation, so it's guaranteed to be consistent:

The hash code for a String object is computed as

  s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)

All of those operations are implemented platform-independently for Java -- the platform byte order is irrelevant, for example.

That said, ways of getting a String can be tricky, if you're getting it from a file or another source of bytes. In that case, you're fine so long as you explicitly specify a Charset. (Remember that Strings don't have different encodings per se; an encoding is a specification for conversions between a byte[] and a String.)

How to ensure hashCode() is consistent with equals()?

It doesn't say the hashcode for an object has to be completely unique, only that the hashcode for two equal objects returns the same hashcode. It's entirely legal to have two non-equal objects return the same hashcode. However, the more unique a hashcode distribution is over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.

IDEs such as IntelliJ Idea have built-in generators for equals and hashCode that generally do a pretty good job at coming up with "good enough" code for most objects (and probably better than some hand-crafted overly-clever hash functions).

For example, here's a hashCode function that Idea generates for your People class:

public int hashCode() {
int result = name != null ? name.hashCode() : 0;
result = 31 * result + age;
return result;
}

Java, Object.hashCode() result constant across all JVMs/Systems?

No. The output of hashCode is liable to change between JVM implementations and even between different executions of a program on the same JVM.

However, in the specific example you gave, the value of "test".hashCode() will actually be consistent because the implementation of hashCode for String objects is part of the API of String (see the Javadocs for java.lang.String and this other SO post).

String hashCode(): always the same result?

  • The same String should has the same hashCode() (based on hashCode definition)

If you take a look at Android hashCode() of String class. You will see hashCode is calculated based on char array (the same), char count ( the same) and offset field ( this value seems always Zero (0) - is set in String constructor - I don't know why Google adds this offset field. Oracle String.hashCode() is calculated based on char array, char count only.

  • You can build your own hashCode() function like Oracle String hashCode(): This implementation is based on char array and char count so the same String always has the same hashCode().

Finding a String with the same hashcode() value as given String

You need to solve this equation:

(x * 31^(3-1)) + (y * 31^(3-2)) + (z * 31^(3-3)) = 99341

It's a plane, every point in that plane is good for you as long as x,y,z are integers and between 0 and 255 (if you are only talking about ASCII).

One possible solution is x=100, y=101, z=110.
To simply find another one, you can just change the order of two of them and see what would be the third one, e.g.:

x=101
y=???
z=110

(101 * 31^(3-1)) + (y * 31^(3-2)) + (110 * 31^(3-3)) = 99341

where y=70 (F) so eFn should have the same hashCode as den using the function in your question.



Related Topics



Leave a reply



Submit