Consistency of hashCode() on a Java string
I can see that documentation as far back as Java 1.2.
While it's true that in general you shouldn't rely on a hash code implementation remaining the same, it's now documented behaviour for java.lang.String
, so changing it would count as breaking existing contracts.
Wherever possible, you shouldn't rely on hash codes staying the same across versions etc - but in my mind java.lang.String
is a special case simply because the algorithm has been specified... so long as you're willing to abandon compatibility with releases before the algorithm was specified, of course.
Will the hashcode of a String will be the same for the Entire Application?
According to hashcode contract it will always the same if string1.eqauls(string2)
The java.lang.String hash function
In an attempt to provide a fast implementation, early versions of the Java String class provided a hashCode() implementation that considered at most 16 characters picked from the string. For some common data this worked very poorly, delivering unacceptably clustered results and consequently slow hashtable performance.
From Java 1.2, java.lang.String class implements its hashCode() using a product sum algorithm over the entire text of the string. Given an instance s of the java.lang.String class, for example, would have a hash code h(s) defined by
where terms are summed using Java 32-bit int addition, s[i] denotes the ith character of the string, and n is the length of s.
As with any general hashing function, collisions are possible. For example, the strings "FB" and "Ea" have the same hash value. The hashCode() implementation of String uses the prime number 31 and the difference between 'a' and 'B' is just 31, so the calculation is 70 × 31 + 66 = 69 × 31 + 97.
Check Collections Framework Enhancements in Java SE 7
as you see there are changes in it and who knows will be.
The alternative hash function is only applied to keys of type String.
Java and string.hashCode() stability across machines in cluster
The implementation of String.hashCode()
is specified in the documentation, so it's guaranteed to be consistent:
The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
All of those operations are implemented platform-independently for Java -- the platform byte order is irrelevant, for example.
That said, ways of getting a String
can be tricky, if you're getting it from a file or another source of bytes. In that case, you're fine so long as you explicitly specify a Charset
. (Remember that String
s don't have different encodings per se; an encoding is a specification for conversions between a byte[]
and a String
.)
How to ensure hashCode() is consistent with equals()?
It doesn't say the hashcode for an object has to be completely unique, only that the hashcode for two equal objects returns the same hashcode. It's entirely legal to have two non-equal objects return the same hashcode. However, the more unique a hashcode distribution is over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.
IDEs such as IntelliJ Idea have built-in generators for equals and hashCode that generally do a pretty good job at coming up with "good enough" code for most objects (and probably better than some hand-crafted overly-clever hash functions).
For example, here's a hashCode function that Idea generates for your People class:
public int hashCode() {
int result = name != null ? name.hashCode() : 0;
result = 31 * result + age;
return result;
}
Java, Object.hashCode() result constant across all JVMs/Systems?
No. The output of hashCode
is liable to change between JVM implementations and even between different executions of a program on the same JVM.
However, in the specific example you gave, the value of "test".hashCode()
will actually be consistent because the implementation of hashCode
for String
objects is part of the API of String
(see the Javadocs for java.lang.String and this other SO post).
String hashCode(): always the same result?
- The same String should has the same hashCode() (based on hashCode definition)
If you take a look at Android hashCode() of String class. You will see hashCode is calculated based on char array (the same), char count ( the same) and offset field ( this value seems always Zero (0) - is set in String constructor - I don't know why Google adds this offset field. Oracle String.hashCode() is calculated based on char array, char count only.
- You can build your own hashCode() function like Oracle String hashCode(): This implementation is based on char array and char count so the same String always has the same hashCode().
Finding a String with the same hashcode() value as given String
You need to solve this equation:
(x * 31^(3-1)) + (y * 31^(3-2)) + (z * 31^(3-3)) = 99341
It's a plane, every point in that plane is good for you as long as x,y,z are integers and between 0 and 255 (if you are only talking about ASCII).
One possible solution is x=100, y=101, z=110
.
To simply find another one, you can just change the order of two of them and see what would be the third one, e.g.:
x=101
y=???
z=110
(101 * 31^(3-1)) + (y * 31^(3-2)) + (110 * 31^(3-3)) = 99341
where y=70
(F
) so eFn
should have the same hashCode
as den
using the function in your question.
Related Topics
Split List into Multiple Lists with Fixed Number of Elements in Java 8
How to Connect to Postgres Db Due to the Authentication Type 10 Is Not Supported
Java.Sql.Sqlexception: Exhausted Resultset
How to Compare Character Ignoring Case in Primitive Types
Variable's Scope in a Switch Case
How to Convert Binary String Value to Decimal
Where Are Compiled Jsp Java (*_Jsp.Java) Files
Spring Autowiring Using @Configurable
Java Jtextfield with Input Hint
Gmail Login Using Selenium Webdriver in Java
How to Display a Svg Byte Array as an Image in a Jasperreport
Instance Method Reference and Lambda Parameters
Using Jasperreports with a Relative Path
Should I Call Ugi.Checktgtandreloginfromkeytab() Before Every Action on Hadoop