Good Hash Function for Strings
Usually hashes wouldn't do sums, otherwise stop
and pots
will have the same hash.
and you wouldn't limit it to the first n characters because otherwise house and houses would have the same hash.
Generally hashs take values and multiply it by a prime number (makes it more likely to generate unique hashes) So you could do something like:
int hash = 7;
for (int i = 0; i < strlen; i++) {
hash = hash*31 + charAt(i);
}
hash function for string
I've had nice results with djb2
by Dan Bernstein.
unsigned long
hash(unsigned char *str)
{
unsigned long hash = 5381;
int c;
while (c = *str++)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}
Hash function for a string
First, it usually does not matter that much in practice. Most hash functions are "good enough".
But if you really care, you should know that it is a research subject by itself. There are thousand of papers about that. You can still get a PhD today by studying & designing hashing algorithms.
Your second hash function might be slightly better, because it probably should separate the string "ab"
from the string "ba"
. On the other hand, it is probably less quick than the first hash function. It may, or may not, be relevant for your application.
I'll guess that hash functions used for genome strings are quite different than those used to hash family names in telephone databases. Perhaps even some string hash functions are better suited for German, than for English or French words.
Many software libraries give you good enough hash functions, e.g. Qt has qhash, and C++11 has std::hash in <functional>
, Glib has several hash functions in C, and POCO has some hash function.
I quite often have hashing functions involving primes (see Bézout's identity) and xor, like e.g.
#define A 54059 /* a prime */
#define B 76963 /* another prime */
#define C 86969 /* yet another prime */
#define FIRSTH 37 /* also prime */
unsigned hash_str(const char* s)
{
unsigned h = FIRSTH;
while (*s) {
h = (h * A) ^ (s[0] * B);
s++;
}
return h; // or return h % C;
}
But I don't claim to be an hash expert. Of course, the values of A
, B
, C
, FIRSTH
should preferably be primes, but you could have chosen other prime numbers.
Look at some MD5 implementation to get a feeling of what hash functions can be.
Most good books on algorithmics have at least a whole chapter dedicated to hashing. Start with wikipages on hash function & hash table.
What is a good Hash Function?
For doing "normal" hash table lookups on basically any kind of data - this one by Paul Hsieh is the best I've ever used.
http://www.azillionmonkeys.com/qed/hash.html
If you care about cryptographically secure or anything else more advanced, then YMMV. If you just want a kick ass general purpose hash function for a hash table lookup, then this is what you're looking for.
What is a good 64bit hash function in Java for textual strings?
Why don't you use a long
variant of the default String.hashCode()
(where some really smart guys certainly put effort into making it efficient - not mentioning the thousands of developer eyes that already looked at this code)?
// adapted from String.hashCode()
public static long hash(String string) {
long h = 1125899906842597L; // prime
int len = string.length();
for (int i = 0; i < len; i++) {
h = 31*h + string.charAt(i);
}
return h;
}
If you're looking for even more bits, you could probably use a BigInteger
Edit:
As I mentioned in a comment to the answer of @brianegge, there are not much usecases for hashes with more than 32 bits and most likely not a single one for hashes with more than 64 bits:
I could imagine a huge hashtable distributed across dozens of servers, maybe storing tens of billions of mappings. For such a scenario, @brianegge still has a valid point here: 32 bit allow for 2^32 (ca. 4.3 billion) different hash keys. Assuming a strong algorithm, you should still have quite few collisions. With 64 bit (18,446,744,073 billion different keys) your certainly save, regardless of whatever crazy scenario you need it for. Thinking of usecases for 128 bit keys (340,282,366,920,938,463,463,374,607,431 billion possible keys) is pretty much impossible though.
To combine the hash for several fields, simply do an XOR multiply one with a prime and add them:
long hash = MyHash.hash(string1) * 31 + MyHash.hash(string2);
The small prime is in there to avoid equal hash code for switched values, i.e. {'foo','bar'} and {'bar','foo'} aren't equal and should have a different hash code. XOR is bad as it returns 0 if both values are equal. Therefore, {'foo','foo'} and {'bar','bar'} would have the same hash code.
Related Topics
Post to Jersey Rest Service Getting Error 415 Unsupported Media Type
Hibernate: Different Object with the Same Identifier Value Was Already Associated with the Session
Compile Code Using Javafx 2.0 (Using Command Line)
Javax.Swing Timer Repeats Fine, But Actionlistener Doesn't Do Anything
How to Add an Actionlistener Onto a Jbutton in Java
Date and Time Conversion to Some Other Timezone in Java
How to Move My Jmenubar to the Screen Menu Bar on MAC Os X
Difference Between _Java_Options, Java_Tool_Options and Java_Opts
Integrating Tomcat and Eclipse as a Hot-Deploy Environment
Import Maven Dependencies in Intellij Idea
What Does Statement.Setfetchsize(Nsize) Method Really Do in SQL Server Jdbc Driver
Generate All Combinations from Multiple Lists
Parsing an Arithmetic Expression and Building a Tree from It in Java
Upload Files from Java Client to a Http Server
Should One Call .Close() on Httpservletresponse.Getoutputstream()/.Getwriter()