Fastest hash for non-cryptographic uses?
CRC32 is pretty fast and there's a function for it: http://www.php.net/manual/en/function.crc32.php
But you should be aware that CRC32 will have more collisions than MD5 or even SHA-1 hashes, simply because of its shorter length (32 bits, compared to 128 bits and 160 bits respectively). If you just want to check whether a stored string is corrupted, CRC32 will be fine.
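A minimal sketch of that corruption check, written in Python rather than PHP and assuming the checksum is saved next to the string when it is stored. The helper names `checksum` and `is_intact` are made up for illustration; `zlib.crc32` is the standard-library CRC-32.

```python
import zlib

def checksum(data: bytes) -> int:
    """Return the CRC-32 of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF

def is_intact(data: bytes, stored_crc: int) -> bool:
    """True if data still matches the checksum recorded when it was stored."""
    return checksum(data) == stored_crc

original = "a stored string".encode("utf-8")
stored_crc = checksum(original)  # saved alongside the string

print(is_intact(original, stored_crc))            # unchanged data
print(is_intact(b"a stored strinG", stored_crc))  # one byte altered
```

This only guards against accidental corruption. Anyone who can change the string on purpose can also recompute the CRC.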
Very low collision non-cryptographic hashing function
Yann Collet's xxHash may be a good choice (Home page, GitHub)
xxHash is an extremely fast non-cryptographic hash algorithm, working
at speeds close to RAM limits. It is proposed in two flavors, 32 and
64 bits.
At least 4 C# implementations are available (see home page).
I had excellent results with it in the past.
The hash size is 32 or 64 bits, but XXH3 is in the making:
XXH3 features a wide internal state of 512 bits, which makes it
suitable to generate a hash of up to 256 bit. For the time being, only
64-bit and 128-bit variants are exposed, but a similar recipe can be
used for a 256-bit variant if there is any need for it one day. All
variant feature same speed, since only the finalization stage is
different.
In general, the longer the hash, the slower its calculation. A 64-bit hash is good enough for most practical purposes.
You can generate longer hashes by combining two hash functions (e.g. 128-bit XXH3 and 128-bit MurmurHash3).
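The combining idea can be sketched as plain concatenation of two outputs. To keep the example runnable with only the standard library, it swaps in two 32-bit checksums (`zlib.crc32` and `zlib.adler32`) in place of XXH3 and MurmurHash3; the function name `combined_hash64` is hypothetical.

```python
import zlib

def combined_hash64(data: bytes) -> int:
    """Concatenate two 32-bit checksums into one 64-bit value."""
    high = zlib.crc32(data) & 0xFFFFFFFF    # first function fills the upper 32 bits
    low = zlib.adler32(data) & 0xFFFFFFFF   # second function fills the lower 32 bits
    return (high << 32) | low

print(f"{combined_hash64(b'samplebias'):016x}")
```

The combined value is only as collision-resistant as the two functions are independent of each other, so this is an illustration of the mechanics rather than a recommendation of these particular checksums.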
fast, large-width, non-cryptographic string hashing in python
Take a look at the 128-bit variant of MurmurHash3. The algorithm's page includes some performance numbers. It should be possible to port this to Python, either in pure Python or as a C extension. (Update: the author recommends using the 128-bit variant and throwing away the bits you don't need.)
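Throwing away unneeded bits amounts to keeping a prefix of the digest. The sketch below assumes the 128-bit hash is available as 16 raw bytes; `hashlib.md5` is used purely as a stand-in source of a 128-bit digest so the example runs without third-party packages, and `truncate_digest` is a hypothetical helper name.

```python
import hashlib

def truncate_digest(digest: bytes, bits: int) -> int:
    """Keep the first `bits` bits of a digest and return them as an integer."""
    if bits % 8 != 0 or not 0 < bits <= len(digest) * 8:
        raise ValueError("bits must be a multiple of 8 within the digest size")
    return int.from_bytes(digest[: bits // 8], "big")

digest128 = hashlib.md5(b"samplebias").digest()  # stand-in 128-bit digest
print(hex(truncate_digest(digest128, 64)))       # 64-bit value taken from the 128-bit digest
```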
If MurmurHash2 64-bit works for you, there is a Python implementation (C extension) in the pyfasthash package, which includes a few other non-cryptographic hash variants, though some of these only offer 32-bit output.
Update: I wrote a quick Python wrapper for the Murmur3 hash function. The GitHub project is here, and you can find it on the Python Package Index as well; it only needs a C++ compiler to build, and Boost is not required.
Usage example and timing comparison:
import murmur3
import timeit
# hash without a seed
print(murmur3.murmur3_x86_64('samplebias'))
# hash with a seed value
print(murmur3.murmur3_x86_64('samplebias', 123))
# timing comparison with the built-in str.__hash__
t = timeit.Timer("murmur3.murmur3_x86_64('hello')", "import murmur3")
print('murmur3:', t.timeit())
t = timeit.Timer("str.__hash__('hello')")
print('str.__hash__:', t.timeit())
Output:
15662901497824584782
7997834649920664675
murmur3: 0.264422178268
str.__hash__: 0.219163894653
Searching for a Fast Hash Algorithm
Java's String class already implements .hashCode(). This is likely to be the fastest 32-bit hash for Java, as it's heavily optimized at the core. It is also the hash used by the built-in collections, such as java.util.HashMap.
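For reference, the value that String.hashCode() is documented to return can be reproduced outside Java. This is a rough Python sketch, assuming the documented formula h = 31*h + c applied to each UTF-16 code unit with 32-bit signed wraparound; the function name `java_string_hashcode` is made up for illustration.

```python
def java_string_hashcode(s: str) -> int:
    """Compute h = 31*h + c over UTF-16 code units, wrapped to a signed 32-bit int."""
    data = s.encode("utf-16-be")  # two bytes per UTF-16 code unit, no byte-order mark
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # emulate 32-bit integer overflow
    # reinterpret the unsigned result as Java's signed int
    return h - 0x100000000 if h >= 0x80000000 else h

print(java_string_hashcode("hello"))  # 99162322
```

The formula is simple, which is part of why it is fast, but it is also easy to find colliding strings for it.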
What is the fastest hash algorithm to check if two files are equal?
One approach might be to use a simple CRC-32 algorithm, and only if the CRC values compare equal, rerun the hash with a SHA1 or something more robust. A fast CRC-32 will outperform a cryptographically secure hash any day.
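The two-stage comparison can be sketched as follows, assuming both files are readable from local paths. The function names and the 1 MiB chunk size are arbitrary choices for this example, not part of the original answer.

```python
import hashlib
import zlib

CHUNK_SIZE = 1 << 20  # read files in 1 MiB pieces (arbitrary choice)

def file_crc32(path: str) -> int:
    """Stream a file through zlib.crc32 and return the unsigned 32-bit result."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

def file_sha1(path: str) -> str:
    """Stream a file through SHA-1 and return the hex digest."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()

def files_probably_equal(path_a: str, path_b: str) -> bool:
    """Cheap CRC-32 check first; run SHA-1 only when the CRCs agree."""
    if file_crc32(path_a) != file_crc32(path_b):
        return False  # different CRCs prove the contents differ
    return file_sha1(path_a) == file_sha1(path_b)
```

Note that when the files really are equal, each one is read twice, so the saving only shows up when most comparisons are between different files.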
Non-cryptographic hash functions that are homomorphic with respect to concatenation
The probability of one collision among all pairs of your 2^32 64-bit CRCs is about 1/2. If that's too high for you, you can use a 128-bit CRC. That drops the probability of one collision to 3×10^-20.
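These figures can be checked with the usual birthday estimate. The sketch below assumes the hash values behave like uniformly random numbers, which is an idealization; the function names are made up for the example.

```python
import math

def expected_collisions(n: int, bits: int) -> float:
    """Expected number of colliding pairs among n uniformly random `bits`-bit values."""
    return n * (n - 1) / 2 / 2 ** bits

def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: P(at least one collision) ~ 1 - exp(-expected pairs)."""
    return -math.expm1(-expected_collisions(n, bits))

n = 2 ** 32
for bits in (64, 128):
    print(bits, expected_collisions(n, bits), collision_probability(n, bits))
```

For 2^32 values at 64 bits, the expected number of colliding pairs is about 0.5, which is the "about 1/2" figure; the exponential form puts the probability of at least one collision nearer 0.39. At 128 bits both estimates come out at about 2.7×10^-20, in line with the rounded 3×10^-20.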