Fastest Hash for Non-Cryptographic Uses

Fastest hash for non-cryptographic uses?

CRC32 is pretty fast and there's a function for it: http://www.php.net/manual/en/function.crc32.php

But you should be aware that CRC32 will have more collisions than MD5 or even SHA-1 hashes, simply because of its shorter length (32 bits, compared to 128 bits and 160 bits respectively). If you just want to check whether a stored string is corrupted, though, you'll be fine with CRC32.
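As a rough illustration of the corruption check described above, here is a minimal sketch in Python rather than PHP, using the standard library's zlib.crc32. The helper name checksum and the sample strings are made up for the example.

```python
import zlib

def checksum(data: bytes) -> int:
    """Return the CRC-32 of data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF

# Hypothetical stored value and the checksum saved alongside it.
stored = b"The quick brown fox jumps over the lazy dog"
stored_crc = checksum(stored)

# The same value after one byte was damaged in storage.
corrupted = b"The quick brown fox jumps over the lazy cog"

print(checksum(stored) == stored_crc)     # True
print(checksum(corrupted) == stored_crc)  # False
```

This only detects accidental damage. Anyone who wants to forge a matching CRC can do so easily, so it is not a substitute for a cryptographic hash.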

Very low collision non-cryptographic hashing function

Yann Collet's xxHash may be a good choice (Home page, GitHub)

xxHash is an extremely fast non-cryptographic hash algorithm, working
at speeds close to RAM limits. It is proposed in two flavors, 32 and
64 bits.

At least 4 C# implementations are available (see home page).

I had excellent results with it in the past.

The hash size is 32 or 64 bits, but XXH3 is in the making:

XXH3 features a wide internal state of 512 bits, which makes it
suitable to generate a hash of up to 256 bit. For the time being, only
64-bit and 128-bit variants are exposed, but a similar recipe can be
used for a 256-bit variant if there is any need for it one day. All
variant feature same speed, since only the finalization stage is
different.

In general, the longer the hash, the slower its calculation. A 64-bit hash is good enough for most practical purposes.

You can generate longer hashes by combining two hash functions (e.g. 128-bit XXH3 and 128-bit MurmurHash3).
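A minimal sketch of the combining idea, in Python. To keep it runnable with only the standard library, it uses zlib.crc32 and zlib.adler32 as stand-ins for XXH3 and MurmurHash3 (an assumption made purely for illustration, and both are much weaker than the real candidates). The function name combined_hash64 is hypothetical.

```python
import zlib

def combined_hash64(data: bytes) -> int:
    """Concatenate two independent 32-bit checksums into one 64-bit value.

    crc32 and adler32 are stand-ins here; with real libraries you would
    concatenate, for example, a 128-bit XXH3 and a 128-bit MurmurHash3.
    """
    high = zlib.crc32(data) & 0xFFFFFFFF
    low = zlib.adler32(data) & 0xFFFFFFFF
    return (high << 32) | low

print(hex(combined_hash64(b"123456789")))  # 0xcbf43926091e01de
```

The combined value is only as good as the two functions are independent of each other. If inputs that collide under one also tend to collide under the other, the extra bits add little.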

Fast, large-width, non-cryptographic string hashing in Python

Take a look at the 128-bit variant of MurmurHash3. The algorithm's page includes some performance numbers. It should be possible to port this to Python, either in pure Python or as a C extension. (Update: the author recommends using the 128-bit variant and throwing away the bits you don't need.)
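"Throwing away the bits you don't need" can be as simple as masking the 128-bit result. A small sketch, where truncate_hash is a hypothetical helper and the 128-bit input is a made-up value, not real MurmurHash3 output:

```python
def truncate_hash(value: int, bits: int) -> int:
    """Keep only the lowest `bits` bits of a wider hash value."""
    return value & ((1 << bits) - 1)

# Made-up 128-bit value standing in for a MurmurHash3 128-bit result.
wide = 0x0123456789ABCDEF_FEDCBA9876543210

print(hex(truncate_hash(wide, 64)))  # 0xfedcba9876543210
print(hex(truncate_hash(wide, 32)))  # 0x76543210
```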

If MurmurHash2 64-bit works for you, there is a Python implementation (C extension) in the pyfasthash package, which includes a few other non-cryptographic hash variants, though some of these only offer 32-bit output.

Update: I wrote a quick Python wrapper for the Murmur3 hash function. The GitHub project is here, and you can find it on the Python Package Index as well; it needs only a C++ compiler to build, and Boost is not required.

Usage example and timing comparison:

import murmur3
import timeit

# without seed
print murmur3.murmur3_x86_64('samplebias')
# with seed value
print murmur3.murmur3_x86_64('samplebias', 123)

# timing comparison with str __hash__
t = timeit.Timer("murmur3.murmur3_x86_64('hello')", "import murmur3")
print 'murmur3:', t.timeit()

t = timeit.Timer("str.__hash__('hello')")
print 'str.__hash__:', t.timeit()

Output:

15662901497824584782
7997834649920664675
murmur3: 0.264422178268
str.__hash__: 0.219163894653

Searching for a Fast Hash Algorithm

Java's String class already implements .hashCode(). This is likely to be the fastest 32-bit hash for Java, as it's heavily optimized at the core. It is also the hash used by the built-in collections, such as java.util.HashMap.
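For reference, String.hashCode() is documented as the polynomial s[0]*31^(n-1) + ... + s[n-1] over the string's UTF-16 code units, computed with 32-bit signed overflow. The sketch below reproduces that formula in Python so the values can be compared across languages. The function name java_string_hashcode is made up for this example.

```python
def java_string_hashcode(text: str) -> int:
    """Reproduce Java's String.hashCode() for a Python string."""
    data = text.encode("utf-16-be")  # Java hashes UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like a 32-bit int
    # Reinterpret the unsigned result as a signed 32-bit integer.
    return h - 0x100000000 if h >= 0x80000000 else h

print(java_string_hashcode("hello"))  # 99162322
```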

What is the fastest hash algorithm to check if two files are equal?

One approach might be to use a simple CRC-32 algorithm and, only if the CRC values compare equal, rerun the comparison with SHA-1 or something more robust. A fast CRC-32 will outperform a cryptographically secure hash any day.
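The two-stage check described above could look roughly like this in Python, using only zlib and hashlib from the standard library. The function names and the chunk size are choices made for this sketch, not part of any existing tool.

```python
import hashlib
import zlib

CHUNK_SIZE = 1 << 20  # read files 1 MiB at a time

def file_crc32(path: str) -> int:
    """CRC-32 of a file, computed incrementally."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

def file_sha1(path: str) -> str:
    """SHA-1 hex digest of a file, computed incrementally."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()

def files_probably_equal(path_a: str, path_b: str) -> bool:
    """Cheap CRC-32 first; run SHA-1 only when the CRCs match."""
    if file_crc32(path_a) != file_crc32(path_b):
        return False  # different CRCs prove the contents differ
    return file_sha1(path_a) == file_sha1(path_b)
```

This pays off mainly when the CRC values are stored and reused across many comparisons. For a single one-off comparison of two local files, comparing sizes and then the bytes directly avoids hashing altogether.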

Non-cryptographic hash functions that are homomorphic with respect to concatenation

The probability of at least one collision among all pairs of your 2^32 64-bit CRCs is about 0.4 (the expected number of colliding pairs is about 1/2). If that's too high for you, you can use a 128-bit CRC. That drops the probability of a collision to about 3×10^-20.
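These figures follow from the usual birthday approximation: with n items and a b-bit hash, the expected number of colliding pairs is about n(n-1)/2 divided by 2^b, and the probability of at least one collision is roughly 1 - exp(-that). A small sketch, assuming the hash values are uniformly distributed (the function name is made up here):

```python
import math

def collision_probability(num_items: int, hash_bits: int) -> float:
    """Birthday approximation for at least one collision."""
    expected_pairs = num_items * (num_items - 1) / 2 / 2**hash_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-expected_pairs)

print(collision_probability(2**32, 64))   # about 0.39
print(collision_probability(2**32, 128))  # about 2.7e-20
```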


