How do I combine hash values in C++0x?
Well, just do it the way the Boost developers did it:
#include <cstddef>
#include <functional>

// Mixes the hash of v into seed.
template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
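As a sketch of typical use (not part of the original answer), the combiner can be folded over the members of a user-defined type inside a std::hash specialization. The Point type and the count_distinct_points helper are hypothetical, made up for illustration; the combiner is repeated so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Boost-style combiner, repeated from the answer above.
template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical example type used only for illustration.
struct Point {
    int x;
    int y;
    std::string label;
    bool operator==(const Point& o) const {
        return x == o.x && y == o.y && label == o.label;
    }
};

// Specializing std::hash for a user-defined type is permitted; this lets
// Point be used directly as a key in unordered containers.
namespace std {
template <>
struct hash<Point> {
    std::size_t operator()(const Point& p) const {
        std::size_t seed = 0;          // start from a fixed seed
        ::hash_combine(seed, p.x);     // fold in each member, in a fixed order
        ::hash_combine(seed, p.y);
        ::hash_combine(seed, p.label);
        return seed;
    }
};
}  // namespace std

// Inserts three points (one of them a duplicate) and returns how many are distinct.
inline std::size_t count_distinct_points() {
    std::unordered_set<Point> points;
    points.insert(Point{1, 2, "a"});
    points.insert(Point{1, 2, "a"});  // equal to the first: not inserted again
    points.insert(Point{2, 1, "a"});
    return points.size();
}
```

Because the members are combined in a fixed order, equal points always produce equal hashes, which is all an unordered container requires.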
C++ - Why is boost::hash_combine the best way to combine hash-values?
Whether it is the "best" is debatable.
Showing that it is "good", or even "very good", at least superficially, is easy.
seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
We'll presume seed is a previous result of hasher or of this algorithm.
^= means that every bit on the left and every bit on the right can flip the corresponding bit of the result.
hasher(v) is presumed to be a decent hash of v. The rest is a defence in case it isn't.
0x9e3779b9 is a 32-bit value (arguably it could be extended to 64 bits when size_t is 64 bits) with a fairly even mix of 0s and 1s: 20 of its 32 bits are set. It is essentially a random-looking series of 0s and 1s, obtained by approximating an irrational constant as a base-2 fixed-point value (it is 2^32 divided by the golden ratio, rounded down). This helps ensure that if the hasher returns bad values, we still get a smear of 1s and 0s in our output.
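A quick way to check those properties of the constant, assuming a 32-bit unsigned type: 0x9e3779b9 equals 2^32 divided by the golden ratio, rounded down (equivalently, 1/φ = φ − 1 written as a 32-bit binary fraction), and 20 of its 32 bits are set. The helper names below are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// The golden ratio phi = (1 + sqrt(5)) / 2.
inline double golden_ratio() { return (1.0 + std::sqrt(5.0)) / 2.0; }

// floor(2^32 / phi): the fraction 1/phi = phi - 1 scaled to 32 bits.
inline std::uint32_t golden_constant_32() {
    return static_cast<std::uint32_t>(std::floor(4294967296.0 / golden_ratio()));
}

// Number of set bits in the 32-bit constant used by hash_combine.
inline std::size_t set_bits_in_constant() {
    return std::bitset<32>(0x9e3779b9u).count();
}
```

The quotient is about 2654435769.497, comfortably far from an integer boundary, so double precision is enough to recover the exact constant.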
(seed<<6) + (seed>>2) is a bit shuffle of the incoming seed.
Imagine the 0x9e3779b9 constant was missing. Imagine also that the hasher returns the constant 0x01000 for almost every v passed in. Now each bit of the seed is spread out over the next iteration of the hash, during which it is spread out again.
Under the shuffle seed ^= (seed<<6) + (seed>>2) alone, 0x00001000 becomes 0x00041400 after one iteration, then 0x01021100 after the second. As you repeat the operation, any set bits are "smeared out" over the output bits. Both shift amounts are even, so a lone bit at an even position only reaches other even positions until the right and left bits collide; the resulting carry then moves a set bit from an "even location" to an "odd location".
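The thought experiment can be traced with a small sketch. It assumes a 32-bit seed (std::uint32_t) and drops both the hasher(v) term and the constant, as imagined above; shuffle_step and shuffle_trace are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One step of the seed shuffle from hash_combine, with the hasher(v) term and
// the 0x9e3779b9 constant deliberately left out.
inline std::uint32_t shuffle_step(std::uint32_t seed) {
    return seed ^ ((seed << 6) + (seed >> 2));
}

// Returns the seed after each of `iterations` steps, starting from `start`.
inline std::vector<std::uint32_t> shuffle_trace(std::uint32_t start, int iterations) {
    std::vector<std::uint32_t> trace;
    std::uint32_t seed = start;
    for (int i = 0; i < iterations; ++i) {
        seed = shuffle_step(seed);
        trace.push_back(seed);
    }
    return trace;
}
```

Starting from 0x00001000, the first two steps give 0x00041400 and 0x01021100. The second step already contains a carry: bit 16 is produced both by the left shift (from bit 10) and by the right shift (from bit 18), and their sum sets bit 17, an odd position.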
The set of output bits that depend on the value of an input seed grows relatively fast, and in complex ways, as the combine operation is applied repeatedly to the seed. Adding causes carries, which smear things even more. The 0x9e3779b9 constant adds a bunch of pseudo-random bits that make boring hash values occupy more than a few bits of the hash space after being combined.
It is asymmetric thanks to the shifted seed terms and the addition (combining the hashes of the characters of "dog" and of "god" gives different results), and it handles boring hash values well (such as mapping characters to their ASCII values, which only involves twiddling a handful of bits). And it is reasonably fast.
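A sketch of the order-dependence claim, assuming each character's code is used as its own (deliberately boring) hash, as in the ASCII example: plain XOR gives the same result for "dog" and "god", while the Boost-style step does not. combine_hash_value, combine_chars and xor_chars are hypothetical helpers; combine_hash_value uses the same formula as hash_combine but takes an already-computed hash instead of calling std::hash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Boost-style mixing step, applied to an already-computed hash value h.
inline void combine_hash_value(std::size_t& seed, std::size_t h) {
    seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Combines per-character "hashes", using each character's code as its hash.
inline std::size_t combine_chars(const std::string& s) {
    std::size_t seed = 0;
    for (unsigned char c : s) {
        combine_hash_value(seed, static_cast<std::size_t>(c));
    }
    return seed;
}

// The same per-character hashes combined with plain XOR, for comparison.
inline std::size_t xor_chars(const std::string& s) {
    std::size_t seed = 0;
    for (unsigned char c : s) {
        seed ^= static_cast<std::size_t>(c);
    }
    return seed;
}
```

XOR is commutative, so any permutation of the same letters collides; the shifted seed terms make each step depend on everything combined before it, so the order of the letters matters.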
Slower hash combines that are cryptographically strong can be better in other situations. I would naively presume that mixing even and odd shift amounts might be a good idea (but perhaps addition, which carries bits from even positions into odd ones, makes that less of a problem: within a few iterations, incoming lone seed bits collide, add, and cause a carry).
The downside to this kind of analysis is that it only takes one mistake to make a hash function really bad. Pointing out all the good things doesn't help that much. So another thing that makes it good now is that it is reasonably famous and in an open-source repository, and I haven't heard anyone point out why it is bad.
Why is XOR the default way to combine hashes?
Assuming uniformly random (1-bit) inputs, the output of the AND function is 75% 0 and 25% 1. Conversely, OR is 25% 0 and 75% 1.
The XOR function is 50% 0 and 50% 1, therefore it is good for combining uniform probability distributions.
This can be seen by writing out truth tables:
a | b | a AND b
---+---+--------
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
a | b | a OR b
---+---+--------
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 1
a | b | a XOR b
---+---+--------
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
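The percentages can also be read off by counting, over the four equally likely input pairs, how often each operator outputs 1. A minimal sketch; count_ones and the ones_* helpers are hypothetical names.

```cpp
#include <cassert>

// Counts how many of the four input pairs (a, b) make op(a, b) == 1.
template <class Op>
int count_ones(Op op) {
    int ones = 0;
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            ones += op(a, b);
    return ones;
}

inline int ones_and() { return count_ones([](int a, int b) { return a & b; }); }
inline int ones_or()  { return count_ones([](int a, int b) { return a | b; }); }
inline int ones_xor() { return count_ones([](int a, int b) { return a ^ b; }); }
```

AND outputs 1 for one row out of four (25%), OR for three (75%), and XOR for two (50%), matching the tables.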
Exercise: How many logical functions of two 1-bit inputs a and b have this uniform output distribution? Why is XOR the most suitable for the purpose stated in your question?
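One way to check an answer to the exercise is brute force: encode each of the 16 possible two-input functions as a 4-bit truth table and keep those with exactly two 1 outputs. The encoding and all helper names are assumptions made for this sketch.

```cpp
#include <cassert>
#include <vector>

// A two-input boolean function encoded as a 4-bit truth table:
// bit (2*a + b) of `table` is the output for inputs (a, b).
inline int eval(unsigned table, int a, int b) {
    return static_cast<int>((table >> (2 * a + b)) & 1u);
}

// True if exactly two of the four rows output 1 (a 50/50 output distribution).
inline bool is_balanced(unsigned table) {
    int ones = 0;
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)
            ones += eval(table, a, b);
    return ones == 2;
}

// True if flipping a can change the output and flipping b can change the output.
inline bool depends_on_both(unsigned table) {
    bool on_a = false, on_b = false;
    for (int x = 0; x <= 1; ++x) {
        if (eval(table, 0, x) != eval(table, 1, x)) on_a = true;
        if (eval(table, x, 0) != eval(table, x, 1)) on_b = true;
    }
    return on_a && on_b;
}

// Truth tables of all balanced functions; optionally only those using both inputs.
inline std::vector<unsigned> balanced_tables(bool require_both) {
    std::vector<unsigned> result;
    for (unsigned table = 0; table < 16; ++table)
        if (is_balanced(table) && (!require_both || depends_on_both(table)))
            result.push_back(table);
    return result;
}
```

It finds six balanced functions (a, b, NOT a, NOT b, XOR and XNOR). Only XOR (table 6) and XNOR (table 9) depend on both inputs, which suggests why XOR, or its complement, is the natural choice for combining two hashes.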
Hashing a string and an int together?
boost::hash_combine is an easy way to create hashes: even if you can't use Boost, the function is quite simple, so it's trivial to copy the implementation.
Usage sample:
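#include <cstddef>
#include <string>
#include <utility>

// Assumes a hash_combine is in scope (a copied one, or boost::hash_combine
// brought in with a using-declaration).
struct intStringHash
{
    std::size_t operator()(const std::pair<int, std::string>& c) const
    {
        std::size_t hash = 0;
        hash_combine(hash, c.first);   // fold in the int first
        hash_combine(hash, c.second);  // then the string
        return hash;
    }
};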
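A possible way to plug intStringHash into a standard container, assuming the same Boost-style hash_combine as in the first answer (both are repeated here so the sketch compiles on its own). PairMap and make_example_map are hypothetical names; std::pair already provides operator==, so no custom equality functor is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

struct intStringHash
{
    std::size_t operator()(const std::pair<int, std::string>& c) const
    {
        std::size_t hash = 0;
        hash_combine(hash, c.first);
        hash_combine(hash, c.second);
        return hash;
    }
};

// A map keyed on (int, string) pairs, hashed with intStringHash.
using PairMap = std::unordered_map<std::pair<int, std::string>, int, intStringHash>;

inline PairMap make_example_map() {
    PairMap m;
    m[{1, "one"}] = 10;
    m[{2, "two"}] = 20;
    m[{1, "uno"}] = 30;  // same int, different string: a distinct key
    return m;
}
```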
How to hash an unordered_map?
The problem here is that there is no guarantee that the items even have an ordering among them, and two equal unordered_maps are not guaranteed to iterate over their elements in the same order.
So, sorting the items may very well not work for arbitrary unordered containers. You have 2 options:
- Just XOR the hashes of all the individual elements. This is the fastest.
- First sort the hashes of the elements, and then hash those. This may result in a better hash.
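Both options can be sketched for a std::unordered_map<std::string, int>, using a per-element hash that combines key and value with the Boost-style hash_combine. The names element_hash, hash_map_xor and hash_map_sorted are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

using Map = std::unordered_map<std::string, int>;

// Hash of one element: key and value combined in a fixed order.
inline std::size_t element_hash(const Map::value_type& kv) {
    std::size_t seed = 0;
    hash_combine(seed, kv.first);
    hash_combine(seed, kv.second);
    return seed;
}

// Option 1: XOR the element hashes. XOR is commutative and associative,
// so the unspecified iteration order of the map does not matter.
inline std::size_t hash_map_xor(const Map& m) {
    std::size_t h = 0;
    for (const auto& kv : m) h ^= element_hash(kv);
    return h;
}

// Option 2: sort the element hashes, then combine them in that order.
inline std::size_t hash_map_sorted(const Map& m) {
    std::vector<std::size_t> hashes;
    hashes.reserve(m.size());
    for (const auto& kv : m) hashes.push_back(element_hash(kv));
    std::sort(hashes.begin(), hashes.end());
    std::size_t seed = 0;
    for (std::size_t h : hashes) hash_combine(seed, h);
    return seed;
}
```

Both results are independent of iteration order, so two equal maps built in different orders (or with different bucket counts) hash the same. One caveat with plain XOR: any two elements with identical hashes cancel each other out.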
Almost the same Hash value
This should do it:
DECLARE @HashThis varchar(32);
SET @HashThis = CONVERT(varchar(32),'Hello World!');
SELECT CONVERT(char(64), HASHBYTES('SHA2_256', @HashThis), 2)
But if you send a binary value down to the database, you should be able to compare it with the result of HASHBYTES without any CONVERT. The same applies if you bring the result up to your C# app as a byte[].