Hash an Arbitrary Precision Value (Boost::Multiprecision::Cpp_Int)

Hash an arbitrary precision value (boost::multiprecision::cpp_int)

You can (ab)use the serialization support:

Support for serialization comes in two forms:
Classes number, debug_adaptor, logged_adaptor and rational_adaptor have "pass through" serialization support which requires the underlying backend to be serializable.

Backends cpp_int, cpp_bin_float, cpp_dec_float and float128 have full support for Boost.Serialization.

So, let me cobble something together that works with boost and std unordered containers:

template <typename Map>
void test(Map const& map) {
std::cout << "\n" << __PRETTY_FUNCTION__ << "\n";
for(auto& p : map)
std::cout << p.second << "\t" << p.first << "\n";
}

int main() {
using boost::multiprecision::cpp_int;

test(std::unordered_map<cpp_int, std::string> {
{ cpp_int(1) << 111, "one" },
{ cpp_int(2) << 222, "two" },
{ cpp_int(3) << 333, "three" },
});

test(boost::unordered_map<cpp_int, std::string> {
{ cpp_int(1) << 111, "one" },
{ cpp_int(2) << 222, "two" },
{ cpp_int(3) << 333, "three" },
});
}

Let's forward the relevant hash<> implementations to our own hash_impl specialization that uses Multiprecision and Serialization:

namespace std {
template <typename backend>
struct hash<boost::multiprecision::number<backend> >
: mp_hashing::hash_impl<boost::multiprecision::number<backend> >
{};
}

namespace boost {
template <typename backend>
struct hash<multiprecision::number<backend> >
: mp_hashing::hash_impl<multiprecision::number<backend> >
{};
}

Now, of course, this begs the question, how is hash_impl implemented?

template <typename T> struct hash_impl {
size_t operator()(T const& v) const {
using namespace boost;
size_t seed = 0;
{
iostreams::stream<hash_sink> os(seed);
archive::binary_oarchive oa(os, archive::no_header | archive::no_codecvt);
oa << v;
}
return seed;
}
};

This looks pretty simple. That's because Boost is awesome, and writing a hash_sink device for use with Boost Iostreams is just the following straightforward exercise:

namespace io = boost::iostreams;

struct hash_sink {
hash_sink(size_t& seed_ref) : _ptr(&seed_ref) {}

typedef char char_type;
typedef io::sink_tag category;

std::streamsize write(const char* s, std::streamsize n) {
boost::hash_combine(*_ptr, boost::hash_range(s, s+n));
return n;
}
private:
size_t* _ptr;
};

Full Demo:

Live On Coliru

#include <iostream>
#include <iomanip>

#include <boost/archive/binary_oarchive.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/multiprecision/cpp_int/serialize.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/stream_buffer.hpp>
#include <boost/iostreams/stream.hpp>

#include <boost/functional/hash.hpp>

namespace mp_hashing {
namespace io = boost::iostreams;

struct hash_sink {
hash_sink(size_t& seed_ref) : _ptr(&seed_ref) {}

typedef char char_type;
typedef io::sink_tag category;

std::streamsize write(const char* s, std::streamsize n) {
boost::hash_combine(*_ptr, boost::hash_range(s, s+n));
return n;
}
private:
size_t* _ptr;
};

template <typename T> struct hash_impl {
size_t operator()(T const& v) const {
using namespace boost;
size_t seed = 0;
{
iostreams::stream<hash_sink> os(seed);
archive::binary_oarchive oa(os, archive::no_header | archive::no_codecvt);
oa << v;
}
return seed;
}
};
}

#include <unordered_map>
#include <boost/unordered_map.hpp>

namespace std {
template <typename backend>
struct hash<boost::multiprecision::number<backend> >
: mp_hashing::hash_impl<boost::multiprecision::number<backend> >
{};
}

namespace boost {
template <typename backend>
struct hash<multiprecision::number<backend> >
: mp_hashing::hash_impl<multiprecision::number<backend> >
{};
}

template <typename Map>
void test(Map const& map) {
std::cout << "\n" << __PRETTY_FUNCTION__ << "\n";
for(auto& p : map)
std::cout << p.second << "\t" << p.first << "\n";
}

int main() {
using boost::multiprecision::cpp_int;

test(std::unordered_map<cpp_int, std::string> {
{ cpp_int(1) << 111, "one" },
{ cpp_int(2) << 222, "two" },
{ cpp_int(3) << 333, "three" },
});

test(boost::unordered_map<cpp_int, std::string> {
{ cpp_int(1) << 111, "one" },
{ cpp_int(2) << 222, "two" },
{ cpp_int(3) << 333, "three" },
});
}

Prints

void test(const Map&) [with Map = std::unordered_map<boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<> >, std::basic_string<char> >]
one 2596148429267413814265248164610048
three 52494017394792286184940053450822912768476066341437098474218494553838871980785022157364316248553291776
two 13479973333575319897333507543509815336818572211270286240551805124608

void test(const Map&) [with Map = boost::unordered::unordered_map<boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<> >, std::basic_string<char> >]
three 52494017394792286184940053450822912768476066341437098474218494553838871980785022157364316248553291776
two 13479973333575319897333507543509815336818572211270286240551805124608
one 2596148429267413814265248164610048

As you can see, the difference in implementation between Boost's and the standard library's unordered_map show up in the different orderings for identical hashes.

Boost, converting between arbitrary precision floating point and integral types

The usual culprit is that the result of BMP expressions are not their target types, but still "lazy" expression templates, and they don't have all the implicit conversions. So, you need to evaluate these "manually" (explicitly):

return Dec(ceil(five*base*mult)).convert_to<Int>();

See it Live On Coliru

#include <iostream>
#include <boost/multiprecision/detail/default_ops.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <boost/math/special_functions/pow.hpp>
#include <boost/multiprecision/number.hpp>

namespace mp = boost::multiprecision;

using Int = mp::cpp_int;

Int calcReward(Int baseReward, int percent);

int main()
{
Int const immBase = 400;
int const percent = 12;

std::cout << calcReward(immBase, percent);
}

Int calcReward(Int baseReward, int percent)
{
using Dec = mp::cpp_dec_float_50;
Dec mult = 0.1+0.01*percent;
Dec five = 5;
Dec base = baseReward.convert_to<Dec>();

return Dec(ceil(five*base*mult)).convert_to<Int>();
}

Notes:

  • alternatively, disable expression templates

    using Int = mp::number<mp::cpp_int::backend_type, mp::et_off>;
    using Dec = mp::number<mp::cpp_dec_float<50>, mp::et_off>;
  • the chrono include(s) were irrelevant

  • ceil can be used unqualified due to ADL (What is "Argument-Dependent Lookup" (aka ADL, or "Koenig Lookup")?)

C++: Using pair of (cpp_int, int) integers as key in unordered_map (where cpp_int are boost multiprecision integers)

Yes, you took the Multiprecision hash I contributed earlier and added the hash for std::pair. I'm not a fan of handrolling the hash combination (good general hash combination is not trivial).

So I'd do the same with boost::hash_combine:

template <typename K, typename V>
struct hash<std::pair<K, V> >
{
size_t operator()(std::pair<K, V> const& pair) const {
size_t seed = std::hash<K>{}(pair.first);
boost::hash_combine(seed, pair.second);
return seed;
}
};

Live On MSVC RexTester

#include <iostream>
#include <iomanip>

#include <boost/archive/binary_oarchive.hpp>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/multiprecision/cpp_int/serialize.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/stream_buffer.hpp>
#include <boost/iostreams/stream.hpp>

#include <boost/functional/hash.hpp>

namespace mp_hashing {
namespace io = boost::iostreams;

struct hash_sink {
hash_sink(size_t& seed_ref) : _ptr(&seed_ref) {}

typedef char char_type;
typedef io::sink_tag category;

std::streamsize write(const char* s, std::streamsize n) {
boost::hash_combine(*_ptr, boost::hash_range(s, s+n));
return n;
}
private:
size_t* _ptr;
};

template <typename T> struct hash_impl {
size_t operator()(T const& v) const {
using namespace boost;
size_t seed = 0;
{
iostreams::stream<hash_sink> os(seed);
archive::binary_oarchive oa(os, archive::no_header | archive::no_codecvt);
oa << v;
}
return seed;
}
};
}

#include <unordered_map>
#include <boost/unordered_map.hpp>

namespace std {
template <typename backend>
struct hash<boost::multiprecision::number<backend> >
: mp_hashing::hash_impl<boost::multiprecision::number<backend> >
{};

template <typename K, typename V>
struct hash<std::pair<K, V> >
{
size_t operator()(std::pair<K, V> const& pair) const {
size_t seed = std::hash<K>{}(pair.first);
boost::hash_combine(seed, pair.second);
return seed;
}
};
}

int main() {
using boost::multiprecision::cpp_int;

std::unordered_map<std::pair<cpp_int, int>, int> m {
{ { cpp_int(1) << 111, -1 }, 1 },
{ { cpp_int(2) << 222, -2 }, 2 },
{ { cpp_int(3) << 333, -3 }, 3 },
};

for (auto& p : m)
std::cout << p.first.first << " -> " << p.second << "\n";
}

Why I cannot use neither std::unordered_map nor boost::unordered_map with boost::multiprecision types?

Your "question of which this might be a duplicate" documents a working technique in the question itself - hashing the string representation:

#include <boost/multiprecision/cpp_int.hpp>
#include <boost/unordered_map.hpp>

using namespace boost::multiprecision;

template <typename T>
struct hash_str
{
size_t operator()(const T& t) const { return std::hash<std::string>()(t.str()); }
};

int main()
{
cpp_int z(123123123);
cpp_int x(123123123);

boost::unordered_map<cpp_int, cpp_int, hash_str<cpp_int>> data;

data.insert(std::make_pair(z,x));
}

Notes:

  • I don't know if cpp_int::str() outputs the full precision the type stores, but if it doesn't then distinct values yielding the same str() and hence hash will collide in the same bucket in the hash table, which won't break the functionality but moves away from O(1) towards O(N) performance. So, if by default str() doesn't display full precision but there's a way to force it, that would be a good idea if you deal in lots of values differing very slightly.

  • As with all uses of floating point types as keys, be careful as tiny rounding differences can lead to existing map entries not being found/matched, and hence unintended "duplicates" etc..

It your program is too slow and the profile proves the hashing is the cause, then worry about alternatives or upgrading boost....

Generate hash for boost::dynamic_bitset and convert hash back to boost::dynamic_bitset

Ok, I'm detecting much confusion about the essence of "hashing", so a few friendly pointers to get started:

Q. 2. How to convert the hash back to orignial bitset.

That's impossible. A hash is lossy digest. You can only do this if the hash is a Perfect Hash which, due to the laws of entropy, cannot happen if the bitset capacity exceeds the size of size_t on your platform (typically 32 or 64 bits).

Q. I also tried creating a hash by ...

 std::bitset<142> seq (str);
....

I hope you realize that std::bitset<> is an entirely different type, so it's not really related to the task. And, since it's not dynamic, it's rather unhelpful for the task, even as a workaround.

But Most Importantly:

Hashes are used by hash-tables (like unordered_*<>) but they are not stored. Hashes are lossy digests, only used to get a good distribution over the internal buckets¹. For actual element equality, std::equal<T> is still used.

In other words:

typedef boost::bimap<bimaps::unordered_set_of<unsigned long long int>,
bimaps::unordered_multiset_of<size_t> > bimap_reference;

is unsuited for creating a map of anything other than size_t or unsigned long long². If you store hashes of things there:

reference_index_vector.insert(position(10000000000, hash1));

, you lose the original information. There's no way to get the bitset from hash1.

The compiler error

Your hash_value implementation mistakenly uses private members of dynamic_bitset<>. You can't because it's not accessible.

Here's a simple implementation of std::hash<> using the public interface:

Live On Coliru

#include <boost/dynamic_bitset.hpp>
#include <boost/functional/hash.hpp>
#include <unordered_map>
#include <sstream>

namespace std {

template <typename Block, typename Alloc> struct hash<boost::dynamic_bitset<Block, Alloc> > {
size_t operator()(boost::dynamic_bitset<Block, Alloc> const& bs) const {
size_t seed = boost::hash_value(bs.size());

std::vector<Block> blocks(bs.num_blocks());
boost::hash_range(seed, blocks.begin(), blocks.end());

return seed;
}
};

}

int main() {
boost::dynamic_bitset<> x, y;
x.resize(rand()%100, 1);
y.resize(rand()%100, 0);

std::unordered_map<boost::dynamic_bitset<>, std::string> m;
m[x] = "x";
m[y] = "y";
}

You can use this std::hash<> specialization instead and use boost::bimap with it.

NOTE, that using the public interface is not optimal, because it copies the Blocks (you also did that with the std::bitset<> hack). You might be interested in the Boost Serialization implementation I did for boost::dynamic_bitset<> before:

  • How to serialize boost::dynamic_bitset?
  • And here's code to show how to use the Serialization implementation to implement the hash efficiently Hash an arbitrary precision value (boost::multiprecision::cpp_int)

¹ Assuming, for simplicity, buckets instead of "open addressing" style. The same logic applies there, but somewhat more complicated

² (by the way, please just say uintmax_t or uint64_t)

C++ - Hash value of object graph similar to boost::serialization

I did exactly that once:

  • Hash an arbitrary precision value (boost::multiprecision::cpp_int)

Also required reading: Types Don't Know # with the talk on YT: https://www.youtube.com/watch?v=Njjp_MJsgt8



Related Topics



Leave a reply



Submit