How to Implement an Efficient Bidirectional Hash Table

Best implementation of dual-side dict in Python

You could use bidict package which provides a bidirectional map. The syntax looks as follows (taken from the documentation):

>>> from bidict import bidict
>>> element_by_symbol = bidict(H='hydrogen')
>>> element_by_symbol
bidict({'H': 'hydrogen'})
>>> element_by_symbol['H']
'hydrogen'

>>> element_by_symbol.inv
bidict({'hydrogen': 'H'})
>>> element_by_symbol.inv['hydrogen']
'H'
>>> element_by_symbol.inv.inv is element_by_symbol
True

Or you can implement it yourself, for example using one of the solutions provided here.

Is there a more efficient implementation for a bidirectional map?

There is a certain problem with double-storing your data in all simple implementations of a bimap. If you can break it down to a bimap of pointers from outside, then you can readily ignore this and simply keep both maps of the form std::map<A*,B*> like Arkaitz Jimenez already suggested (though contrary to his answer you have to care about the storage from outside to avoid a A->A* lookup). But if you have the pointers anyway, why not simply store a std::pair<A,B> at the point where you would otherwise store A and B separately?

It would be nice to have std::map<A,B*> instead of std::map<A*,B*> as this would allow for example the lookup of an element associated to an string by a newly created string with the same content instead of the pointer to the original string that created the pair. But it is customary to store a full copy of the key with every entry and only rely on the hash to find the right bucket. This way the returned item will be the correct one even in the case of a hash-collision...

If you want to have it quick and dirty though, there is this

hackish solution:

Create two maps std::map<size_t, A> mapA and std::map<size_t, B> mapB. Upon insertion hash both elements that are to be inserted to get the keys to the respective maps.

void insert(const A &a, const B &b) {
size_t hashA = std::hash<A>(a);
size_t hashB = std::hash<B>(b);

mapA.insert({hashB, a});
mapB.insert({hashA, b});
}

Lookup is implemented analogously.

Using a multimap instead of a map and verifying every element you get with a lookup in the respectively other map (get candidate b from mapA, hash b and look in mapB if it matches the wanted key, iterate to the next candidate b otherwise) this is a valid implementation - but still hackish in my opinion...

You can get a much nicer solution by using the copies of the elements that are used to compare the entries (see above) as only storage. It is a bit harder to get your head around that though. To elaborate:

a nicer solution:

Create two sets of pairs as std::set<pair<A, B*>> and std::set<pair<B, A*>> and overload the operator< and operator== to only take the first element of the pairs into account (or provide an corresponding comparion class). It is necessary to create sets of pairs instead of maps (which internally look similarly) because we need a guarantee that A and B will be at constant positions in memory. Upon insertion of an pair<A, B> we split it into two elements that fit into the above sets.

  std::set<pair<B, A*>> mapA;
std::set<pair<A, B*>> mapB;

void insert(const A &a, const B &b) {
auto aitr = mapA.insert({b, nullptr}).first; // creates first pair
B *bp = &(aitr->first); // get pointer of our stored copy of b
auto bitr = mapB.insert({a, bp}).first;
// insert second pair {a, pointer_to_b}
A *ap = &(bitr->first); // update pointer in mapA to point to a
aitr->second = ap;
}

Lookup can now simply be done by a simple std::set lookup and a pointer dereference.

This nicer solution is similar to the solution that boost uses - even though they use some annonymized pointers as second elements of the pairs and thus have to use reinterpret_casts.

Note that the .second part of the pairs need to be mutable (so I'm not sure std::pair can be used), or you have to add another layer of abstraction (std::set<pair<B, A**>> mapA) even for this simple insertion. In both solutions you need temporary elements to return non-const references to elements.

Two way/reverse map

You can create your own dictionary type by subclassing dict and adding the logic that you want. Here's a basic example:

class TwoWayDict(dict):
def __setitem__(self, key, value):
# Remove any previous connections with these values
if key in self:
del self[key]
if value in self:
del self[value]
dict.__setitem__(self, key, value)
dict.__setitem__(self, value, key)

def __delitem__(self, key):
dict.__delitem__(self, self[key])
dict.__delitem__(self, key)

def __len__(self):
"""Returns the number of connections"""
return dict.__len__(self) // 2

And it works like so:

>>> d = TwoWayDict()
>>> d['foo'] = 'bar'
>>> d['foo']
'bar'
>>> d['bar']
'foo'
>>> len(d)
1
>>> del d['foo']
>>> d['bar']
Traceback (most recent call last):
File "<stdin>", line 7, in <module>
KeyError: 'bar'

I'm sure I didn't cover all the cases, but that should get you started.

Two-way hash table

One name for this is a BiMap (as in bidirectional). The obvious limitation is that keys will be distinct (like in a normal dictionary/map), but so will with values.

For Java, there's a StackOverflow question on it, but the general recommendation is the Guava BiMap.

For C and C++, Boost has a Bimap.

Internally, it's the "inefficient" implementation you mention where it keeps two hashtables. Here's the thing: it is efficient, and using twice as much memory for a secondary lookup structure is expected, and rarely a big deal.

Python bidirectional mapping

If you want to use two dicts, you can try this to create the inverted dict:

b = {v: k for k, v in a.iteritems()}

build a dictionary for find key by value

You could try bidict:

>>> husbands2wives = bidict({'john': 'jackie'})
>>> husbands2wives['john'] # the forward mapping is just like with dict
'jackie'
>>> husbands2wives[:'jackie'] # use slice for the inverse mapping
'john'

Is there a better way to store a twoway dictionary than storing its inverse separate?

What I've done in the past is created a reversedict function, which would take a dict and return the opposite mapping, either values to keys if I knew it was one-to-one (throwing exceptions on seeing the same value twice), or values to lists of keys if it wasn't. That way, instead of having to construct two dicts at the same time each time I wanted the inverse look-up, I could create my dicts as normal and just call the generic reversedict function at the end.

However, it seems that the bidict solution that Jon mentioned in the comments is probably the better one. (My reversedict function seems to be his bidict's ~ operator).



Related Topics



Leave a reply



Submit