Dictionary = Hash

Hashing a dictionary?

If your dictionary is not nested, you could make a frozenset with the dict's items and use hash():

hash(frozenset(my_dict.items()))

This is much less computationally intensive than generating the JSON string or representation of the dictionary.

UPDATE: Please see the comments below, why this approach might not produce a stable result.

Is a Python dictionary an example of a hash table?

Yes, it is a hash mapping or hash table. You can read a description of python's dict implementation, as written by Tim Peters, here.

That's why you can't use something 'not hashable' as a dict key, like a list:

>>> a = {}
>>> b = ['some', 'list']
>>> hash(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable
>>> a[b] = 'some'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable

You can read more about hash tables or check how it has been implemented in python and why it is implemented that way.

What is the true difference between a dictionary and a hash table?

A dictionary is a general concept that maps keys to values. There are many ways to implement such a mapping.

A hashtable is a specific way to implement a dictionary.

Besides hashtables, another common way to implement dictionaries is red-black trees.

Each method has its own pros and cons. A red-black tree can always perform a lookup in O(log N). A hashtable can perform a lookup in O(1) time although that can degrade to O(N) depending on the input.

Alter the hash function of a dictionary

You can't change the hash-function - the dict will call hash on the keys it's supposed to insert, and that's that.

However, you can wrap the keys to provide different __hash__ and __eq__-Methods.

class MyHash(object):
def __init__(self, v):
self._v = v

def __hash__(self):
return hash(self._v) * -1

def __eq__(self, other):
return self._v == other._v

If this actually helps anything with your original problem/question I doubt though, it seems rather a custom array/list-based data-structure might be the answer. Or not.

Python - hash() and dict

If the keys and values hash the same, frozenset is designed to be a stable and unique representation of the underlying values. The docs explicitly state:

Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other).

And the rules for hashable types require that:

Hashable objects which compare equal must have the same hash value.

So by definition frozensets with equal, hashable elements are equal and hash to the same value. This can only be violated if a user-defined class which does not obey the rules for hashing and equality is contained in the resulting frozenset (but then you've got bigger problems).

Note that this does not mean they'll iterate in the same order or produce the same repr; thanks to chaining on hash collisions, two frozensets constructed from the same elements in a different order need not iterate in the same order. But they're still equal to one another, and hash the same (precise outputs and ordering is implementation dependent, could easily vary between different versions of Python; this just happens to work on my Py 3.5 install to create the desired "different iteration order" behavior):

>>> frozenset([1,9])
frozenset({1, 9})
>>> frozenset([9,1])
frozenset({9, 1}) # <-- Different order; consequence of 8 buckets colliding for 1 and 9
>>> hash(frozenset([1,9]))
-7625378979602737914
>>> hash(frozenset([9,1]))
-7625378979602737914 # <-- Still the same hash though
>>> frozenset([1,9]) == frozenset([9,1])
True # <-- And still equal

Can Hash be sorted by keys or values?

Sorting by keys:

myhash = {5 => "five", 3 => "three", 2 => "two", 4 => "four", 1 => "one"}

myhash.keys.sort.each do |key|
print myhash[key], "\t:\t", key, "\n"
end

produces

one     :   1
two : 2
three : 3
four : 4
five : 5

Sorting by value is a bit more effort:

myhash.to_a.sort { |item1, item2| item1[1] <=> item2[1] }.each do |item|
puts item.join("\t:\t")
end

produces

5   :   five
4 : four
1 : one
3 : three
2 : two

If you want the outcomes in value:key order, change

puts item.join("\t:\t")

to

puts item.reverse.join("\t:\t")

What's the difference between a hash table and a Python dictionary?

There is not really any difference between them. That's why python's dicts don't support duplicates. That is also why python has the function hash which python's dictionaries use by default.

Dictionary = Hash?

The words table, dictionary and map are often used synonymously (in the context of data structures). A hash table/hash map is one kind of table/dictionary/map.

The {0} is a block (anonymous function) which ignores its argument and returns the number 0. The block given to Hash.new is called to produce a default value when a key is not found in the hash map.

I.e. if I do h = Hash.new {0} and then h["key that does not exist"], I get back 0, instead of nil (which I'd get without the {0}). Note that in this case where the default value is immutable and does not depend on the key, you don't need to use the block form of Hash.new, you can just do Hash.new(0) to set 0 as the default value.

What's a good hash function for English words?

To simply sum the letters is not a good strategy because a permutation gives the same result.

This one (djb2) is quite popular and works nicely with ASCII strings.

unsigned long hashstring(unsigned char *str)
{
unsigned long hash = 5381;
int c;

while (c = *str++)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

return hash;
}

More info here.

If you need more alternatives and some perfomance measures, read here.

Added: These are general hashing functions, where the input domain is not known in advance (except perhaps some very general assumptions: eg the above works slightly better with ascii input), which is the most usual scenario. If you have a known restricted domain (set of inputs fixed) you can do better, see Fionn's answer.



Related Topics



Leave a reply



Submit