Hashing a dictionary?
If your dictionary is not nested, you could make a frozenset with the dict's items and use hash()
:
hash(frozenset(my_dict.items()))
This is much less computationally intensive than generating the JSON string or representation of the dictionary.
UPDATE: Please see the comments below, why this approach might not produce a stable result.
Is a Python dictionary an example of a hash table?
Yes, it is a hash mapping or hash table. You can read a description of python's dict implementation, as written by Tim Peters, here.
That's why you can't use something 'not hashable' as a dict key, like a list:
>>> a = {}
>>> b = ['some', 'list']
>>> hash(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable
>>> a[b] = 'some'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable
You can read more about hash tables or check how it has been implemented in python and why it is implemented that way.
What is the true difference between a dictionary and a hash table?
A dictionary is a general concept that maps keys to values. There are many ways to implement such a mapping.
A hashtable is a specific way to implement a dictionary.
Besides hashtables, another common way to implement dictionaries is red-black trees.
Each method has its own pros and cons. A red-black tree can always perform a lookup in O(log N). A hashtable can perform a lookup in O(1) time although that can degrade to O(N) depending on the input.
Alter the hash function of a dictionary
You can't change the hash-function - the dict will call hash
on the keys it's supposed to insert, and that's that.
However, you can wrap the keys to provide different __hash__
and __eq__
-Methods.
class MyHash(object):
def __init__(self, v):
self._v = v
def __hash__(self):
return hash(self._v) * -1
def __eq__(self, other):
return self._v == other._v
If this actually helps anything with your original problem/question I doubt though, it seems rather a custom array/list-based data-structure might be the answer. Or not.
Python - hash() and dict
If the keys and values hash
the same, frozenset
is designed to be a stable and unique representation of the underlying values. The docs explicitly state:
Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other).
And the rules for hashable types require that:
Hashable objects which compare equal must have the same hash value.
So by definition frozenset
s with equal, hashable elements are equal and hash to the same value. This can only be violated if a user-defined class which does not obey the rules for hashing and equality is contained in the resulting frozenset
(but then you've got bigger problems).
Note that this does not mean they'll iterate in the same order or produce the same repr
; thanks to chaining on hash collisions, two frozenset
s constructed from the same elements in a different order need not iterate in the same order. But they're still equal to one another, and hash the same (precise outputs and ordering is implementation dependent, could easily vary between different versions of Python; this just happens to work on my Py 3.5 install to create the desired "different iteration order" behavior):
>>> frozenset([1,9])
frozenset({1, 9})
>>> frozenset([9,1])
frozenset({9, 1}) # <-- Different order; consequence of 8 buckets colliding for 1 and 9
>>> hash(frozenset([1,9]))
-7625378979602737914
>>> hash(frozenset([9,1]))
-7625378979602737914 # <-- Still the same hash though
>>> frozenset([1,9]) == frozenset([9,1])
True # <-- And still equal
Can Hash be sorted by keys or values?
Sorting by keys:
myhash = {5 => "five", 3 => "three", 2 => "two", 4 => "four", 1 => "one"}
myhash.keys.sort.each do |key|
print myhash[key], "\t:\t", key, "\n"
end
produces
one : 1
two : 2
three : 3
four : 4
five : 5
Sorting by value is a bit more effort:
myhash.to_a.sort { |item1, item2| item1[1] <=> item2[1] }.each do |item|
puts item.join("\t:\t")
end
produces
5 : five
4 : four
1 : one
3 : three
2 : two
If you want the outcomes in value:key order, change
puts item.join("\t:\t")
to
puts item.reverse.join("\t:\t")
What's the difference between a hash table and a Python dictionary?
There is not really any difference between them. That's why python's dicts don't support duplicates. That is also why python has the function hash
which python's dictionaries use by default.
Dictionary = Hash?
The words table, dictionary and map are often used synonymously (in the context of data structures). A hash table/hash map is one kind of table/dictionary/map.
The {0}
is a block (anonymous function) which ignores its argument and returns the number 0. The block given to Hash.new
is called to produce a default value when a key is not found in the hash map.
I.e. if I do h = Hash.new {0}
and then h["key that does not exist"]
, I get back 0, instead of nil
(which I'd get without the {0}
). Note that in this case where the default value is immutable and does not depend on the key, you don't need to use the block form of Hash.new
, you can just do Hash.new(0)
to set 0 as the default value.
What's a good hash function for English words?
To simply sum the letters is not a good strategy because a permutation gives the same result.
This one (djb2) is quite popular and works nicely with ASCII strings.
unsigned long hashstring(unsigned char *str)
{
unsigned long hash = 5381;
int c;
while (c = *str++)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}
More info here.
If you need more alternatives and some perfomance measures, read here.
Added: These are general hashing functions, where the input domain is not known in advance (except perhaps some very general assumptions: eg the above works slightly better with ascii input), which is the most usual scenario. If you have a known restricted domain (set of inputs fixed) you can do better, see Fionn's answer.
Related Topics
Memory Size of a Hash or Other Object
Active Record with Delegate and Conditions
How to Get Generators Call Other Generators in Rails 3
Implementing Bayesian Classifier in Ruby
Need Advice: Is This a Good Use Case for a 'Nosql' Database? If So, Which One
Ruby on Rails: Conditionally Display a Partial
How to Properly Chain Custom Methods in Ruby
Rvm VS Native Installation of Ruby
How to Pluck Email from Array of Users
Rails Activerecord Query Date Range
Trying to Understand Use of Self.Method_Name VS. Classname.Method_Name in Ruby
Can't Install Rmagick Gem on Ubuntu 13.04
Rspec View Undefined Method Stub_Model
Machinist VS Factorygirl - Pros and Cons