Why can a Python dict have multiple keys with the same hash?
For a detailed description of how Python's hashing works see my answer to Why is early return slower than else?
Basically it uses the hash to pick a slot in the table. If there is a value in the slot and the hash matches, it compares the items to see if they are equal.
If the hash matches but the items aren't equal, then it tries another slot. There's a formula to pick this (which I describe in the referenced answer), and it gradually pulls in unused parts of the hash value; but once it has used them all up, it will eventually work its way through all slots in the hash table. That guarantees eventually we either find a matching item or an empty slot. When the search finds an empty slot, it inserts the value or gives up (depending whether we are adding or getting a value).
The important thing to note is that there are no lists or buckets: there is just a hash table with a particular number of slots, and each hash is used to generate a sequence of candidate slots.
Putting two keys with the same hash into a dict
As the accepted answer mentioned by @CoryKramer states, equality of hashes does not imply equality of objects. Python dictionaries can contain any number of elements with equal hashes as long as the objects themselves are not equal.
The short answer to your question is probably that the implementation of the complex
type is a bit incomplete in the Python library as of 2.7. As @wim points out, comparing int
and complex
using ==
works fine, but comparing Decimal
and complex
does not. Since comparing one_decimal == one_complex
will always return False
because of their types, they can both live in the same dictionary in Python 2.7.
This issue has been fixed in Python 3. I am experimenting in 3.5, where one_decimal
and one_complex
are equal. After running the same snippet, the dictionary contains the value for one_complex
under the key one_decimal
, as expected (first key, last value).
TL;DR
It's a bug in Py2.7's complex
type. Fixed in Py3.
Is it possible to assign the same value to multiple keys in a dict object at once?
I would say what you have is very simple, you could slightly improve it to be:
my_dict = dict.fromkeys(['a', 'c', 'd'], 10)
my_dict.update(dict.fromkeys(['b', 'e'], 20))
If your keys are tuple you could do:
>>> my_dict = {('a', 'c', 'd'): 10, ('b', 'e'): 20}
>>> next(v for k, v in my_dict.items() if 'c' in k) # use .iteritems() python-2.x
10
This is, of course, will return first encountered value, key for which contains given element.
Python dictionary with multiple keys pointing to same list in memory efficient way
You are really in a trade-off space between the time/memory it takes to generate the dictionary versus the time it takes to scan the entire data for an on-the-fly method.
If you want a low memory method, you can use a function that searches each sublist for the value. Using a generator will get initial results faster to the user, but for large data sets, this will be slow between returns.
data = [[
"A 5408599",
"B 8126880",
"A 2003529",
],
[
"C 9925336",
"C 3705674",
"A 823678571",
"C 3205170186",
],
[
"C 9772980",
"B 8960327",
"C 4185139021",
"D 1226285245",
"C 2523866271",
"D 2940954504",
"D 5083193",
]]
def find_list_by_value(v, data):
for sublist in data:
if v in sublist:
yield sublist
for s in find_list_by_value("C 9772980", data):
print(s)
As mentioned in the comments, building a hash table based just on the first letter or first 2 or 3 character might be a good place to start. This will allow you to build a candidate list of sublists, then scan those to see if the value is in the sublist.
from collections import defaultdict
def get_key(v, size=3):
return v[:size]
def get_keys(sublist, size=3):
return set(get_key(v, size) for v in sublist)
def find_list_by_hash(v, data, hash_table, size=3):
key = get_key(v, size)
candidate_indices = hash_table.get(key, set())
for ix in candidates:
if v in data[ix]:
yield data[ix]
# generate the small hash table
quick_hash = defaultdict(set)
for i, sublist in enumerate(data):
for k in get_keys(sublist, 3):
quick_hash[k].add(i)
# lookup a value by the small hash
for s in find_list_by_hash("C 9772980", data, quick_hash, 3):
print(s)
In this code quick_hash
will take some time to build, because you are scanning your entire data structure. However, the memory foot print will be much smaller. You main parameter for tuning performance is size
. Smaller size will have a smaller memory footprint, but will take longer when running find_list_by_hash
because your candidate pool will be larger. You can do some testing to see what the right size
should be for your data. Just be mindful that all of your values are at least as long as size
.
How to handle multiple keys for a dictionary in python?
defaultdict
/dict.setdefault
Let's jump into it:
- Iterate over items consecutively
- Append string values belonging to the same key
- Once done, iterate over each key-value pair and join everything together for your final result.
from collections import defaultdict
d = defaultdict(list)
for i, j in zip(list_1, list_2):
d[i].append(j)
The defaultdict
makes things simple, and is efficient with appending. If you don't want to use a defaultdict
, use dict.setdefault
instead (but this is a bit more inefficient):
d = {}
for i, j in zip(list_1, list_2):
d.setdefault(i, []).append(j)
new_dict = {k : ','.join(v) for k, v in d.items()})
print(new_dict)
{'4': 'a', '6': 'b', '8': 'c,d'}
Pandas DataFrame.groupby
+ agg
If you want performance at high volumes, try using pandas:
import pandas as pd
df = pd.DataFrame({'A' : list_1, 'B' : list_2})
new_dict = df.groupby('A').B.agg(','.join).to_dict()
print(new_dict)
{'4': 'a', '6': 'b', '8': 'c,d'}
Multiple levels of keys and values in Python
If you just have to "count" things -- and assuming the data file contains all the required level of "hashes" -- that will do the trick:
import collections
result = collections.defaultdict(int)
with open("beast","rt") as f:
for line in f:
hashes = line.split()
key = '-'.join(hashes)
result[key] += 1
print result
Producing the result:defaultdict(<type 'int'>, {'Mammals-whales-Male': 1, 'Birds-Eagle-Female': 2})
If you require nested dictionary -- post-processing of that result is still possible...
Related Topics
How to Normalize a Numpy Array to a Unit Vector
Looping Over All Member Variables of a Class in Python
Split a List into Parts Based on a Set of Indexes in Python
How to Get System Timezone Setting and Pass It to Pytz.Timezone
Wrapping a C Library in Python: C, Cython or Ctypes
How to Remove Gaps Between Subplots in Matplotlib
Find the Closest Date to a Given Date
Creating a Bat File for Python Script
Pipelinedrdd' Object Has No Attribute 'Todf' in Pyspark
Python, Https Get with Basic Authentication
Find Longest Repetitive Sequence in a String
Capture Arbitrary Path in Flask Route
Multiprocessing.Pool: What's the Difference Between Map_Async and Imap
How to Efficiently Calculate a Running Standard Deviation
Parsing HTML in Python - Lxml or Beautifulsoup? Which of These Is Better for What Kinds of Purposes