Merging Several Python Dictionaries

Merge several Python dictionaries

You can iterate over the dictionaries directly -- no need to use range. The setdefault method of dict looks up a key, and returns the value if found. If not found, it returns a default, and also assigns that default to the key.

super_dict = {}
for d in dicts:
    for k, v in d.items():  # d.iteritems() in Python 2
        super_dict.setdefault(k, []).append(v)
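To make the behaviour concrete, here is a minimal sketch of the same loop on made-up input. The contents of dicts below are hypothetical and only there for illustration:

```python
# Hypothetical input: three dicts with partly overlapping keys.
dicts = [{"a": 1, "b": 2}, {"b": 3, "c": 4}, {"a": 1}]

super_dict = {}
for d in dicts:
    for k, v in d.items():
        # setdefault returns the list already stored under k,
        # or stores a new empty list under k and returns that.
        super_dict.setdefault(k, []).append(v)

print(super_dict)  # {'a': [1, 1], 'b': [2, 3], 'c': [4]}
```

Note that the repeated value for 'a' is kept, since plain lists do not deduplicate.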

Also, you might consider using a defaultdict. This just automates setdefault by calling a function to return a default value when a key isn't found.

import collections
super_dict = collections.defaultdict(list)
for d in dicts:
    for k, v in d.items():  # d.iteritems() in Python 2
        super_dict[k].append(v)

Also, as Sven Marnach astutely observed, you seem to want no duplication of values in your lists. In that case, set gets you what you want:

import collections
super_dict = collections.defaultdict(set)
for d in dicts:
    for k, v in d.items():  # d.iteritems() in Python 2
        super_dict[k].add(v)
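A small sketch of the set variant on hypothetical input shows the duplicates collapsing. One assumption worth stating: this only works when the values are hashable, so it would fail for values such as lists.

```python
from collections import defaultdict

# Hypothetical input; the value 1 appears twice under key "a".
dicts = [{"a": 1, "b": 2}, {"b": 3, "c": 4}, {"a": 1}]

super_dict = defaultdict(set)
for d in dicts:
    for k, v in d.items():
        super_dict[k].add(v)  # adding an existing member is a no-op

# Convert to a plain dict for display.
print(dict(super_dict))  # {'a': {1}, 'b': {2, 3}, 'c': {4}}
```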

How to merge dictionaries of dictionaries?

This is actually quite tricky, particularly if you want a useful error message when things are inconsistent, while correctly accepting duplicate but consistent entries (something no other answer here does).

Assuming you don't have huge numbers of entries, a recursive function is easiest:

def merge(a, b, path=None):
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# works
print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
# has conflict
merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

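Note that this mutates a: the contents of b are added to a (which is also returned). If you want to keep a, you could call it like merge(dict(a), b). Be aware that dict(a) is only a shallow copy, so nested dicts inside a can still be modified; merge(copy.deepcopy(a), b) avoids that.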

agf pointed out (below) that you may have more than two dicts, in which case you can use:

reduce(merge, [dict1, dict2, dict3...])

where everything will be added to dict1.

Note: I edited my initial answer to mutate the first argument; that makes the "reduce" easier to explain

PS: In Python 3, you will also need: from functools import reduce
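As a sketch of the multi-dict case, the following merges three made-up dicts with reduce (dict1, dict2 and dict3 are hypothetical). Because merge stores nested dicts from b in a by reference, a later merge can modify a nested dict that belongs to an earlier input. Deep-copying each input and starting from an empty dict is one way to leave every input untouched; that copying step is an addition for illustration, not part of the answer above.

```python
from copy import deepcopy
from functools import reduce

def merge(a, b, path=None):
    """Recursively merge b into a; raise on conflicting leaf values."""
    if path is None:
        path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] != b[key]:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# Hypothetical inputs.
dict1 = {1: {"a": "A"}}
dict2 = {1: {"b": "B"}, 2: {"c": "C"}}
dict3 = {2: {"d": "D"}}

# Merge deep copies into a fresh dict so none of the inputs is mutated.
merged = reduce(merge, (deepcopy(d) for d in [dict1, dict2, dict3]), {})

print(merged)  # {1: {'a': 'A', 'b': 'B'}, 2: {'c': 'C', 'd': 'D'}}
print(dict2)   # {1: {'b': 'B'}, 2: {'c': 'C'}}
```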

How do I merge two dictionaries in a single expression?

How can I merge two Python dictionaries in a single expression?

For dictionaries x and y, their shallowly-merged dictionary z takes values from y, replacing those from x.

  • In Python 3.9.0 or greater (released 5 October 2020, PEP 584):

    z = x | y
  • In Python 3.5 or greater:

    z = {**x, **y}
  • In Python 2, (or 3.4 or lower) write a function:

    def merge_two_dicts(x, y):
        z = x.copy()   # start with keys and values of x
        z.update(y)    # modifies z with keys and values of y
        return z

    and now:

    z = merge_two_dicts(x, y)
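A quick side-by-side sketch of the three options on sample dictionaries, to show that they agree and that the originals are left alone. The variable names z_union, z_unpack and z_copy are just labels for this example, and the first line assumes Python 3.9 or newer:

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

z_union = x | y          # Python 3.9+
z_unpack = {**x, **y}    # Python 3.5+
z_copy = x.copy()        # works on older versions too
z_copy.update(y)

print(z_union)                         # {'a': 1, 'b': 3, 'c': 4}
print(z_union == z_unpack == z_copy)   # True
print(x, y)                            # {'a': 1, 'b': 2} {'b': 3, 'c': 4}
```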

Explanation

Say you have two dictionaries and you want to merge them into a new dictionary without altering the original dictionaries:

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

The desired result is to get a new dictionary (z) with the values merged, and the second dictionary's values overwriting those from the first.

>>> z
{'a': 1, 'b': 3, 'c': 4}

A new syntax for this, proposed in PEP 448 and available as of Python 3.5, is

z = {**x, **y}

And it is indeed a single expression.

Note that we can merge in with literal notation as well:

z = {**x, 'foo': 1, 'bar': 2, **y}

and now:

>>> z
{'a': 1, 'b': 3, 'foo': 1, 'bar': 2, 'c': 4}

It is now showing as implemented in the release schedule for 3.5, PEP 478, and it has now made its way into the What's New in Python 3.5 document.

However, since many organizations are still on Python 2, you may wish to do this in a backward-compatible way. The classically Pythonic way, available in Python 2 and Python 3.0-3.4, is to do this as a two-step process:

z = x.copy()
z.update(y) # which returns None since it mutates z

In both approaches, y will come second and its values will replace x's values, thus b will point to 3 in our final result.

Not yet on Python 3.5, but want a single expression

If you are not yet on Python 3.5 or need to write backward-compatible code, and you want this in a single expression, the most performant approach that is still correct is to put it in a function:

def merge_two_dicts(x, y):
    """Given two dictionaries, merge them into a new dict as a shallow copy."""
    z = x.copy()
    z.update(y)
    return z

and then you have a single expression:

z = merge_two_dicts(x, y)

You can also make a function to merge an arbitrary number of dictionaries, from zero to a very large number:

def merge_dicts(*dict_args):
    """
    Given any number of dictionaries, shallow copy and merge into a new dict,
    precedence goes to key-value pairs in latter dictionaries.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

This function will work in Python 2 and 3 for all dictionaries. e.g. given dictionaries a to g:

z = merge_dicts(a, b, c, d, e, f, g) 

and key-value pairs in g will take precedence over dictionaries a to f, and so on.
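For instance, with three small hypothetical dictionaries (a, b and c below are made up for this sketch), the last value for a shared key wins and calling the function with no arguments gives an empty dict:

```python
def merge_dicts(*dict_args):
    """Shallow-merge any number of dicts; later dicts take precedence."""
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

# Hypothetical inputs that all share the key 'k'.
a = {'k': 'from a', 'only_a': 1}
b = {'k': 'from b'}
c = {'k': 'from c', 'only_c': 3}

print(merge_dicts(a, b, c))  # {'k': 'from c', 'only_a': 1, 'only_c': 3}
print(merge_dicts())         # {}
```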

Critiques of Other Answers

Don't use what you see in the formerly accepted answer:

z = dict(x.items() + y.items())

In Python 2, you create two lists in memory for each dict, create a third list in memory with length equal to the length of the first two put together, and then discard all three lists to create the dict. In Python 3, this will fail because you're adding two dict_items objects together, not two lists -

>>> c = dict(a.items() + b.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict_items' and 'dict_items'

and you would have to explicitly create them as lists, e.g. z = dict(list(x.items()) + list(y.items())). This is a waste of resources and computation power.

Similarly, taking the union of items() in Python 3 (viewitems() in Python 2.7) will also fail when values are unhashable objects (like lists, for example). Even if your values are hashable, since sets are semantically unordered, the behavior is undefined in regards to precedence. So don't do this:

>>> c = dict(a.items() | b.items())

This example demonstrates what happens when values are unhashable:

>>> x = {'a': []}
>>> y = {'b': []}
>>> dict(x.items() | y.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Here's an example where y should have precedence, but instead the value from x is retained due to the arbitrary order of sets:

>>> x = {'a': 2}
>>> y = {'a': 1}
>>> dict(x.items() | y.items())
{'a': 2}

Another hack you should not use:

z = dict(x, **y)

This uses the dict constructor and is very fast and memory-efficient (even slightly more so than our two-step process) but unless you know precisely what is happening here (that is, the second dict is being passed as keyword arguments to the dict constructor), it's difficult to read, it's not the intended usage, and so it is not Pythonic.

Here's an example of the usage being remediated in django.

Dictionaries are intended to take hashable keys (e.g. frozensets or tuples), but this method fails in Python 3 when keys are not strings.

>>> c = dict(a, **b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings
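By contrast, the unpacking form inside a dict display does not go through keyword arguments, so it accepts any hashable keys. A minimal sketch on Python 3, with made-up data (the failed flag is only there to record the outcome):

```python
x = {1: 'one'}
y = {('a', 'b'): 'tuple key'}

# dict(x, **y) passes y's keys as keyword names, which must be strings.
failed = False
try:
    dict(x, **y)
except TypeError as exc:
    failed = True
    print('dict(x, **y) failed:', exc)

# {**x, **y} builds the dict directly and has no such restriction.
z = {**x, **y}
print(z)  # {1: 'one', ('a', 'b'): 'tuple key'}
```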

From the mailing list, Guido van Rossum, the creator of the language, wrote:

I am fine with
declaring dict({}, **{1:3}) illegal, since after all it is abuse of
the ** mechanism.

and

Apparently dict(x, **y) is going around as "cool hack" for "call
x.update(y) and return x". Personally, I find it more despicable than
cool.

It is my understanding (as well as the understanding of the creator of the language) that the intended usage for dict(**y) is for creating dictionaries for readability purposes, e.g.:

dict(a=1, b=10, c=11)

instead of

{'a': 1, 'b': 10, 'c': 11}

Response to comments

Despite what Guido says, dict(x, **y) is in line with the dict specification, which btw. works for both Python 2 and 3. The fact that this only works for string keys is a direct consequence of how keyword parameters work and not a short-coming of dict. Nor is using the ** operator in this place an abuse of the mechanism, in fact, ** was designed precisely to pass dictionaries as keywords.

Again, it doesn't work in Python 3 when keys are not strings. The implicit calling contract is that namespaces take ordinary dictionaries, while users must only pass keyword arguments that are strings. All other callables enforced it. dict broke this consistency in Python 2:

>>> foo(**{('a', 'b'): None})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() keywords must be strings
>>> dict(**{('a', 'b'): None})
{('a', 'b'): None}

This inconsistency was bad given other implementations of Python (PyPy, Jython, IronPython). Thus it was fixed in Python 3, as this usage could be a breaking change.

I submit to you that it is malicious incompetence to intentionally write code that only works in one version of a language or that only works given certain arbitrary constraints.

More comments:

dict(x.items() + y.items()) is still the most readable solution for Python 2. Readability counts.

My response: merge_two_dicts(x, y) actually seems much clearer to me, if we're actually concerned about readability. And dict(x.items() + y.items()) is not forward compatible, as Python 2 is increasingly deprecated.

{**x, **y} does not seem to handle nested dictionaries. the contents of nested keys are simply overwritten, not merged [...] I ended up being burnt by these answers that do not merge recursively and I was surprised no one mentioned it. In my interpretation of the word "merging" these answers describe "updating one dict with another", and not merging.

Yes. I must refer you back to the question, which is asking for a shallow merge of two dictionaries, with the first's values being overwritten by the second's - in a single expression.
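To illustrate what the commenter ran into, here is a small sketch with hypothetical nested data. The shallow merge replaces the whole nested value, and the result shares that nested dict with y rather than copying it:

```python
x = {'settings': {'debug': True}, 'name': 'x'}
y = {'settings': {'verbose': False}}

z = {**x, **y}

# The nested dict from x is replaced wholesale, not combined with y's.
print(z)  # {'settings': {'verbose': False}, 'name': 'x'}

# The nested value is the very same object as in y (no copy is made).
print(z['settings'] is y['settings'])  # True
```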

Assuming two dictionaries of dictionaries, one might recursively merge them in a single function, but you should be careful not to modify the dictionaries from either source, and the surest way to avoid that is to make a copy when assigning values. As keys must be hashable and are usually therefore immutable, it is pointless to copy them:

from copy import deepcopy

def dict_of_dicts_merge(x, y):
    z = {}
    overlapping_keys = x.keys() & y.keys()
    for key in overlapping_keys:
        z[key] = dict_of_dicts_merge(x[key], y[key])
    for key in x.keys() - overlapping_keys:
        z[key] = deepcopy(x[key])
    for key in y.keys() - overlapping_keys:
        z[key] = deepcopy(y[key])
    return z

Usage:

>>> x = {'a':{1:{}}, 'b': {2:{}}}
>>> y = {'b':{10:{}}, 'c': {11:{}}}
>>> dict_of_dicts_merge(x, y)
{'b': {2: {}, 10: {}}, 'a': {1: {}}, 'c': {11: {}}}

Coming up with contingencies for other value types is far beyond the scope of this question, so I will point you at my answer to the canonical question on a "Dictionaries of dictionaries merge".

Less Performant But Correct Ad-hocs

These approaches are less performant, but they will provide correct behavior. They will be much less performant than copy and update or the new unpacking because they iterate through each key-value pair at a higher level of abstraction, but they do respect the order of precedence (latter dictionaries have precedence).

You can also chain the dictionaries manually inside a dict comprehension:

{k: v for d in dicts for k, v in d.items()} # iteritems in Python 2.7

or in Python 2.6 (and perhaps as early as 2.4 when generator expressions were introduced):

dict((k, v) for d in dicts for k, v in d.items()) # iteritems in Python 2

itertools.chain will chain the iterators over the key-value pairs in the correct order:

from itertools import chain
z = dict(chain(x.items(), y.items())) # iteritems in Python 2
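If the dictionaries are in a list rather than in two named variables, chain.from_iterable is one way to apply the same idea. This is a sketch on made-up data, not something the answer above benchmarks:

```python
from itertools import chain

# Hypothetical list of dicts; later dicts should win on shared keys.
dicts = [{'a': 1, 'b': 2}, {'b': 3}, {'c': 4, 'a': 5}]

# Pairs are yielded dict by dict, so later values overwrite earlier ones.
z = dict(chain.from_iterable(d.items() for d in dicts))

print(z)  # {'a': 5, 'b': 3, 'c': 4}
```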

Performance Analysis

I'm only going to do the performance analysis of the usages known to behave correctly. (Self-contained so you can copy and paste yourself.)

from timeit import repeat
from itertools import chain

x = dict.fromkeys('abcdefg')
y = dict.fromkeys('efghijk')

def merge_two_dicts(x, y):
    z = x.copy()
    z.update(y)
    return z

min(repeat(lambda: {**x, **y}))
min(repeat(lambda: merge_two_dicts(x, y)))
min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
min(repeat(lambda: dict(chain(x.items(), y.items()))))
min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))

In Python 3.8.1, NixOS:

>>> min(repeat(lambda: {**x, **y}))
1.0804965235292912
>>> min(repeat(lambda: merge_two_dicts(x, y)))
1.636518670246005
>>> min(repeat(lambda: {k: v for d in (x, y) for k, v in d.items()}))
3.1779992282390594
>>> min(repeat(lambda: dict(chain(x.items(), y.items()))))
2.740647904574871
>>> min(repeat(lambda: dict(item for d in (x, y) for item in d.items())))
4.266070580109954
$ uname -a
Linux nixos 4.19.113 #1-NixOS SMP Wed Mar 25 07:06:15 UTC 2020 x86_64 GNU/Linux

Resources on Dictionaries

  • My explanation of Python's dictionary implementation, updated for 3.6.
  • Answer on how to add new keys to a dictionary
  • Mapping two lists into a dictionary
  • The official Python docs on dictionaries
  • The Dictionary Even Mightier - talk by Brandon Rhodes at Pycon 2017
  • Modern Python Dictionaries, A Confluence of Great Ideas - talk by Raymond Hettinger at Pycon 2017

How to merge dicts, collecting values from matching keys?

Assuming all keys are always present in all dicts:

ds = [d1, d2]
d = {}
for k in d1.iterkeys():
    d[k] = tuple(d[k] for d in ds)

Note: In Python 3.x, use the code below:

ds = [d1, d2]
d = {}
for k in d1.keys():
    d[k] = tuple(d[k] for d in ds)
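A runnable sketch of the Python 3 version with sample data. The contents of d1 and d2 are hypothetical, and the result is stored under a separate name (collected) so the loop variable does not shadow it:

```python
# Hypothetical inputs; both dicts have exactly the same keys.
d1 = {'x': 1, 'y': 2}
d2 = {'x': 10, 'y': 20}
ds = [d1, d2]

# For each key, collect the value from every dict, in list order.
collected = {k: tuple(d[k] for d in ds) for k in d1}

print(collected)  # {'x': (1, 10), 'y': (2, 20)}
```

If a key were missing from one of the dicts, d[k] would raise a KeyError, which is why the assumption above matters.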

If the dicts contain NumPy arrays:

import numpy as np

ds = [d1, d2]
d = {}
for k in d1.keys():
    d[k] = np.concatenate(list(d[k] for d in ds))

How to merge a list of multiple dictionaries into a dictionary of lists?

defaultdict

You can use collections.defaultdict. Your dictionary comprehension will never work as you are not defining any lists. This is likely to be more efficient than using a dictionary comprehension, which would involve iterating each dictionary for each unique key.

from collections import defaultdict

dd = defaultdict(list)

for d in list_of_dictionaries:
    for k, v in d.items():
        dd[k].append(v)

Result:

print(dd)

defaultdict(list,
{0: [3523, 7245],
1: [3524, 7246, 20898],
2: [3540, 7247, 20899],
4: [3541, 20901],
5: [3542, 7249, 20902],
3: [7248, 20900],
6: [7250]})

Dictionary comprehension

A dictionary comprehension is possible but this requires calculating the union of keys and iterating the list of dictionaries for each of these keys:

allkeys = set().union(*list_of_dictionaries)

res = {k: [d[k] for d in list_of_dictionaries if k in d] for k in allkeys}

{0: [3523, 7245],
1: [3524, 7246, 20898],
2: [3540, 7247, 20899],
3: [7248, 20900],
4: [3541, 20901],
5: [3542, 7249, 20902],
6: [7250]}

Time complexity

Consider these terms:

n = sum(map(len, list_of_dictionaries))
m = len(set().union(*list_of_dictionaries))
k = len(list_of_dictionaries)

In this context, the defaultdict solution will have complexity O(n), while the dictionary comprehension will have complexity O(mk), where mk >= n.
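As a worked example of these quantities, here is a sketch on a small hypothetical list (three dicts with partly overlapping keys). It only counts sizes; it is not a benchmark:

```python
# Hypothetical input: 6 key-value pairs spread over 3 dicts, 3 distinct keys.
list_of_dictionaries = [
    {0: 3523, 1: 3524},
    {0: 7245, 1: 7246, 3: 7248},
    {1: 20898},
]

n = sum(map(len, list_of_dictionaries))      # total number of items
m = len(set().union(*list_of_dictionaries))  # number of distinct keys
k = len(list_of_dictionaries)                # number of dicts

print(n, m, k)     # 6 3 3
print(m * k >= n)  # True: the comprehension does m*k = 9 membership checks
```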

Merging multiple dictionaries in python

dico_list=[{"item1": {"item2": "300"}}, {"item1": {"item3": {"item4": "400"}}}, {"item1": {"item3": {"item6": "16"}}}, {"item1": {"item7": "aaa"}}, {"item1": {"item8": "bbb"}}, {"item1": {"item9": {"item10" : "2.2"}}}, {"item1": {"item9": {"item11" : "xxx"}}}]

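def merge(merge_dico, dico_list):
    """Recursively merge every dict in dico_list into merge_dico."""
    for dico in dico_list:
        for key, value in dico.items():
            if isinstance(value, dict):
                # descend into nested dicts, creating the target level if needed
                merge_dico.setdefault(key, dict())
                merge(merge_dico[key], [value])
            else:
                merge_dico[key] = value
    return merge_dico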

print(merge(dict(),dico_list))
#{'item1': {'item7': 'aaa', 'item9': {'item11': 'xxx', 'item10': '2.2'}, 'item8': 'bbb', 'item3': {'item4': '400', 'item6': '16'}, 'item2': '300'}}

Merging multiple dictionaries with inconsistent keys

There are probably solutions using base Python, but the simplest way I can think of is to use the pandas library to convert each list to a DataFrame, then join/merge them together.

import pandas as pd

dfA = pd.DataFrame(listA)
dfB = pd.DataFrame(listB)

merged_df = dfA.merge(dfB, left_on='uid', right_on='number')

That would return a DataFrame with more columns than you need (i.e. there would be columns for both "uid" and "number"), but you could specify which ones you want and the order you want them this way:

merged_df = merged_df[['uid', 'name', 'val1']]

For merging multiple DataFrames into one master frame, see here: pandas three-way joining multiple dataframes on columns
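For comparison, a base-Python version of the same join might look like the sketch below. The shape of listA and listB is an assumption: each is taken to be a list of dicts, with 'uid' in listA matching 'number' in listB, and 'number' is assumed to be unique within listB. The sample rows are made up.

```python
# Hypothetical rows; only the column names come from the pandas example.
listA = [{'uid': 1, 'name': 'alice'}, {'uid': 2, 'name': 'bob'}]
listB = [
    {'number': 2, 'val1': 'x'},
    {'number': 1, 'val1': 'y'},
    {'number': 3, 'val1': 'z'},
]

# Index listB by its join key so each lookup is a single dict access.
by_number = {row['number']: row for row in listB}

# Inner join: keep only the rows of listA that have a match in listB.
merged = [
    {'uid': a['uid'], 'name': a['name'], 'val1': by_number[a['uid']]['val1']}
    for a in listA
    if a['uid'] in by_number
]

print(merged)
# [{'uid': 1, 'name': 'alice', 'val1': 'y'}, {'uid': 2, 'name': 'bob', 'val1': 'x'}]
```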

How do I merge a list of dicts into a single dict?

This works for dictionaries of any length:

>>> result = {}
>>> for d in L:
...     result.update(d)
...
>>> result
{'a':1,'c':1,'b':2,'d':2}

As a comprehension:

# Python >= 2.7
{k: v for d in L for k, v in d.items()}

# Python < 2.7
dict(pair for d in L for pair in d.items())

Merge several dictionaries creating array on different values

Use a collections.defaultdict to group the c values by a and b tuple keys:

from collections import defaultdict

lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 3, "c": 4},
]

d = defaultdict(list)
for x in lst:
    d[x["a"], x["b"]].append(x["c"])

result = [{"a": a, "b": b, "c": c} for (a, b), c in d.items()]

print(result)

Could also use itertools.groupby if lst is already ordered by a and b:

from itertools import groupby
from operator import itemgetter

lst = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 3, "c": 3},
    {"a": 1, "b": 3, "c": 4},
]

result = [
    {"a": a, "b": b, "c": [x["c"] for x in g]}
    for (a, b), g in groupby(lst, key=itemgetter("a", "b"))
]

print(result)

Or if lst is not ordered by a and b, we can sort by those two keys as well:

result = [
    {"a": a, "b": b, "c": [x["c"] for x in g]}
    for (a, b), g in groupby(
        sorted(lst, key=itemgetter("a", "b")), key=itemgetter("a", "b")
    )
]

print(result)

Output:

[{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]

Update

For a more generic solution for any amount of keys:

def merge_lst_dicts(lst, keys, merge_key):
    groups = defaultdict(list)

    for item in lst:
        key = tuple(item.get(k) for k in keys)
        groups[key].append(item.get(merge_key))

    return [
        {**dict(zip(keys, group_key)), **{merge_key: merged_values}}
        for group_key, merged_values in groups.items()
    ]

print(merge_lst_dicts(lst, ["a", "b"], "c"))
# [{'a': 1, 'b': 2, 'c': [3, 4]}, {'a': 1, 'b': 3, 'c': [3, 4]}]

Merging multiple dictionaries that have dictionaries in list

One approach

from collections import defaultdict
from operator import itemgetter

# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})
for d in dictA["stdout"]:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)

# use item getter for better readability
count = itemgetter("count")

# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]

# sort the list of dictionaries in decreasing order
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)

Output

{'stderr': '',
'stdout': [{'bar': 'B', 'count': 415, 'foo': 'A', 'host': None},
{'bar': 'B', 'count': 46, 'foo': 'A', 'host': 'orange'},
{'bar': 'B', 'count': 28, 'foo': 'C', 'host': 'egg'},
{'bar': 'E', 'count': 4, 'foo': 'D', 'host': 'apple'},
{'bar': 'E', 'count': 3, 'foo': 'A', 'host': 'pineapple'},
{'bar': 'F', 'count': 2, 'foo': 'C', 'host': 'carrot'},
{'bar': 'E', 'count': 1, 'foo': 'A', 'host': 'chicken breast'}]}

For more on each of the functions and data structures used in the code above see: sorted, defaultdict and itemgetter

One alternative

Use groupby:

import pprint
from operator import itemgetter
from itertools import groupby

def key(d):
    return d["foo"], d["bar"], d["host"] or ""

count = itemgetter("count")
lst = sorted(dictA["stdout"] + dictB["stdout"], key=key)
groups = groupby(lst, key=key)
ds = [{'foo': f, 'bar': b, 'host': h or None, 'count': sum(count(d) for d in vs)} for (f, b, h), vs in groups]
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)

This second approach has two caveats:

  1. The time complexity is O(n log n), whereas the first approach was O(n).
  2. To sort the list of dictionaries, it needs to replace None with the empty string "", because None cannot be compared with strings in Python 3.
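The second caveat can be seen in a small sketch with hypothetical rows. On Python 3, sorting a mix of None and strings raises a TypeError, while a key that maps None to "" sorts cleanly. The comparable flag is only there to record the outcome:

```python
# Hypothetical rows; one of them has host set to None.
rows = [{'host': 'egg'}, {'host': None}, {'host': 'apple'}]

# Sorting directly on 'host' compares None with str, which Python 3 rejects.
try:
    sorted(rows, key=lambda d: d['host'])
    comparable = True
except TypeError:
    comparable = False
print(comparable)  # False

# Replacing None by "" in the key makes every key a string.
ordered = sorted(rows, key=lambda d: d['host'] or "")
print([d['host'] for d in ordered])  # [None, 'apple', 'egg']
```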

Multiple dictionaries

If you have multiple dictionaries you can change the first approach to:

# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})

# create a list with all the dictionaries from multiple dict
data = []
lst = [dictA] # change this line to contain all the dictionaries except B
for d in lst:
    data.extend(d["stdout"])

for d in data:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)

# use item getter for better readability
count = itemgetter("count")

# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]

# sort the list of dictionaries in decreasing order
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}

What is itemgetter?

From the documentation:

Return a callable object that fetches item from its operand using the
operand’s __getitem__() method. If multiple items are specified,
returns a tuple of lookup values.

Is equivalent to:

def itemgetter(*items):
    if len(items) == 1:
        item = items[0]
        def g(obj):
            return obj[item]
    else:
        def g(obj):
            return tuple(obj[item] for item in items)
    return g
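A short usage sketch, applied to a row shaped like the ones in the output above. The row itself is made up for illustration, and group_key is just a name chosen here:

```python
from operator import itemgetter

# Hypothetical row with the same fields as the merged output.
row = {'foo': 'A', 'bar': 'B', 'host': None, 'count': 415}

count = itemgetter('count')                   # single item: returns the value
group_key = itemgetter('foo', 'bar', 'host')  # several items: returns a tuple

print(count(row))      # 415
print(group_key(row))  # ('A', 'B', None)
```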

