What Is the Most Efficient Way to Sum a Dict With Multiple Keys by One Key

What is the most efficient way to sum a dict with multiple keys by one key?

Using basic Python, this doesn't get a whole lot better. You could hack something together with itertools.groupby, but it would be ugly, probably slower, and certainly less clear.

As @9769953 suggested, though, Pandas is a good package to handle this sort of structured, tabular data.
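
The transcripts below assume lst is the question's list of records; reconstructed from the printed frame, it has this shape:

lst = [
    {'Name': 'A', 'price': 10, 'qty': 100},
    {'Name': 'A', 'price': 10, 'qty': 100},
    {'Name': 'A', 'price': 10, 'qty': 100},
    {'Name': 'B', 'price': 10, 'qty': 100},
    {'Name': 'C', 'price': 10, 'qty': 100},
    {'Name': 'C', 'price': 10, 'qty': 100},
]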

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(lst)

In [3]: df
Out[3]:
  Name  price  qty
0    A     10  100
1    A     10  100
2    A     10  100
3    B     10  100
4    C     10  100
5    C     10  100

In [4]: df.groupby('Name').agg('sum')
Out[4]:
      price  qty
Name
A        30  300
B        10  100
C        20  200

You just need a little extra mojo if you don't want to keep the data as a dataframe:

In [5]: grouped = df.groupby('Name', as_index=False).agg('sum')

In [6]: list(grouped.T.to_dict().values())
Out[6]:
[{'Name': 'A', 'price': 30, 'qty': 300},
 {'Name': 'B', 'price': 10, 'qty': 100},
 {'Name': 'C', 'price': 20, 'qty': 200}]
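
If you just want the records, DataFrame.to_dict with the 'records' orient is a slightly more direct spelling of the same thing:

In [7]: grouped.to_dict('records')
Out[7]:
[{'Name': 'A', 'price': 30, 'qty': 300},
 {'Name': 'B', 'price': 10, 'qty': 100},
 {'Name': 'C', 'price': 20, 'qty': 200}]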

Fastest way to sum up in a dictionary with specified (and ordered) keys

Here is another solution (assuming all the keys exist):

sum(map(dict_1.get, range(3, 5+1)))
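
For instance, with a hypothetical dict_1 (the question's data isn't shown here), the call sums the values stored under keys 3 through 5:

dict_1 = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 6: 60}  # hypothetical data

total = sum(map(dict_1.get, range(3, 5 + 1)))
print(total)  # 30 + 40 + 50 == 120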

This solution is generally among the fastest, and it scales far better than approaches that scan the whole dict:

For the example provided in the question:
- this solution: 270 ns
- bb1: 446 ns
- Gwang-Jin Kim: 356 ns
- MrGre4a: 244 ns

Dict of 1,000 elements and 20 values fetched:
- this solution: 526 ns
- bb1: 36200 ns (slow because it walks through the whole dict)
- Gwang-Jin Kim: 965 ns
- MrGre4a: 879 ns

How to sum values in multidimensional dictionary?

To sum values for a single subkey you could use sum() with a generator expression:

>>> d = {'A': {'val1': 3,'val2': 5}, 'B': {'val1': 2, 'val2': 6}}
>>> sum(x['val1'] for x in d.values())
5

To sum values for all subkeys you can use collections.Counter:

>>> from collections import Counter
>>> counter = sum(map(Counter, d.values()), Counter())
>>> dict(counter)
{'val2': 11, 'val1': 5}
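
If you prefer functools.reduce over sum() with a start value, this sketch is equivalent (same Counter semantics):

>>> from functools import reduce
>>> from operator import add
>>> dict(reduce(add, map(Counter, d.values()), Counter()))
{'val1': 5, 'val2': 11}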

What is an efficient way to find the minimum sum of multiple dictionary values, given keys of mutually exclusive integers?

I tried solving this problem three different ways: an optimized brute force, a dynamic programming approach, and a greedy algorithm. The first two could not handle inputs for n > 17, but they generated optimal solutions, so I could use them to verify the average performance of the greedy method. I'll start with the dynamic programming approach, then describe the greedy one.

Dynamic Programming

First, note that if we determine that (1, 2, 3, 4) and (5, 6, 7, 8) sum to a smaller value than (3, 4, 5, 6) and (1, 2, 7, 8), then the optimal solution absolutely cannot contain both (3, 4, 5, 6) and (1, 2, 7, 8), because you could swap them out for the former pair and get a smaller sum. Extending this logic, there is one combination of (a, b, c, d) and (e, f, g, h) that yields the minimal sum over all pairings drawn from x0, x1, ..., x7, so we can rule out all of the others.

Using this knowledge, we can build a dictionary mapping every 8-element combination of the set [0, n) to its minimal sum, by brute-forcing the sums of all pairs of 4-element combinations. Then we use these mappings to repeat the process for 12-element combinations built from the 8-element and 4-element results, and so on. We repeat this process until we obtain the minimal sums for all (4*m)-element combinations, and finally iterate over those to find the overall minimal sum.

from itertools import combinations

def dp_solve(const_dict, n, m):
    # lookup maps each combination key back to the 4-tuples it was built from.
    lookup = {comb: (comb,) for comb in const_dict.keys()}

    keys = range(n)  # ascending order keeps every sub-combination sorted
    for size in range(8, 4 * m + 1, 4):
        for key_total in combinations(keys, size):
            # Pair each 4-subset with its complement: combinations() emits in
            # lexicographic order, so the complements appear in reverse order.
            key1, key2 = min(
                zip(combinations(key_total, 4),
                    reversed(list(combinations(key_total, size - 4)))),
                key=lambda x: const_dict[x[0]] + const_dict[x[1]])

            k = tuple(sorted(key1 + key2))
            const_dict[k] = const_dict[key1] + const_dict[key2]
            lookup[k] = lookup[key1] + lookup[key2]

    key, val = min(((key, val) for key, val in const_dict.items()
                    if len(key) == 4 * m), key=lambda x: x[1])
    return lookup[key], val

Admittedly this implementation is pretty gnarly, because I kept micro-optimizing piece after piece hoping to make it fast enough without having to switch to a greedy approach.
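
A minimal usage sketch, assuming const_dict maps every sorted 4-tuple drawn from range(n) to its cost (the toy cost below is just the tuple's sum; the real values come from the question's data):

from itertools import combinations

n, m = 8, 2
const_dict = {comb: sum(comb) for comb in combinations(range(n), 4)}

best_combo, best_sum = dp_solve(dict(const_dict), n, m)  # pass a copy: dp_solve mutates it
print(best_combo, best_sum)  # two disjoint 4-tuples covering 0..7; total is 28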

Greedy

This is probably the one you care about, since it handles fairly large inputs quickly, and is quite accurate.

Start by constructing a list of partial sums, and begin iterating over the elements of your dictionary in increasing order of value. For each element, find all existing partial sums whose keys don't collide with its keys, "combine" them into new partial sums, and append those to the list. In doing so, you build a list of minimal partial sums that can be created from the smallest k values in your dictionary. To speed this up, I use hash sets to quickly check which partial sums already contain a given key.

In the "fast" greedy approach, you would abort the moment you find a partial sum with a key length of 4 * m (or equivalently, made of m 4-tuples). This usually nets fairly good results in my experience, but I wanted to add some logic to make it more accurate if need be. To do so, I add two factors:

  • extra_runs, which dictates how many extra iterations to search for better solutions before breaking
  • check_factor, which specifies a multiple of the current search "depth" to scan forward for a single new integer that creates a better solution with the current state. Unlike the above, this does not "preserve" each new integer checked; it only does a quick sum to see if it creates a new minimum. This makes it significantly faster, at the cost that the other m - 1 4-tuples must already exist in one of the partial sums.

Combined, these checks seem to always find the true minimal sum, at the cost of about 5x longer runtime (still quite fast though). To disable them, just pass 0 for both factors.

def greedy_solve(const_dict, n, m, extra_runs=10, check_factor=2):
    # Visit candidate 4-tuples cheapest-first.
    pairs = sorted(const_dict.items(), key=lambda x: x[1])

    # lookup[x] holds the indices of all partial sums that use integer x.
    lookup = [set() for _ in range(n)]
    nset = set()

    min_sums = []  # (tuple of 4-tuples, running total) partial sums
    min_key, min_val = None, None
    for i, (pkey, pval) in enumerate(pairs):
        # Partial sums sharing no integer with pkey can be extended by it.
        valid = set(nset)
        for x in pkey:
            valid -= lookup[x]
            lookup[x].add(len(min_sums))

        # pkey by itself becomes a new partial sum.
        nset.add(len(min_sums))
        min_sums.append(((pkey,), pval))

        # The extended partial sums created below all contain pkey's integers.
        for x in pkey:
            lookup[x].update(range(len(min_sums), len(min_sums) + len(valid)))
        for idx in valid:
            comb, val = min_sums[idx]
            for key in comb:
                for x in key:
                    lookup[x].add(len(min_sums))
            nset.add(len(min_sums))
            min_sums.append((comb + (pkey,), val + pval))
            if len(comb) == m - 1 and (not min_key or min_val > val + pval):
                min_key, min_val = min_sums[-1]

        # Once a full solution exists, run only a few more refinement passes.
        if min_key:
            if not extra_runs:
                break
            extra_runs -= 1

    # Cheap forward scan: try one extra 4-tuple on top of existing partial
    # sums without storing the combinations it produces.
    for pkey, pval in pairs[:int(check_factor * i)]:
        valid = set(nset)
        for x in pkey:
            valid -= lookup[x]

        for idx in valid:
            comb, val = min_sums[idx]
            if len(comb) < m - 1:
                nset.remove(idx)
            elif min_val > val + pval:
                min_key, min_val = comb + (pkey,), val + pval

    return min_key, min_val

I tested this for n < 36 and m < 9, and it seemed to run fairly fast (couple of seconds to complete at worst). I'd imagine it should work for your case 12 <= n <= 24 pretty quickly.
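
The same toy setup works here (again, const_dict is hypothetical; any mapping from sorted 4-tuples to numeric costs will do):

from itertools import combinations

n, m = 12, 3
const_dict = {comb: sum(comb) for comb in combinations(range(n), 4)}

combo, total = greedy_solve(const_dict, n, m)
print(combo, total)  # three disjoint 4-tuples covering 0..11; total is 66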

Summing dictionary of lists values by key

Iterate over the keys and sum each of the sublists in a list comprehension:

d = {0: [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]], 1: [[4.0, 5.0, 6.0], [10.0, 11.0, 12.0]]}

for k in d:
    d[k] = [sum(l) for l in d[k]]

>>> d
{0: [6.0, 24.0], 1: [15.0, 33.0]}

Also works on a defaultdict.

A new dictionary can be obtained without modifying the original by using a dictionary comprehension:

new_d = {k:[sum(l) for l in d[k]] for k in d}

Simultaneously sum two keys in list of dicts by multiple items

You can use collections.defaultdict for an O(n) solution. As opposed to itertools.groupby, this does not require sorting beforehand.

The idea is to group by the pre-defined group_keys (x below is the question's list of dicts). Then use a list comprehension to combine the keys and values of your defaultdict. The {**d1, **d2} syntax merges two dictionaries.

from collections import defaultdict
from operator import itemgetter

d = defaultdict(lambda: defaultdict(int))

group_keys = ['id', 'var1', 'var2']
sum_keys = ['var3', 'var4']

for item in x:
    for key in sum_keys:
        d[itemgetter(*group_keys)(item)][key] += item[key]

res = [{**dict(zip(group_keys, k)), **v} for k, v in d.items()]

print(res)

[{'id': 1, 'var1': 'a', 'var2': 'left', 'var3': 0.1, 'var4': 1},
{'id': 2, 'var1': 'a', 'var2': 'right', 'var3': 0.3, 'var4': 4},
{'id': 4, 'var1': 'b', 'var2': 'left', 'var3': 0.4, 'var4': 4},
{'id': 5, 'var1': 'b', 'var2': 'right', 'var3': 0.5, 'var4': 7}]
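
As a quick self-contained check, here is a hypothetical two-row input (not the question's x) whose rows share the same group key and therefore collapse into one summed record:

x = [
    {'id': 1, 'var1': 'a', 'var2': 'left', 'var3': 0.25, 'var4': 1},
    {'id': 1, 'var1': 'a', 'var2': 'left', 'var3': 0.25, 'var4': 2},
]
# Running the snippet above on this input prints:
# [{'id': 1, 'var1': 'a', 'var2': 'left', 'var3': 0.5, 'var4': 3}]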

Sum values of dictionary if any character within key strings match

You can use a union-find data structure to solve this problem. The networkx package provides an implementation, but there's nothing stopping you from writing your own.

In essence, we maintain a collection of disjoint sets. Initially, every string belongs to its own disjoint set. For each pair of strings, if they have letters in common, we union the disjoint sets they belong to. This eventually gives us the groups that we're looking for.

From here, we use the .to_sets() method to get the groupings, and compute the desired sum:

from networkx.utils.union_find import UnionFind

data = ...  # dictionary in question, omitted for brevity
keys = list(data.keys())

uf = UnionFind(data.keys())

for outer_idx in range(len(keys)):
    for inner_idx in range(outer_idx + 1, len(keys)):
        if set(keys[outer_idx]) & set(keys[inner_idx]):
            uf.union(keys[outer_idx], keys[inner_idx])

result = {}
for idx, group in enumerate(uf.to_sets()):
    result[idx] = sum(data[key] for key in group)

print(result)

This outputs:

{0: 12, 1: 13}
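
For instance, a hypothetical input consistent with that output would be the following: 'abc' and 'cde' share the letter 'c' (5 + 7 = 12), while 'xyz' shares no letters with either (13):

data = {'abc': 5, 'cde': 7, 'xyz': 13}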

Python: Get the sum of values of a key in a list of dictionaries

Use Counter

>>> from collections import Counter
>>> c = Counter()
>>> for d in l:
...     c.update(d)
...
>>> dict(c)
{'a': 30, 'b': 15}
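
Here l is the question's list of dicts; a hypothetical input consistent with the output above would be:

l = [{'a': 10, 'b': 5}, {'a': 20, 'b': 10}]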

Merge and sum of two dictionaries

You didn't say how exactly you want to merge, so take your pick:

x = {'both1': 1, 'both2': 2, 'only_x': 100}
y = {'both1': 10, 'both2': 20, 'only_y': 200}

print({k: x.get(k, 0) + y.get(k, 0) for k in set(x)})           # keys of x
print({k: x.get(k, 0) + y.get(k, 0) for k in set(x) & set(y)})  # intersection
print({k: x.get(k, 0) + y.get(k, 0) for k in set(x) | set(y)})  # union

Results:

{'both2': 22, 'only_x': 100, 'both1': 11}
{'both2': 22, 'both1': 11}
{'only_y': 200, 'both2': 22, 'both1': 11, 'only_x': 100}
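
For the union case with positive numeric values, collections.Counter gives the same result more tersely. Note it is not a drop-in replacement in general: Counter addition drops keys whose total is zero or negative.

from collections import Counter

print(dict(Counter(x) + Counter(y)))
# {'both1': 11, 'both2': 22, 'only_x': 100, 'only_y': 200}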

Most efficient way to join two dictionaries and group them by different key and sum up values

You can use itertools.groupby + functools.reduce + collections.Counter + operator.add:

  1. Import the necessary libraries:
from functools import reduce
from collections import Counter
import operator as op
import itertools as it

  2. We are going to change the structure of the results list, keying each record by its _id:
results = [{r['_id']: {'countryCode': value} for value in r.values()} for r in results]

  3. Then, we update the changes dictionary with the results list:
for index, key in enumerate(changes.keys()):
    changes[key].update(results[index][key])

  4. Finally, we use itertools.groupby to group the data by the countryCode key. aggregations is a list of Counter objects, for example: [Counter({'ADDED': 3, 'MODIFIED': 3, 'REMOVED': 1, 'countryCode': 'DE'}), Counter(...)]. We then use reduce to sum the Counter objects in that list (a self-contained sketch of this step follows the output below).
output = dict()
for g, grp in it.groupby(changes.values(), lambda d: d['countryCode']):
    aggregations = [Counter(i) for i in grp]
    for agg in aggregations:
        del agg['countryCode']
    aggregations = reduce(op.add, aggregations)
    # groupby only merges consecutive items, so fold repeated groups together.
    output[g] = aggregations if g not in output else reduce(op.add, [output[g], aggregations])

# Without this step you'd get {'CN': Counter({...}), ...} rather than plain dicts.
output = {key: dict(value) for key, value in output.items()}

Output:

{'CN': {'ADDED': 3, 'MODIFIED': 3, 'REMOVED': 55},
'DE': {'ADDED': 44, 'MODIFIED': 52, 'REMOVED': 11},
'SG': {'ADDED': 37, 'MODIFIED': 27, 'REMOVED': 58}}
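
Because itertools.groupby only merges consecutive items with the same key, sorting first (or the g-not-in-output fallback used above) is what keeps each country in a single group. Here is a minimal, self-contained sketch of step 4 with hypothetical records (not the question's data):

import itertools as it
import operator as op
from collections import Counter
from functools import reduce

records = [
    {'countryCode': 'DE', 'ADDED': 2, 'REMOVED': 1},
    {'countryCode': 'SG', 'ADDED': 3, 'REMOVED': 2},
    {'countryCode': 'DE', 'ADDED': 1, 'REMOVED': 0},
]

output = {}
for g, grp in it.groupby(sorted(records, key=lambda d: d['countryCode']),
                         key=lambda d: d['countryCode']):
    counters = [Counter(d) for d in grp]
    for c in counters:
        del c['countryCode']  # keep only the numeric fields
    output[g] = dict(reduce(op.add, counters))

print(output)
# {'DE': {'ADDED': 3, 'REMOVED': 1}, 'SG': {'ADDED': 3, 'REMOVED': 2}}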

