Find Similar List Value Inside Dictionary

Iterate through a list of dictionaries and identify similar values in Python

orders = [{
    'name': 'User_ORDERS1234',
    'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}],
    'users': ['User_2']
},{
    'name': 'User_ORDERS1235',
    'expressions': [{'exp': '"table"."ORDERS"."STATUS" = \'Shipped\''}],
    'users': ['User_1']
},{
    'name': 'User_ORDERS1236',
    'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}],
    'users': ['User_3']
}]

for i, order in enumerate(orders):  # loop through the orders
    exp1 = order['expressions']  # list of expressions of this order

    for next_order in orders[i+1:]:  # loop through the orders that follow
        exp2 = next_order['expressions']  # list of expressions of a later order

        if exp1 == exp2:  # if the expressions are the same:
            order['users'] += next_order['users']  # move the later order's users into this order
            next_order['users'] = []  # and empty the later order's users

orders = [o for o in orders if o['users']]  # keep only the orders that still have users

print(orders)

Output

[{
    'name': 'User_ORDERS1234',
    'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}],
    'users': ['User_2', 'User_3']
},{
    'name': 'User_ORDERS1235',
    'expressions': [{'exp': '"table"."ORDERS"."STATUS" = \'Shipped\''}],
    'users': ['User_1']
}]
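The nested loop above compares every pair of orders. As an alternative, here is a minimal single-pass sketch that groups orders in a dict. The helper name merge_orders and the shortened sample data are made up for illustration, and the sketch assumes every expression dict has an 'exp' key whose string fully identifies it.

```python
# Sketch: merge orders that share the same expressions in one pass.
# merge_orders is a hypothetical helper, not part of the original answer.
def merge_orders(orders):
    merged = {}  # key built from the expressions -> first order seen with them
    for order in orders:
        # A list of dicts is unhashable, so use a tuple of the 'exp' strings as key.
        key = tuple(e['exp'] for e in order['expressions'])
        if key in merged:
            merged[key]['users'].extend(order['users'])
        else:
            # Shallow-copy the order and copy its users list so the input stays untouched.
            merged[key] = {**order, 'users': list(order['users'])}
    return list(merged.values())

# Shortened, made-up sample data with the same shape as `orders`.
sample = [
    {'name': 'A', 'expressions': [{'exp': 'x = 1'}], 'users': ['User_2']},
    {'name': 'B', 'expressions': [{'exp': 'x = 2'}], 'users': ['User_1']},
    {'name': 'C', 'expressions': [{'exp': 'x = 1'}], 'users': ['User_3']},
]
result = merge_orders(sample)
print(result)
```

Unlike the loop version, this sketch returns a new list and does not modify the input orders.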

Find duplicate values in list of dictionaries

Some solutions and a benchmark.

Solutions

Fun with a dict, going forwards to get the order and backwards to get the first value.

lst_out = list({d['Second']: d
                for s in [1, -1]
                for d in lst_in[::s]}.values())
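To see why the two passes work, the comprehension can be unrolled on a tiny input. This is only an illustrative trace with made-up data; it relies on dicts preserving insertion order (Python 3.7+) and on the fact that overwriting a key's value does not change the key's position.

```python
# Made-up three-element input; 'a' appears twice.
lst_in = [
    {'First': 0, 'Second': 'a'},
    {'First': 1, 'Second': 'b'},
    {'First': 2, 'Second': 'a'},
]

# Forward pass: fixes the key order (a, b); each key ends up holding its LAST dict.
forward = {d['Second']: d for d in lst_in}

# Backward pass into the same keys: positions stay put, values are overwritten,
# so each key ends up holding its FIRST dict.
both = dict(forward)
for d in lst_in[::-1]:
    both[d['Second']] = d

print([d['First'] for d in forward.values()])  # [2, 1]
print([d['First'] for d in both.values()])     # [0, 1]
```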

Or using setdefault to keep track of each value's first dict:

tmp = {}
for d in lst_in:
    tmp.setdefault(d['Second'], d)
lst_out = list(tmp.values())

Fun and potentially faster version:

add = {}.setdefault
for d in lst_in:
    add(d['Second'], d)
lst_out = list(add.__self__.values())
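The last version relies on two details: a bound method keeps a reference to its object in `__self__`, and `setdefault` only stores a value when the key is missing. A small sketch with made-up names (`seen`, `add`) shows both:

```python
seen = {}
add = seen.setdefault          # bound method: remembers the dict it belongs to
print(add.__self__ is seen)    # True

add('x', 1)   # 'x' is missing, so 1 is stored
add('x', 2)   # 'x' exists, so the stored value stays 1
print(seen)   # {'x': 1}
```

Binding the method once avoids the attribute lookup on every iteration, which is presumably where the speedup in the benchmark below comes from.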

Benchmark

Times for a list of 1000 dicts with 100 different Second values (using Python 3.10.0):

 361 μs   362 μs   364 μs  dict_forward_backward
 295 μs   297 μs   297 μs  dict_setdefault
 231 μs   231 μs   232 μs  dict_setdefault_optimized
 196 μs   196 μs   197 μs  set_in_list_comprehension
 190 μs   190 μs   190 μs  set_in_list_comprehension_optimized
 191 μs   191 μs   191 μs  set_in_list_comprehension_optimized_2
 201 μs   201 μs   201 μs  set_with_loop
1747 μs  1751 μs  1774 μs  with_lists

Benchmark code:

from timeit import repeat
from random import choices

# 1000 dicts whose 'Second' value is drawn from 100 possible values
lst_in = [{'First': i, 'Second': v}
          for i, v in enumerate(choices(range(100), k=1000))]

def dict_forward_backward(lst_in):
    return list({d['Second']: d
                 for s in [1, -1]
                 for d in lst_in[::s]}.values())

def dict_setdefault(lst_in):
    tmp = {}
    for d in lst_in:
        tmp.setdefault(d['Second'], d)
    return list(tmp.values())

def dict_setdefault_optimized(lst_in):
    add = {}.setdefault
    for d in lst_in:
        add(d['Second'], d)
    return list(add.__self__.values())

def set_in_list_comprehension(lst_in):
    # s.add(v) returns None, so `s.add(v) or d` records v and yields d
    return [s.add(v) or d
            for s in [set()]
            for d in lst_in
            for v in [d['Second']]
            if v not in s]

def set_in_list_comprehension_optimized(lst_in):
    return [add(v) or d
            for s in [set()]
            for add in [s.add]
            for d in lst_in
            for v in [d['Second']]
            if v not in s]

def set_in_list_comprehension_optimized_2(lst_in):
    s = set()
    add = s.add
    return [add(v) or d
            for d in lst_in
            for v in [d['Second']]
            if v not in s]

def set_with_loop(lst_in):
    found = set()
    lst_out = []
    for dct in lst_in:
        if dct['Second'] not in found:
            lst_out.append(dct)
            found.add(dct['Second'])
    return lst_out

def with_lists(lst_in):
    # membership tests on a list are O(n), which makes this the slowest variant
    out = {'keep': [], 'counter': []}
    for dct in lst_in:
        if dct['Second'] not in out['counter']:
            out['keep'].append(dct)
            out['counter'].append(dct['Second'])
    return out['keep']

funcs = [
dict_forward_backward,
dict_setdefault,
dict_setdefault_optimized,
set_in_list_comprehension,
set_in_list_comprehension_optimized,
set_in_list_comprehension_optimized_2,
set_with_loop,
with_lists,
]

# Correctness
expect = funcs[0](lst_in)
for func in funcs[1:]:
    result = func(lst_in)
    print(result == expect, func.__name__)
print()

# Speed
for _ in range(3):
    for func in funcs:
        ts = sorted(repeat(lambda: func(lst_in), 'gc.enable(); gc.collect()', number=1000))[:3]
        print(*('%4d μs ' % (t * 1e3) for t in ts), func.__name__)
    print()

Filter a dictionary of lists

I solved it with this:

from typing import Dict, List, Any, Set

d = {"level":[1,2,3], "conf":[-1,1,2], "text":["-1", "hel", "llo"]}

# First, we create a set that stores the indices which should be kept.
# I chose a set instead of a list because it has an O(1) lookup time.
# We only want to keep the items at indices where the value in d["conf"] is greater than 0.
filtered_indexes = {i for i, value in enumerate(d.get('conf', [])) if value > 0}

def filter_dictionary(d: Dict[str, List[Any]], filtered_indexes: Set[int]) -> Dict[str, List[Any]]:
    filtered_dictionary = d.copy()  # We'll return a modified copy of the original dictionary
    for key, list_values in d.items():
        # The actual filtering for each key/value pair takes place in the next line.
        # In the copy, each original list is replaced with its filtered version.
        filtered_dictionary[key] = [value for i, value in enumerate(list_values) if i in filtered_indexes]
    return filtered_dictionary

print(filter_dictionary(d, filtered_indexes))

Output:

{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
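As a possible alternative, the same filtering can be sketched with a boolean mask and `itertools.compress` from the standard library. The names `mask` and `filtered` are made up here, and the sketch assumes all lists have the same length as d["conf"].

```python
from itertools import compress

d = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["-1", "hel", "llo"]}

# One boolean per position: True where the "conf" value is greater than 0.
mask = [value > 0 for value in d["conf"]]

# compress() keeps only the items whose mask entry is truthy.
filtered = {key: list(compress(values, mask)) for key, values in d.items()}
print(filtered)  # {'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
```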

How to get values from a list of dictionaries, which themselves contain lists of dictionaries in Python

First: dicts require a key-value association for every element. Your second-level data structure, however, does not include keys: {[{'tag': 'tag 1'}]}. This is a set literal. Unlike dicts, sets do not have keys associated with their elements, so your data structure would be List[Set[List[Dict[str, str]]]].

Second: when I try to run

# python 3.8.8
player_info = [{[{'tag': 'tag 1'}]},
{[{'tag': 'tag 2'}]}]

I receive the error TypeError: unhashable type: 'list'. That's because your code attempts to put a list inside a set. Set membership in Python requires the members to be hashable, and list objects do not provide a usable __hash__() method. Even if you resolve this by replacing the list with a tuple, you will find that dict objects are not hashable either. Potential solutions include using immutable objects like frozendict or tuple, but that is another post.
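The hashability rules described above can be checked directly. The following is a small sketch with made-up names (`errors`, `ok`); it only records that a TypeError occurs, because the exact message wording can differ between Python versions.

```python
errors = []
attempts = [
    ("list in set", lambda: {[{'tag': 'tag 1'}]}),             # list is unhashable
    ("tuple of dicts in set", lambda: {({'tag': 'tag 1'},)}),  # the dict inside is unhashable
]
for label, build in attempts:
    try:
        build()
    except TypeError as err:
        errors.append(label)
        print(label, "->", err)

# A tuple of (key, value) tuples contains only hashable objects, so it works.
ok = {(('tag', 'tag 1'),)}
print(ok)
```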

To answer your question, I have reformulated your problem as

player_info = [[[{'tag': 'tag 1'}]],
[[{'tag': 'tag 2'}]]]

and compared the performance difference with A) explicit loops:

for i in range(len(player_info)):
    print(player_info[i][0][0]['tag'])

against B) a list comprehension:

[
    print(single_player_info[0][0]['tag'])
    for single_player_info in player_info
]

Running the above code blocks in Jupyter with the %%timeit cell magic, I got:
A) 154 µs ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) and
B) 120 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Note: This experiment is highly skewed for at least two reasons:

  1. I tested both variants using only the data you provided (N=2). Larger inputs would very likely scale differently than these numbers suggest.
  2. print consumes a lot of time, so the measurements depend heavily on the state of the kernel.
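Both caveats can be reduced by collecting the tags instead of printing them and by using a larger input. The following is a rough sketch, not the measurement reported above: the 10,000-element data set and the function names tags_loop and tags_comprehension are assumptions made for illustration, and the timings will vary by machine.

```python
from timeit import timeit

# Hypothetical larger data set with the same shape as the reformulated player_info.
player_info = [[[{'tag': f'tag {i}'}]] for i in range(10_000)]

def tags_loop(data):
    # A) explicit index loop, collecting values instead of printing them
    out = []
    for i in range(len(data)):
        out.append(data[i][0][0]['tag'])
    return out

def tags_comprehension(data):
    # B) list comprehension over the same data
    return [item[0][0]['tag'] for item in data]

# Both variants must produce the same result before timing them.
assert tags_loop(player_info) == tags_comprehension(player_info)

for func in (tags_loop, tags_comprehension):
    seconds = timeit(lambda: func(player_info), number=100)
    print(f'{func.__name__}: {seconds / 100 * 1e6:.0f} µs per call')
```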

I hope this answers your question.


