Iterate through list of dictionary and identify similar values in dictionary in Python
orders = [{
'name': 'User_ORDERS1234',
'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}],
'users': ['User_2']
},{
'name': 'User_ORDERS1235',
'expressions': [{'exp': '"table"."ORDERS"."STATUS" = \'Shipped\''}],
'users': ['User_1']
},{
'name': 'User_ORDERS1236',
'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}],
'users': ['User_3']
}]
for i, order in enumerate(orders):                  # loop through the orders:
    exp1 = order['expressions']                     # 'expressions' value of the order
    for next_order in orders[i+1:]:                 # loop through the following orders:
        exp2 = next_order['expressions']            # 'expressions' value of a following order
        if exp1 == exp2:                            # if the 'expressions' values are the same:
            order['users'] += next_order['users']   # add its 'users' to the order's 'users'
            next_order['users'] = []                # remove the users from the following order
orders = [o for o in orders if o['users']]          # keep only the orders that still have 'users'
print(orders)
Output
[{
'name': 'User_ORDERS1234',
'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}],
'users': ['User_2', 'User_3']
},{
'name': 'User_ORDERS1235',
'expressions': [{'exp': '"table"."ORDERS"."STATUS" = \'Shipped\''}],
'users': ['User_1']
}]
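The nested loop above compares every pair of orders. As a possible alternative, here is a minimal sketch that groups orders in a single pass by using the serialized 'expressions' value as a dict key. The function name merge_orders_by_expression and the simplified expression strings are made up for illustration, and the sketch assumes that 'expressions' is JSON-serializable. Unlike the loop above, it leaves the input list unchanged.

```python
import json

def merge_orders_by_expression(orders):
    """Merge the 'users' of orders that share the same 'expressions' value."""
    merged = {}
    for order in orders:
        # Lists and dicts are unhashable, so serialize them to get a dict key.
        key = json.dumps(order['expressions'], sort_keys=True)
        if key not in merged:
            # Copy the order and its user list so the input is not mutated.
            merged[key] = {**order, 'users': list(order['users'])}
        else:
            merged[key]['users'].extend(order['users'])
    return list(merged.values())

# Simplified sample data (hypothetical expression strings).
orders = [
    {'name': 'User_ORDERS1234',
     'expressions': [{'exp': "STATUS IN ('Canceled','Pending')"}],
     'users': ['User_2']},
    {'name': 'User_ORDERS1235',
     'expressions': [{'exp': "STATUS = 'Shipped'"}],
     'users': ['User_1']},
    {'name': 'User_ORDERS1236',
     'expressions': [{'exp': "STATUS IN ('Canceled','Pending')"}],
     'users': ['User_3']},
]

merged = merge_orders_by_expression(orders)
for order in merged:
    print(order['name'], order['users'])
# User_ORDERS1234 ['User_2', 'User_3']
# User_ORDERS1235 ['User_1']
```

Dicts preserve insertion order, so each group keeps the name and position of its first order, just like the loop version.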
Find duplicate values in list of dictionaries
Some solutions and a benchmark.
Solutions
Fun with a dict: the forward pass fixes the order of the keys, and the backward pass overwrites each value so that every key ends up with its first dict.
lst_out = list({d['Second']: d
                for s in [1, -1]
                for d in lst_in[::s]}.values())
Or using setdefault to keep track of each value's first dict:
tmp = {}
for d in lst_in:
    tmp.setdefault(d['Second'], d)
lst_out = list(tmp.values())
Fun and potentially faster version:
add = {}.setdefault
for d in lst_in:
    add(d['Second'], d)
lst_out = list(add.__self__.values())
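To make the intended result concrete, here is a small worked example. It assumes a made-up three-element lst_in in which the value 'a' occurs twice under 'Second', and it runs both the forward/backward dict comprehension and the setdefault loop on it. In both cases only the first dict for each 'Second' value survives, in order of first appearance.

```python
# Hypothetical sample input: 'a' appears twice under 'Second'.
lst_in = [
    {'First': 0, 'Second': 'a'},
    {'First': 1, 'Second': 'b'},
    {'First': 2, 'Second': 'a'},
]

# Forward pass fixes the key order; backward pass leaves the first dict per key.
forward_backward = list({d['Second']: d
                         for s in [1, -1]
                         for d in lst_in[::s]}.values())

# setdefault only stores a dict when its key has not been seen before.
first_seen = {}
for d in lst_in:
    first_seen.setdefault(d['Second'], d)
lst_out = list(first_seen.values())

print(lst_out)
# [{'First': 0, 'Second': 'a'}, {'First': 1, 'Second': 'b'}]
print(forward_backward == lst_out)
# True
```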
Benchmark
Times for a list of 1000 dicts with 100 different Second values (using Python 3.10.0):
361 μs 362 μs 364 μs dict_forward_backward
295 μs 297 μs 297 μs dict_setdefault
231 μs 231 μs 232 μs dict_setdefault_optimized
196 μs 196 μs 197 μs set_in_list_comprehension
190 μs 190 μs 190 μs set_in_list_comprehension_optimized
191 μs 191 μs 191 μs set_in_list_comprehension_optimized_2
201 μs 201 μs 201 μs set_with_loop
1747 μs 1751 μs 1774 μs with_lists
Benchmark code:
from timeit import repeat, default_timer as timer
from random import choices
lst_in = [{'First': i, 'Second': v}
          for i, v in enumerate(choices(range(100), k=1000))]

def dict_forward_backward(lst_in):
    return list({d['Second']: d
                 for s in [1, -1]
                 for d in lst_in[::s]}.values())

def dict_setdefault(lst_in):
    tmp = {}
    for d in lst_in:
        tmp.setdefault(d['Second'], d)
    return list(tmp.values())

def dict_setdefault_optimized(lst_in):
    add = {}.setdefault
    for d in lst_in:
        add(d['Second'], d)
    return list(add.__self__.values())

def set_in_list_comprehension(lst_in):
    return [s.add(v) or d
            for s in [set()]
            for d in lst_in
            for v in [d['Second']]
            if v not in s]

def set_in_list_comprehension_optimized(lst_in):
    return [add(v) or d
            for s in [set()]
            for add in [s.add]
            for d in lst_in
            for v in [d['Second']]
            if v not in s]

def set_in_list_comprehension_optimized_2(lst_in):
    s = set()
    add = s.add
    return [add(v) or d
            for d in lst_in
            for v in [d['Second']]
            if v not in s]

def set_with_loop(lst_in):
    found = set()
    lst_out = []
    for dct in lst_in:
        if dct['Second'] not in found:
            lst_out.append(dct)
            found.add(dct['Second'])
    return lst_out

def with_lists(lst_in):
    out = {'keep': [], 'counter': []}
    for dct in lst_in:
        if dct['Second'] not in out['counter']:
            out['keep'].append(dct)
            out['counter'].append(dct['Second'])
    return out['keep']

funcs = [
    dict_forward_backward,
    dict_setdefault,
    dict_setdefault_optimized,
    set_in_list_comprehension,
    set_in_list_comprehension_optimized,
    set_in_list_comprehension_optimized_2,
    set_with_loop,
    with_lists,
]

# Correctness
expect = funcs[0](lst_in)
for func in funcs[1:]:
    result = func(lst_in)
    print(result == expect, func.__name__)
print()

# Speed
for _ in range(3):
    for func in funcs:
        ts = sorted(repeat(lambda: func(lst_in), 'gc.enable(); gc.collect()', number=1000))[:3]
        print(*('%4d μs ' % (t * 1e3) for t in ts), func.__name__)
    print()
Filter a dictionary of lists
I solved it with this:
from typing import Dict, List, Any, Set
d = {"level":[1,2,3], "conf":[-1,1,2], "text":["-1", "hel", "llo"]}
# First, we create a set that stores the indices which should be kept.
# I chose a set instead of a list because it has an O(1) lookup time.
# We only want to keep the items on indices where the value in d["conf"] is greater than 0
filtered_indexes = {i for i, value in enumerate(d.get('conf', [])) if value > 0}
def filter_dictionary(d: Dict[str, List[Any]], filtered_indexes: Set[int]) -> Dict[str, List[Any]]:
    filtered_dictionary = d.copy()  # We'll return a modified copy of the original dictionary
    for key, list_values in d.items():
        # In the next line the actual filtering for each key/value pair takes place.
        # The original lists get overwritten with the filtered lists.
        filtered_dictionary[key] = [value for i, value in enumerate(list_values) if i in filtered_indexes]
    return filtered_dictionary
print(filter_dictionary(d, filtered_indexes))
Output:
{'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
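For comparison, here is a minimal sketch of the same filtering with a boolean mask and itertools.compress from the standard library. This is an alternative technique, not the approach shown above. The names mask and filtered are chosen for illustration, and the sketch assumes that every list in d has the same length as d["conf"].

```python
from itertools import compress

d = {"level": [1, 2, 3], "conf": [-1, 1, 2], "text": ["-1", "hel", "llo"]}

# One boolean per position: True where the "conf" value is greater than 0.
mask = [value > 0 for value in d["conf"]]

# compress() keeps only the items whose mask entry is true.
filtered = {key: list(compress(values, mask)) for key, values in d.items()}

print(filtered)
# {'level': [2, 3], 'conf': [1, 2], 'text': ['hel', 'llo']}
```

The mask is computed once and reused for every key, so no index lookup is needed per element.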
How to get values from a list of dictionaries, which themselves contain lists of dictionaries in Python
First: dicts require a key-value association for every element in the dictionary. Your second-level data structure, however, does not include keys: {[{'tag': 'tag 1'}]}. This is a set. Unlike dicts, sets do not have keys associated with their elements, so your data structure looks like List[Set[List[Dict[str, str]]]].
Second: when I try to run
# python 3.8.8
player_info = [{[{'tag': 'tag 1'}]},
{[{'tag': 'tag 2'}]}]
I receive the error TypeError: unhashable type: 'list'. That's because your code attempts to put a list inside a set. Set membership in Python requires the members to be hashable, and list objects do not provide a usable __hash__() method. Even if you resolve this by replacing the list with a tuple, you will find that dict objects are not hashable either. Potential solutions include using immutable objects like frozendict or tuple, but that is another post.
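The hashability rules described above can be checked directly with the built-in hash(). The following is only an illustrative sketch: the helper is_hashable is a made-up name, and the frozenset of the dict's items is shown as just one possible hashable stand-in for a dict.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, i.e. obj could be a set member."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable([{'tag': 'tag 1'}]))                   # False: a list
print(is_hashable({'tag': 'tag 1'}))                     # False: a dict
print(is_hashable(({'tag': 'tag 1'},)))                  # False: a tuple holding a dict
print(is_hashable((('tag', 'tag 1'),)))                  # True: a tuple of tuples of strings
print(is_hashable(frozenset({'tag': 'tag 1'}.items())))  # True: a frozenset of (key, value) pairs
```

A tuple is only hashable if everything inside it is hashable, which is why swapping the list for a tuple alone does not fix the original code.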
To answer your question, I have reformulated your problem as
player_info = [[[{'tag': 'tag 1'}]],
[[{'tag': 'tag 2'}]]]
and compared the performance difference with A) explicit loops:
for i in range(len(player_info)):
    print(player_info[i][0][0]['tag'])
against B) list comprehension:
[
    print(single_player_info[0][0]['tag'])
    for single_player_info in player_info
]
Running the above code blocks in Jupyter with the %%timeit cell magic, I got:
A) 154 µs ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
and
B) 120 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Note: This experiment is highly skewed for at least two reasons:
- I tested both variants using only the data you provided (N=2). With larger inputs, the scaling behavior may well differ from what these initial timings suggest.
- print consumes a lot of time and makes the measurement heavily dependent on the state of the kernel.
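One way to reduce both sources of skew is to collect the tags into a list instead of printing them, and to time the two variants on a larger input with the standard timeit module. The sketch below is an assumption-laden illustration: N=1000, the generated tag strings, and the function names are all made up, and the timings will vary by machine, so no results are quoted.

```python
from timeit import timeit

# Hypothetical larger input with the same nesting as the reformulated data.
player_info = [[[{'tag': f'tag {i}'}]] for i in range(1000)]

def tags_with_index_loop(data):
    """Variant A: explicit index-based loop."""
    tags = []
    for i in range(len(data)):
        tags.append(data[i][0][0]['tag'])
    return tags

def tags_with_comprehension(data):
    """Variant B: list comprehension."""
    return [single_player_info[0][0]['tag'] for single_player_info in data]

# Both variants must agree before their timings are worth comparing.
assert tags_with_index_loop(player_info) == tags_with_comprehension(player_info)

for func in (tags_with_index_loop, tags_with_comprehension):
    seconds = timeit(lambda: func(player_info), number=100)
    print(f'{func.__name__}: {seconds * 1e3:.2f} ms for 100 calls')
```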
I hope this answers your question.