Functions that help to understand json(dict) structure
Here is a family of recursive generators that can be used to search through an object composed of dicts and lists. find_key yields a tuple containing a list of the dictionary keys and list indices that lead to the key that you pass in; the tuple also contains the value associated with that key. Because it's a generator, it will find every matching key if the object contains more than one.
def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])
# test
data = {
    '1_data': {
        '4_data': [
            {'5_data': 'hooray'},
            {'3_data': 'hooray2'}
        ],
        '2_data': []
    }
}

for t in find_key(data, '3_data'):
    print(t)
output
(['1_data', '4_data', 1, '3_data'], 'hooray2')
To get a single key list you can pass find_key to the next function. And if you want to use a key list to fetch the associated value you can use a simple for loop.
seq, val = next(find_key(data, '3_data'))
print('seq:', seq, 'val:', val)

obj = data
for k in seq:
    obj = obj[k]
print('obj:', obj, obj == val)
output
seq: ['1_data', '4_data', 1, '3_data'] val: hooray2
obj: hooray2 True
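The for loop above can also be wrapped in a small reusable helper. The following is a minimal sketch, assuming a hypothetical helper name get_by_path (it is not part of the original code); it uses functools.reduce with operator.getitem to apply each key or index in turn.

```python
from functools import reduce
from operator import getitem


def get_by_path(obj, path):
    """Follow a list of dict keys and list indices down into obj.

    get_by_path is a hypothetical helper name used for illustration.
    """
    # reduce applies getitem(current, step) for each step in path,
    # starting from obj, which is what the explicit for loop does.
    return reduce(getitem, path, obj)


data = {
    '1_data': {
        '4_data': [
            {'5_data': 'hooray'},
            {'3_data': 'hooray2'}
        ],
        '2_data': []
    }
}

path = ['1_data', '4_data', 1, '3_data']
print(get_by_path(data, path))  # hooray2
```

Like the loop, this raises KeyError or IndexError if any step of the path is missing.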
If the key may be missing, then give next an appropriate default tuple. E.g.:
seq, val = next(find_key(data, '6_data'), ([], None))
print('seq:', seq, 'val:', val)
if seq:
    obj = data
    for k in seq:
        obj = obj[k]
    print('obj:', obj, obj == val)
output
seq: [] val: None
Note that this code is for Python 3. To run it on Python 2 you need to replace all the yield from statements, e.g. replace
    yield from iter_dict(obj, key, [])
with
    for u in iter_dict(obj, key, []):
        yield u
How it works
To understand how this code works you need to be familiar with recursion and with Python generators. You may also find this page helpful: Understanding Generators in Python; there are also various Python generators tutorials available online.
The Python object returned by json.load or json.loads is generally a dict, but it can also be a list. We pass that object to the find_key generator as the obj arg, along with the key string that we want to locate. find_key then calls either iter_dict or iter_list, as appropriate, passing them the object, the key, and an empty list indices, which is used to collect the dict keys and list indices that lead to the key we want.
iter_dict iterates over each (k, v) pair at the top level of its d dict arg. If k matches the key we're looking for then the current indices list is yielded with k appended to it, along with the associated value. Because iter_dict is recursive the yielded (indices list, value) pairs get passed up to the previous level of recursion, eventually making their way up to find_key and then to the code that called find_key. Note that this is the "base case" of our recursion: it's the part of the code that determines whether this recursion path leads to the key we want. If a recursion path never finds a key matching the key we're looking for then that recursion path won't add anything to indices and it will terminate without yielding anything.
If the current v is a dict, then we need to examine all the (key, value) pairs it contains. We do that by making a recursive call to iter_dict, passing that v as its starting object along with the current indices list extended by k. If the current v is a list we instead call iter_list, passing it the same args.
iter_list works similarly to iter_dict except that a list doesn't have any keys, it only contains values, so we don't perform the k == key test, we just recurse into any dicts or lists that the original list contains.
The end result of this process is that when we iterate over find_key we get pairs of (indices, value) where each indices list is the sequence of dict keys and list indices that successfully terminate in a dict item with our desired key, and value is the value associated with that particular key.
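To see the "finds every match" behaviour described above, here is a minimal sketch that folds the same dict and list recursion into a single generator. The name find_paths and the sample data are assumptions made for illustration; this is a compact variant of the idea, not the find_key code itself.

```python
def find_paths(obj, key, path=()):
    """Yield (path, value) for every dict entry named key, at any depth.

    find_paths is a hypothetical name; it combines the dict and list
    cases into one recursive generator.
    """
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        children = enumerate(obj)
    else:
        return  # scalars contain nothing to search
    for k, v in children:
        # Only dict keys can match; list indices are just path steps.
        if isinstance(obj, dict) and k == key:
            yield list(path) + [k], v
        yield from find_paths(v, key, path + (k,))


# Sample data with the same key at three different depths.
data = {
    'name': 'root',
    'items': [
        {'name': 'first'},
        {'child': {'name': 'second'}},
    ],
}

matches = list(find_paths(data, 'name'))
for path, value in matches:
    print(path, value)
# ['name'] root
# ['items', 0, 'name'] first
# ['items', 1, 'child', 'name'] second
```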
If you'd like to see some other examples of this code in use please see how to modify the key of a nested Json and How can I select deeply nested key:values from dictionary in python.
Also take a look at my new, more streamlined show_indices function.
How can I select deeply nested key:values from dictionary in python
I suggest you use python-benedict, a solid Python dict subclass with full keypath support and many utility methods.
It provides IO support for many formats, including json.
You can initialize it directly from the json file:
from benedict import benedict
d = benedict.from_json('data.json')
Now your dict has keypath support:
print(d['payload.metadata.coverImage.id'])
# or use get to avoid a possible KeyError
print(d.get('payload.metadata.coverImage.id'))
Installation: pip install python-benedict
Here is the library repository and the documentation:
https://github.com/fabiocaccamo/python-benedict
Note: I am the author of this project
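If you would rather not add a dependency, the keypath idea can be approximated in plain Python. This is a minimal sketch under stated assumptions: get_keypath is a hypothetical helper, the sample dict is invented for illustration, and it only walks nested dicts split on a separator. It is not how python-benedict is implemented.

```python
def get_keypath(d, keypath, default=None, sep='.'):
    """Look up a dotted keypath in nested dicts, returning default if absent.

    get_keypath is a hypothetical helper; it does not handle list indices
    or keys that themselves contain the separator.
    """
    current = d
    for part in keypath.split(sep):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# Invented sample data shaped like the keypath used above.
d = {'payload': {'metadata': {'coverImage': {'id': 'abc123'}}}}

print(get_keypath(d, 'payload.metadata.coverImage.id'))  # abc123
print(get_keypath(d, 'payload.metadata.missing.id'))     # None
```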
Search nested json / dict for multiple key values matching specified keys
The example function below searches a dict (including all nested dicts) for key / value pairs matching a list of keys you would like to find. This function recursively loops through the dict and any nested dicts and lists it contains to build a list of all possible dicts to be checked for matching keys.
def find_key_value_pairs(q, keys, dicts=None):
    if not dicts:
        dicts = [q]
        q = [q]

    data = q.pop(0)
    if isinstance(data, dict):
        data = data.values()

    for d in data:
        dtype = type(d)
        if dtype is dict or dtype is list:
            q.append(d)
            if dtype is dict:
                dicts.append(d)

    if q:
        return find_key_value_pairs(q, keys, dicts)

    return [(k, v) for d in dicts for k, v in d.items() if k in keys]
The example below uses json.loads to convert an example dataset similar to your json to a dict before passing it to the function.
import json
json_data = """
{"results_count": 2, "results": [{"utc_start_at": "2018-09-29T16:45:00+0000", "counts": {"customer_count": "14", "other_count": "41"}, "capacity": {"non-resource": {"non_resource_bookable_capacity": "18", "other_non_resource_capacity": "1"}, "resource_capacity": "10"}}, {"utc_start_at": "2018-10-29T15:15:00+0000", "counts": {"customer_count": "7", "other_count": "41"}, "capacity": {"non-resource": {"non_resource_bookable_capacity": "25", "other_non_resource_capacity": "1"}, "resource_capacity": "10"}}]}
"""
data = json.loads(json_data) # json_data is a placeholder for your json
keys = ['results_count', 'customer_count', 'utc_start_at', 'non_resource_bookable_capacity']
results = find_key_value_pairs(data, keys)
for k, v in results:
    print(f'{k}: {v}')
# results_count: 2
# utc_start_at: 2018-09-29T16:45:00+0000
# utc_start_at: 2018-10-29T15:15:00+0000
# customer_count: 14
# customer_count: 7
# non_resource_bookable_capacity: 18
# non_resource_bookable_capacity: 25
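The same breadth-first walk can also be written as a loop, which avoids one recursive call per queued item. The following is a minimal sketch, assuming a hypothetical function name find_key_value_pairs_iter and invented sample data; unlike the function above it uses isinstance, so dict and list subclasses are searched too, and it also accepts a list as the top-level object.

```python
from collections import deque


def find_key_value_pairs_iter(root, keys):
    """Breadth-first search of nested dicts/lists for the given keys.

    Hypothetical iterative variant; returns (key, value) pairs in the
    order the containing dicts are visited.
    """
    keys = set(keys)          # set membership tests are O(1)
    queue = deque([root])
    found = []
    while queue:
        node = queue.popleft()
        if isinstance(node, dict):
            found.extend((k, v) for k, v in node.items() if k in keys)
            children = node.values()
        else:                 # node is a list
            children = node
        # Only containers can hold further matches.
        queue.extend(c for c in children if isinstance(c, (dict, list)))
    return found


sample = {'count': 2, 'results': [{'id': 1, 'meta': {'id': 'a'}}, {'id': 2}]}
pairs = find_key_value_pairs_iter(sample, ['count', 'id'])
print(pairs)  # [('count', 2), ('id', 1), ('id', 2), ('id', 'a')]
```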
Storing Python dictionaries
Pickle save:
try:
    import cPickle as pickle
except ImportError:  # Python 3.x
    import pickle

with open('data.p', 'wb') as fp:
    pickle.dump(data, fp, protocol=pickle.HIGHEST_PROTOCOL)
See the pickle module documentation for additional information regarding the protocol argument.
Pickle load:
with open('data.p', 'rb') as fp:
    data = pickle.load(fp)
JSON save:
import json
with open('data.json', 'w') as fp:
    json.dump(data, fp)
Supply extra arguments, like sort_keys or indent, to get a pretty result. sort_keys=True sorts the keys alphabetically, and indent=N indents each nesting level of your data structure by N spaces.
json.dump(data, fp, sort_keys=True, indent=4)
JSON load:
with open('data.json', 'r') as fp:
    data = json.load(fp)
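One practical difference between the two formats is what survives a round trip. The sketch below uses an invented sample dict and in-memory dumps/loads instead of files: JSON object keys are always strings and JSON has no tuple type, while pickle preserves the original Python types.

```python
import json
import pickle

# Invented sample: an int key and a tuple value.
data = {1: 'one', 'point': (2, 3)}

# JSON round trip: the int key becomes a str, the tuple becomes a list.
json_copy = json.loads(json.dumps(data))

# Pickle round trip: types are preserved exactly.
pickle_copy = pickle.loads(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))

print(json_copy)    # {'1': 'one', 'point': [2, 3]}
print(pickle_copy)  # {1: 'one', 'point': (2, 3)}
```

Keep in mind that unpickling data from an untrusted source can execute arbitrary code, so JSON is the safer choice for data that crosses a trust boundary.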
Encoding python dictionary into JSON using a schema
Create the structure you want using nested list comprehensions inside a dict literal before calling json.dump:
output = {"devices": [
            {"device": {"deviceType": k,
                        "deviceBrands": [{"deviceBrand": {"deviceBrandName": k1,
                                                          "deviceBrandCount": v1}
                                         } for k1, v1 in v.items()
                                        ]
                       }
            }
            for k, v in d.items()]}

with open("output.json", "w") as f:
    json.dump(output, f)
output.json:
{
  "devices": [
    {
      "device": {
        "deviceType": "Laptop",
        "deviceBrands": [
          {
            "deviceBrand": {
              "deviceBrandName": "sony",
              "deviceBrandCount": 1
            }
          },
          {
            "deviceBrand": {
              "deviceBrandName": "apple",
              "deviceBrandCount": 2
            }
          },
          {
            "deviceBrand": {
              "deviceBrandName": "asus",
              "deviceBrandCount": 5
            }
          }
        ]
      }
    },
    {
      "device": {
        "deviceType": "Camera",
        "deviceBrands": [
          {
            "deviceBrand": {
              "deviceBrandName": "sony",
              "deviceBrandCount": 2
            }
          },
          {
            "deviceBrand": {
              "deviceBrandName": "sumsung",
              "deviceBrandCount": 1
            }
          },
          {
            "deviceBrand": {
              "deviceBrandName": "nikon",
              "deviceBrandCount": 4
            }
          }
        ]
      }
    }
  ]
}
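The snippet above does not show the input dict d. The following self-contained sketch assumes d maps each device type to a dict of brand counts; that shape is inferred from the output shown and is an assumption, as are the more descriptive loop variable names.

```python
import json

# Assumed input shape: {device_type: {brand_name: count}}.
d = {
    "Laptop": {"sony": 1, "apple": 2, "asus": 5},
    "Camera": {"sony": 2, "sumsung": 1, "nikon": 4},
}

# The outer comprehension builds one entry per device type,
# the inner one builds one entry per brand of that device type.
output = {"devices": [
    {"device": {"deviceType": device_type,
                "deviceBrands": [
                    {"deviceBrand": {"deviceBrandName": brand,
                                     "deviceBrandCount": count}}
                    for brand, count in brands.items()
                ]}}
    for device_type, brands in d.items()
]}

# indent is what produces the pretty-printed layout.
print(json.dumps(output, indent=2))
```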
Parsing JSON nested Dictionary using Python
To understand how your json is set up, it's easier to break it down. Let's look at the first dictionary's keys, and remove the values.
json = {"items": [], "links": {}}
You have a dictionary with two keys and two values. All three of the variables you are looking for (id, self, name) are in the first key, "items". So let's dive deeper.
json["items"] = [{'links': {'self': 'https://www.google.com'}, 'name': 'beast', 'type': 'Device', 'id': '12345'}]
Now you have a list containing a dictionary with the values you are looking for, so let's index into the first and only element of the list to get that dictionary.
json["items"][0] = {'links': {'self': 'https://www.google.com'}, 'id': '12345', 'type': 'Device', 'name': 'beast'}
Finally we have the dictionary with the values you are looking for, so you can use this code to find name and id.
json["items"][0]["name"] = 'beast'
json["items"][0]["id"] = '12345'
The self variable is hidden one dictionary deeper so we have to delve into the links key.
json["items"][0]["links"]["self"] = 'https://www.google.com'
Now you have all of your values, you just need to follow through all the lists and dictionaries to get the value you want.
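Put together as runnable code, the walk-through looks roughly like this. It is a minimal sketch: the variable is named payload (an assumption) rather than json so that it does not shadow the json module, and the dict is assembled from the fragments shown above.

```python
# Assembled from the fragments above; "payload" is an illustrative name.
payload = {
    "items": [
        {
            "links": {"self": "https://www.google.com"},
            "name": "beast",
            "type": "Device",
            "id": "12345",
        }
    ],
    "links": {},
}

item = payload["items"][0]         # first and only element of the list
name = item["name"]
item_id = item["id"]
self_link = item["links"]["self"]  # one dictionary deeper

print(name, item_id, self_link)  # beast 12345 https://www.google.com
```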
Passing dictionary as parameter to a function
Here is a one-liner to map the data to a schema. If you can change the schema, you could also just grab the keys instead of creating a list of items to match. This formats the data to the schema based on matching keys:
EDIT: added 'Data' tag to the schema and output for nested list data
schema = {
    'Global_parameters': [
        'clock_frequency',  # I noticed you had this as just 'clock' in your desired output
        'Triggering_Mode'
    ],
    'Executor_param': [
        'Mode'
    ],
    'Waveform_Settings': [
        'overshoot',
        'duty_cycle',
        'amplitude/high_level',
        'offset/low_level'
    ],
    'Data': {
        'Packet'
    }
}

data = {
    "clock_frequency": 25000,
    "Triggering_Mode": "positive_edge_triggered",
    "Mode": "Offline",
    "overshoot": 0.05,
    "duty_cycle": 0.5,
    "amplitude/high_level": 1,
    "offset/low_level": 0,
    "Packet": [
        {"time_index": 0.1, "data": 0x110},
        {"time_index": 1.21, "data": 123},
        {"time_index": 2.0, "data": 0x45}
    ]
}
# "one line" nested dict comprehension
data_structured = {k0: {k1: v1 for k1, v1 in data.items() if k1 in v0}  # in v0.keys() if you are using the structure you have above
                   for k0, v0 in schema.items()}

import json
print(json.dumps(data_structured, indent=4)) # pretty print in json format
Output:
{
    "Global_parameters": {
        "clock_frequency": 25000,
        "Triggering_Mode": "positive_edge_triggered"
    },
    "Executor_param": {
        "Mode": "Offline"
    },
    "Waveform_Settings": {
        "overshoot": 0.05,
        "duty_cycle": 0.5,
        "amplitude/high_level": 1,
        "offset/low_level": 0
    },
    "Data": {
        "Packet": [
            {
                "time_index": 0.1,
                "data": 272
            },
            {
                "time_index": 1.21,
                "data": 123
            },
            {
                "time_index": 2.0,
                "data": 69
            }
        ]
    }
}
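Because the comprehension silently skips any key that no schema group lists, it can be worth checking for mismatches in both directions. This is a minimal sketch with a shortened, partly invented schema and data; the names claimed, unassigned and missing are assumptions made for illustration.

```python
# Shortened schema and data for illustration.
schema = {
    'Global_parameters': ['clock_frequency', 'Triggering_Mode'],
    'Executor_param': ['Mode'],
}
data = {
    'clock_frequency': 25000,
    'Triggering_Mode': 'positive_edge_triggered',
    'Mode': 'Offline',
    'overshoot': 0.05,  # deliberately not listed in the schema above
}

# Every key that some schema group asks for.
claimed = {key for group in schema.values() for key in group}

# Keys present in data that no group would pick up.
unassigned = sorted(k for k in data if k not in claimed)
# Keys the schema expects that data does not provide.
missing = sorted(k for k in claimed if k not in data)

print(unassigned)  # ['overshoot']
print(missing)     # []
```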