Functions That Help to Understand JSON(Dict) Structure

Here is a family of recursive generators that can be used to search through an object composed of dicts and lists. find_key yields tuples, each containing a list of the dictionary keys and list indices that lead to the key that you pass in, plus the value associated with that key. Because it's a generator, it finds every match when the object contains the key more than once.

def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

# test

data = {
    '1_data': {
        '4_data': [
            {'5_data': 'hooray'},
            {'3_data': 'hooray2'}
        ],
        '2_data': []
    }
}

for t in find_key(data, '3_data'):
    print(t)

output

(['1_data', '4_data', 1, '3_data'], 'hooray2')

To get a single key list you can pass the generator returned by find_key to the built-in next function. And if you want to use a key list to fetch the associated value you can use a simple for loop.

seq, val = next(find_key(data, '3_data'))
print('seq:', seq, 'val:', val)

obj = data
for k in seq:
    obj = obj[k]
print('obj:', obj, obj == val)

output

seq: ['1_data', '4_data', 1, '3_data'] val: hooray2
obj: hooray2 True
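
The for loop above can also be wrapped in a small helper. The sketch below is one possible way to do it, using functools.reduce with operator.getitem; the names get_by_path and set_by_path are made up for this illustration and are not part of the original answer.

```python
from functools import reduce
from operator import getitem

def get_by_path(obj, path):
    """Follow a sequence of dict keys / list indices and return the value found."""
    return reduce(getitem, path, obj)

def set_by_path(obj, path, value):
    """Replace the value at `path` in place; `path` must not be empty."""
    *parents, last = path
    get_by_path(obj, parents)[last] = value

data = {'1_data': {'4_data': [{'5_data': 'hooray'}, {'3_data': 'hooray2'}],
                   '2_data': []}}
seq = ['1_data', '4_data', 1, '3_data']

print(get_by_path(data, seq))       # hooray2
set_by_path(data, seq, 'changed')   # modifies data in place
print(get_by_path(data, seq))       # changed
```

An empty path simply returns the object itself, and a wrong key or index raises the usual KeyError or IndexError.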

If the key may be missing, give next an appropriate default tuple. For example:

seq, val = next(find_key(data, '6_data'), ([], None))
print('seq:', seq, 'val:', val)
if seq:
    obj = data
    for k in seq:
        obj = obj[k]
    print('obj:', obj, obj == val)

output

seq: [] val: None

Note that this code is for Python 3. To run it on Python 2 you need to replace all the yield from statements, eg replace

yield from iter_dict(obj, key, [])

with

for u in iter_dict(obj, key, []):
    yield u

How it works

To understand how this code works you need to be familiar with recursion and with Python generators. You may also find this page helpful: Understanding Generators in Python; there are also various Python generators tutorials available online.

The Python object returned by json.load or json.loads is generally a dict, but it can also be a list. We pass that object to the find_key generator as the obj arg, along with the key string that we want to locate. find_key then calls either iter_dict or iter_list, as appropriate, passing them the object, the key, and an empty list indices, which is used to collect the dict keys and list indices that lead to the key we want.

iter_dict iterates over each (k, v) pair at the top level of its d dict arg. If k matches the key we're looking for then the current indices list is yielded with k appended to it, along with the associated value. Because iter_dict is recursive the yielded (indices list, value) pairs get passed up to the previous level of recursion, eventually making their way up to find_key and then to the code that called find_key. Note that this is the "base case" of our recursion: it's the part of the code that determines whether this recursion path leads to the key we want. If a recursion path never finds a key matching the key we're looking for then that recursion path won't add anything to indices and it will terminate without yielding anything.

If the current v is a dict, then we need to examine all the (key, value) pairs it contains. We do that by making a recursive call to iter_dict, passing that v as its starting object, along with the current indices list extended with k. If the current v is a list we instead call iter_list, passing it the same args.

iter_list works similarly to iter_dict except that a list doesn't have any keys, it only contains values, so we don't perform the k == key test, we just recurse into any dicts or lists that the original list contains.

The end result of this process is that when we iterate over find_key we get pairs of (indices, value) where each indices list is the sequence of dict keys and list indices that successfully terminates in a dict item with our desired key, and value is the value associated with that particular key.
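
To see the same recursion in one place, here is a condensed sketch that folds the dict and list cases into a single generator. The name find_paths and the sample data are invented for this illustration; the behaviour is intended to match the description above, but it is not the original code.

```python
def find_paths(obj, key, path=()):
    """Yield (indices, value) for every dict item whose key equals `key`."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                # Base case: this path ends at the key we want.
                yield list(path) + [k], v
            # Keep searching inside the value (no-op for scalars).
            yield from find_paths(v, key, path + (k,))
    elif isinstance(obj, list):
        # Lists have no keys to test, so just recurse into each element.
        for i, v in enumerate(obj):
            yield from find_paths(v, key, path + (i,))

sample = {"a": {"id": 1, "items": [{"id": 2}, {"x": {"id": 3}}]}, "id": 0}
matches = list(find_paths(sample, "id"))
for indices, value in matches:
    print(indices, value)
```

Every occurrence of the key is reported, in the order the structure is traversed, each with the full list of keys and indices leading to it.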

If you'd like to see some other examples of this code in use please see how to modify the key of a nested Json and How can I select deeply nested key:values from dictionary in python.

Also take a look at my new, more streamlined show_indices function.

How can I select deeply nested key:values from dictionary in python

I suggest using python-benedict, a solid Python dict subclass with full keypath support and many utility methods.

It provides IO support with many formats, including json.

You can initialize it directly from the json file:

from benedict import benedict

d = benedict.from_json('data.json')

Now your dict has keypath support:

print(d['payload.metadata.coverImage.id'])

# or use get to avoid a possible KeyError
print(d.get('payload.metadata.coverImage.id'))
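
If you only need dotted-path lookups and would rather not add a dependency, the idea can be approximated in plain Python. This is a rough sketch, not how python-benedict is implemented; the function name get_keypath and the sample document are assumptions made for this example.

```python
def get_keypath(d, keypath, default=None, sep='.'):
    """Look up a dotted keypath in nested dicts; return `default` if any step is missing."""
    current = d
    for part in keypath.split(sep):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return default
    return current

# Made-up sample shaped like the keypath used above.
doc = {'payload': {'metadata': {'coverImage': {'id': 'abc123'}}}}

print(get_keypath(doc, 'payload.metadata.coverImage.id'))     # abc123
print(get_keypath(doc, 'payload.metadata.missing.id', 'n/a')) # n/a
```

Note that this sketch only walks dicts and cannot address keys that themselves contain the separator.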

Installation: pip install python-benedict

Here is the library repository and the documentation:
https://github.com/fabiocaccamo/python-benedict

Note: I am the author of this project

Search nested json / dict for multiple key values matching specified keys

The example function below searches a dict (including all nested dicts) for key / value pairs matching a list of keys you would like to find. This function recursively loops through the dict and any nested dicts and lists it contains to build a list of all possible dicts to be checked for matching keys.

def find_key_value_pairs(q, keys, dicts=None):
    if dicts is None:
        # First call: wrap the root object in a work queue and start the
        # list of dicts to check (the root only counts if it is a dict).
        dicts = [q] if isinstance(q, dict) else []
        q = [q]

    data = q.pop(0)
    if isinstance(data, dict):
        data = data.values()

    for d in data:
        dtype = type(d)
        if dtype is dict or dtype is list:
            q.append(d)
            if dtype is dict:
                dicts.append(d)

    if q:
        return find_key_value_pairs(q, keys, dicts)

    return [(k, v) for d in dicts for k, v in d.items() if k in keys]

The example below uses json.loads to convert an example dataset similar to your json to a dict before passing it to the function.

import json

json_data = """
{"results_count": 2, "results": [{"utc_start_at": "2018-09-29T16:45:00+0000", "counts": {"customer_count": "14", "other_count": "41"}, "capacity": {"non-resource": {"non_resource_bookable_capacity": "18", "other_non_resource_capacity": "1"}, "resource_capacity": "10"}}, {"utc_start_at": "2018-10-29T15:15:00+0000", "counts": {"customer_count": "7", "other_count": "41"}, "capacity": {"non-resource": {"non_resource_bookable_capacity": "25", "other_non_resource_capacity": "1"}, "resource_capacity": "10"}}]}
"""
data = json.loads(json_data) # json_data is a placeholder for your json
keys = ['results_count', 'customer_count', 'utc_start_at', 'non_resource_bookable_capacity']
results = find_key_value_pairs(data, keys)
for k, v in results:
    print(f'{k}: {v}')
# results_count: 2
# utc_start_at: 2018-09-29T16:45:00+0000
# utc_start_at: 2018-10-29T15:15:00+0000
# customer_count: 14
# customer_count: 7
# non_resource_bookable_capacity: 18
# non_resource_bookable_capacity: 25
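
Because the function above recurses once per nested container, a very large document could run into Python's recursion limit. The following is a sketch of an equivalent loop-based search using collections.deque; the name find_key_value_pairs_iter and the sample data are made up for this example, and it uses isinstance rather than exact type checks, so dict and list subclasses are also searched.

```python
from collections import deque

def find_key_value_pairs_iter(root, keys):
    """Breadth-first search for (key, value) pairs whose key is in `keys`."""
    wanted = set(keys)
    found = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if isinstance(node, dict):
            found.extend((k, v) for k, v in node.items() if k in wanted)
            children = node.values()
        elif isinstance(node, list):
            children = node
        else:
            continue  # scalars have nothing to search
        # Only containers can hold further matches.
        queue.extend(c for c in children if isinstance(c, (dict, list)))
    return found

sample = {"n": 1, "rows": [{"id": "a", "meta": {"n": 2}}, {"id": "b"}]}
pairs = find_key_value_pairs_iter(sample, ["n", "id"])
print(pairs)  # [('n', 1), ('id', 'a'), ('id', 'b'), ('n', 2)]
```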

Storing Python dictionaries

Pickle save:

try:
    import cPickle as pickle
except ImportError:  # Python 3.x
    import pickle

with open('data.p', 'wb') as fp:
    pickle.dump(data, fp, protocol=pickle.HIGHEST_PROTOCOL)

See the pickle module documentation for additional information regarding the protocol argument.

Pickle load:

with open('data.p', 'rb') as fp:
    data = pickle.load(fp)

JSON save:

import json

with open('data.json', 'w') as fp:
    json.dump(data, fp)

Supply extra arguments, like sort_keys or indent, to get a pretty result. The argument sort_keys sorts the keys alphabetically, and indent=N indents each nesting level by N spaces.

json.dump(data, fp, sort_keys=True, indent=4)

JSON load:

with open('data.json', 'r') as fp:
    data = json.load(fp)
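
The two formats do not round-trip a dictionary in the same way. The short sketch below (using in-memory dumps/loads and a made-up sample dict) shows that JSON turns non-string keys into strings and tuples into lists, while pickle preserves the original Python types.

```python
import json
import pickle

original = {1: 'one', 'point': (2, 3)}

via_json = json.loads(json.dumps(original))
via_pickle = pickle.loads(pickle.dumps(original, protocol=pickle.HIGHEST_PROTOCOL))

print(via_json)    # {'1': 'one', 'point': [2, 3]}
print(via_pickle)  # {1: 'one', 'point': (2, 3)}
```

JSON is the safer choice for data exchanged with other programs; pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.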

Encoding python dictionary into JSON using a schema

Create the structure you want using nested list comprehensions inside a dict literal before calling json.dump:

import json

# d is the input dict (not shown here), assumed to map each device type
# to a dict of {brand name: count}.
output = {"devices": [
    {"device": {"deviceType": k,
                "deviceBrands": [{"deviceBrand": {"deviceBrandName": k1,
                                                  "deviceBrandCount": v1}
                                  } for k1, v1 in v.items()
                                 ]
                }
     }
    for k, v in d.items()]}

with open("output.json", "w") as f:
    json.dump(output, f)

output.json:

{
    "devices": [
        {
            "device": {
                "deviceType": "Laptop",
                "deviceBrands": [
                    {
                        "deviceBrand": {
                            "deviceBrandName": "sony",
                            "deviceBrandCount": 1
                        }
                    },
                    {
                        "deviceBrand": {
                            "deviceBrandName": "apple",
                            "deviceBrandCount": 2
                        }
                    },
                    {
                        "deviceBrand": {
                            "deviceBrandName": "asus",
                            "deviceBrandCount": 5
                        }
                    }
                ]
            }
        },
        {
            "device": {
                "deviceType": "Camera",
                "deviceBrands": [
                    {
                        "deviceBrand": {
                            "deviceBrandName": "sony",
                            "deviceBrandCount": 2
                        }
                    },
                    {
                        "deviceBrand": {
                            "deviceBrandName": "sumsung",
                            "deviceBrandCount": 1
                        }
                    },
                    {
                        "deviceBrand": {
                            "deviceBrandName": "nikon",
                            "deviceBrandCount": 4
                        }
                    }
                ]
            }
        }
    ]
}
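
The input dict d is not shown in this answer. Assuming it maps each device type to a dict of brand counts (a shape reconstructed from output.json, so treat it as a guess), the same transformation can be written as a function with descriptive names. The function name to_device_schema is made up for this sketch.

```python
import json

# Assumed input, reconstructed from the output.json listing.
d = {
    "Laptop": {"sony": 1, "apple": 2, "asus": 5},
    "Camera": {"sony": 2, "sumsung": 1, "nikon": 4},
}

def to_device_schema(counts):
    """Wrap {device type: {brand: count}} in the nested devices schema."""
    return {"devices": [
        {"device": {
            "deviceType": device_type,
            "deviceBrands": [
                {"deviceBrand": {"deviceBrandName": brand,
                                 "deviceBrandCount": count}}
                for brand, count in brands.items()
            ],
        }}
        for device_type, brands in counts.items()
    ]}

output = to_device_schema(d)
print(json.dumps(output, indent=4))
```

Passing indent=4 to json.dumps (or json.dump) is what produces the indented layout; without it the JSON is written on a single line.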

Parsing JSON nested Dictionary using Python

To understand how your json is set up, it's easier to break it down. Let's look at the first dictionary's keys, and remove the values.

json = {"items": [], "links": {}}

You have a dictionary with two keys and two values. All three of the variables you are looking for (id, self, name) are in the first key, "items". So let's dive deeper.

json["items"] = [{'links': {'self': 'https://www.google.com'}, 'name': 'beast', 'type': 'Device', 'id': '12345'}]

Now you have a list containing a dictionary with the values you are looking for, so let's enter the first and only value of the list containing the next dictionary.

json["items"][0] = {'links': {'self': 'https://www.google.com'}, 'id': '12345', 'type': 'Device', 'name': 'beast'}

Finally we have the dictionary with the values we are looking for, so you can use this code to find name and id.

json["items"][0]["name"] = 'beast'

json["items"][0]["id"] = '12345'

The self variable is hidden one dictionary deeper so we have to delve into the links key.

json["items"][0]["links"]["self"] = 'https://www.google.com'

Now you have all of your values, you just need to follow through all the lists and dictionaries to get the value you want.
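
Put together, the walk-through can be run end to end as in the sketch below. The raw string stands in for your real JSON text, and the parsed result is stored in a variable named doc (a name chosen here) rather than json, so that it does not shadow the json module.

```python
import json

raw = '''
{"items": [{"links": {"self": "https://www.google.com"},
            "name": "beast", "type": "Device", "id": "12345"}],
 "links": {}}
'''

doc = json.loads(raw)

first = doc["items"][0]            # the first and only dict in the list
name = first["name"]
device_id = first["id"]
self_link = first["links"]["self"]  # one dictionary deeper

print(name, device_id, self_link)   # beast 12345 https://www.google.com
```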

Passing dictionary as parameter to a function

Here is a one-liner that maps the data onto a schema. If you can change the schema, you could also just grab its keys instead of creating a list of items to match. This formats the data to the schema based on matching keys:

EDIT: added 'Data' tag to the schema and output for nested list data

schema = {
    'Global_parameters': [
        'clock_frequency',  # I noticed you had this as just 'clock' in your desired output
        'Triggering_Mode'
    ],
    'Executor_param': [
        'Mode'
    ],
    'Waveform_Settings': [
        'overshoot',
        'duty_cycle',
        'amplitude/high_level',
        'offset/low_level'
    ],
    'Data': {
        'Packet'
    }
}

data = {
    "clock_frequency": 25000,
    "Triggering_Mode": "positive_edge_triggered",
    "Mode": "Offline",
    "overshoot": 0.05,
    "duty_cycle": 0.5,
    "amplitude/high_level": 1,
    "offset/low_level": 0,
    "Packet": [
        {"time_index": 0.1, "data": 0x110},
        {"time_index": 1.21, "data": 123},
        {"time_index": 2.0, "data": 0x45}
    ]
}

# "one line" nested dict comprehension
data_structured = {k0: {k1: v1 for k1, v1 in data.items() if k1 in v0}  # in v0.keys() if you are using the structure you have above
                   for k0, v0 in schema.items()}

import json
print(json.dumps(data_structured, indent=4)) # pretty print in json format

Output:

{
    "Global_parameters": {
        "clock_frequency": 25000,
        "Triggering_Mode": "positive_edge_triggered"
    },
    "Executor_param": {
        "Mode": "Offline"
    },
    "Waveform_Settings": {
        "overshoot": 0.05,
        "duty_cycle": 0.5,
        "amplitude/high_level": 1,
        "offset/low_level": 0
    },
    "Data": {
        "Packet": [
            {
                "time_index": 0.1,
                "data": 272
            },
            {
                "time_index": 1.21,
                "data": 123
            },
            {
                "time_index": 2.0,
                "data": 69
            }
        ]
    }
}
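
The comprehension silently drops any data key that no schema section lists. If that matters, one option is to invert the schema into a key-to-section lookup first, which also makes unmatched keys easy to report. This is only a sketch: the shortened schema and the extra serial_number key are invented for the example.

```python
schema = {
    'Global_parameters': ['clock_frequency', 'Triggering_Mode'],
    'Executor_param': ['Mode'],
}

data = {
    'clock_frequency': 25000,
    'Triggering_Mode': 'positive_edge_triggered',
    'Mode': 'Offline',
    'serial_number': 'hypothetical',  # not listed in any schema section
}

# Invert the schema: each key points at the section it belongs to.
key_to_section = {key: section
                  for section, keys in schema.items()
                  for key in keys}

structured = {}
unmatched = []
for k, v in data.items():
    section = key_to_section.get(k)
    if section is None:
        unmatched.append(k)
    else:
        structured.setdefault(section, {})[k] = v

print(structured)
print('unmatched keys:', unmatched)  # unmatched keys: ['serial_number']
```

Unlike the comprehension, this version omits sections that end up with no matching keys instead of emitting them as empty dicts.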

