Convenient way to handle deeply nested dictionary in Python
nested_child_info = master_dictionary['sub_categories'][sub_cat_name]['attributes'][attribute_name]['special_type']['nested_children'][child_cat_name]
nested_child_info[color] = blue
nested_child_info
is a reference, so changing its contents is going to change the contents of master_dictionary
.
How to key into deeply nested dictionary and list structure
We started with:
d = {'d1':[1,2,{'d2':['this is tricky',{'tough':[1,2,['me']]}]}]}
Let's see how we can navigate through the data, one step at a time:
>>> d = {'d1':[1,2,{'d2':['this is tricky',{'tough':[1,2,['me']]}]}]}
>>> d['d1']
[1, 2, {'d2': ['this is tricky', {'tough': [1, 2, ['me']]}]}]
>>> d['d1'][2]
{'d2': ['this is tricky', {'tough': [1, 2, ['me']]}]}
>>> d['d1'][2]['d2']
['this is tricky', {'tough': [1, 2, ['me']]}]
>>> d['d1'][2]['d2'][1]
{'tough': [1, 2, ['me']]}
>>> d['d1'][2]['d2'][1]['tough']
[1, 2, ['me']]
>>> d['d1'][2]['d2'][1]['tough'][2]
['me']
>>> d['d1'][2]['d2'][1]['tough'][2][0]
'me'
At the end, we have the desired code for the original problem: d['d1'][2]['d2'][1]['tough'][2][0]
.
Flat is better than nested: how to deal with deeply nested dictionaries?
There exist several approaches to the problem:
- Flatten the structure. If you presume that processing of the flat table will work faster, it might make sense to flatten your structure into the plain list of class or struct instances, and then process the plain list.
- If the structure is uniform across levels (even if the names are different on each level), you can use simple recursion. You would have the function which will check if the item on certain level has children, and call itself with each of those children if they are present, or call the data processing function for the final level entries. If subcomponent names differ, you can have an array of such names that would say "on level 1 the items are called 'comp*', on level 2 - 'subcomp*' and so on.
- If levels require completely different handling, introduce separate functions for each level.
Python: Easily access deeply nested dict (get and set)
Attribute Tree
The problem with your first specification is that Python can't tell in __getitem__
if, at my_obj.a.b.c.d
, you will next proceed farther down a nonexistent tree, in which case it needs to return an object with a __getitem__
method so you won't get an AttributeError
thrown at you, or if you want a value, in which case it needs to return None
.
I would argue that in every case you have above, you should expect it to throw a KeyError
instead of returning None
. The reason being that you can't tell if None
means "no key" or "someone actually stored None
at that location". For this behavior, all you have to do is take dotdictify
, remove marker
, and replace __getitem__
with:
def __getitem__(self, key):
return self[key]
Because what you really want is a dict
with __getattr__
and __setattr__
.
There may be a way to remove __getitem__
entirely and say something like __getattr__ = dict.__getitem__
, but I think this may be over-optimization, and will be a problem if you later decide you want __getitem__
to create the tree as it goes like dotdictify
originally does, in which case you would change it to:
def __getitem__(self, key):
if key not in self:
dict.__setitem__(self, key, dotdictify())
return dict.__getitem__(self, key)
I don't like the marker
business in the original dotdictify
.
Path Support
The second specification (override get()
and set()
) is that a normal dict
has a get()
that operates differently from what you describe and doesn't even have a set
(though it has a setdefault()
which is an inverse operation to get()
). People expect get
to take two parameters, the second being a default if the key isn't found.
If you want to extend __getitem__
and __setitem__
to handle dotted-key notation, you'll need to modify doctictify
to:
class dotdictify(dict):
def __init__(self, value=None):
if value is None:
pass
elif isinstance(value, dict):
for key in value:
self.__setitem__(key, value[key])
else:
raise TypeError, 'expected dict'
def __setitem__(self, key, value):
if '.' in key:
myKey, restOfKey = key.split('.', 1)
target = self.setdefault(myKey, dotdictify())
if not isinstance(target, dotdictify):
raise KeyError, 'cannot set "%s" in "%s" (%s)' % (restOfKey, myKey, repr(target))
target[restOfKey] = value
else:
if isinstance(value, dict) and not isinstance(value, dotdictify):
value = dotdictify(value)
dict.__setitem__(self, key, value)
def __getitem__(self, key):
if '.' not in key:
return dict.__getitem__(self, key)
myKey, restOfKey = key.split('.', 1)
target = dict.__getitem__(self, myKey)
if not isinstance(target, dotdictify):
raise KeyError, 'cannot get "%s" in "%s" (%s)' % (restOfKey, myKey, repr(target))
return target[restOfKey]
def __contains__(self, key):
if '.' not in key:
return dict.__contains__(self, key)
myKey, restOfKey = key.split('.', 1)
target = dict.__getitem__(self, myKey)
if not isinstance(target, dotdictify):
return False
return restOfKey in target
def setdefault(self, key, default):
if key not in self:
self[key] = default
return self[key]
__setattr__ = __setitem__
__getattr__ = __getitem__
Test code:
>>> life = dotdictify({'bigBang': {'stars': {'planets': {}}}})
>>> life.bigBang.stars.planets
{}
>>> life.bigBang.stars.planets.earth = { 'singleCellLife' : {} }
>>> life.bigBang.stars.planets
{'earth': {'singleCellLife': {}}}
>>> life['bigBang.stars.planets.mars.landers.vikings'] = 2
>>> life.bigBang.stars.planets.mars.landers.vikings
2
>>> 'landers.vikings' in life.bigBang.stars.planets.mars
True
>>> life.get('bigBang.stars.planets.mars.landers.spirit', True)
True
>>> life.setdefault('bigBang.stars.planets.mars.landers.opportunity', True)
True
>>> 'landers.opportunity' in life.bigBang.stars.planets.mars
True
>>> life.bigBang.stars.planets.mars
{'landers': {'opportunity': True, 'vikings': 2}}
How to clean up deeply nested dictionary access?
I would start with unit test to ensure that after each refactoring iteration my code still actually works.
I would use meaningful data structures and methods, so the code is more self-describing. Sometimes you can find namedtuple very useful for that if you don't want to roll out a separate data-holder class.
Finally I would decompose this big and ugly if...for...else
block into meaningful smaller chunks, something like this:
# instead of this original code...
for pname, stats_by_year in parsed_stats.items():
if pname in self.raw_players_with_stats[tour]:
#...
elif pname in self.new_player_urls[tour]:
self.new_stats[tour][pname] = stats_by_year
# you get something like this
for player_name, stats_by_year in parser_stats.iteritems():
if self.has_raw_player(player_name):
self.process_new_raw_player(player_name, stats_by_year)
elif self.is_player_new(player_name):
self.insert_new_stat_for_player( player_name, stats_by_year )
which is easier to read, test and understand
And, if you have some free time, I would invest it in reading Clean Code by Robert Martin. It will pay off surely!
Edit
Clean up lengthy and difficult to read one-liners like this
#...
self.new_stats[tour].setdefault(pname,{}).setdefault(y,{}).setdefault(cat,{})[prop] = val
#...
So it looks, for example, like this:
def insert_new_stat(self, tour, pname, y, cat, prop, val):
player_stat = self.new_stats[tour].setdefault(pname, {})
y_param = player_stat.setdefault(y, {}) # what is y??
category_stats = ...
prop_stats = ...
... = val
Your code certainly will be more lengthy and verbose, although Explicit is better than implicit
Parse nested dictionary conveniently
You can build a simple approach, using the built-in library, as below:
from functools import reduce, partial
from operator import getitem
nested_get = partial(reduce, getitem)
x = {'a': {'b': {'c': {'d': 1}}}}
item = nested_get(["a", "b", "c", "d"], x)
print(item)
Output
1
UPDATE
To emulate the behavior of dict.get
, use:
def nested_get(path, d, default=None):
current = d
for key in path:
try:
current = current[key]
except KeyError:
return default
return current
x = {'a': {'b': {'c': {'d': 1}}}}
item = nested_get(["a", "b", "c", "d"], x)
print(item)
item = nested_get(["a", "b", "e", "d"], x)
print(item)
Output
1
None
python: what are efficient techniques to deal with deeply nested data in a flexible manner?
"I stored it in a deeply nested dictionary"
And, as you've seen, it doesn't work out well.
What's the alternative?
Composite keys and a shallow dictionary. You have an 8-part key:
( individual, imaging session, Region imaged, timestamp of file, properties of file, regions of interest in image, format of data, channel of acquisition ) which maps
to an array of values.{ ('AS091209M02', '100113', 'R16', '1263399103', 'Responses', 'N01', 'Sequential', 'Ch1' ): array,
...The issue with this is search.
Proper class structures. Actually, a full-up Class definition may be overkill.
"The type of operations I perform is for instance to compute properties of the arrays
(listed under Ch1, Ch2), pick up arrays to make a new collection, for instance analyze
responses of N01 from region 16 (R16) of a given individual at different time points, etc."
Recommendation
First, use a namedtuple
for your ultimate object.
Array = namedtuple( 'Array', 'individual, session, region, timestamp, properties, roi, format, channel, data' )
Or something like that. Build a simple list of these named tuple objects. You can then simply iterate over them.
Second, use many simple map-reduce operations on this master list of the array objects.
Filtering:
for a in theMasterArrrayList:
if a.region = 'R16' and interest = 'N01':
# do something on these items only.
Reducing by Common Key:
individual_dict = defaultdict(list)
for a in theMasterArrayList:
individual_dict[ a.individual ].append( a )
This will create a subset in the map that has exactly the items you want.
You can then do indiidual_dict['AS091209M02'] and have all of their data. You can do this for any (or all) of the available keys.
region_dict = defaultdict(list)
for a in theMasterArrayList:
region_dict[ a.region ].append( a )
This does not copy any data. It's fast and relatively compact in memory.
Mapping (or transforming) the array:
for a in theMasterArrayList:
someTransformationFunction( a.data )
If the array is itself a list, you're can update that list without breaking the tuple as a whole. If you need to create a new array from an existing array, you're creating a new tuple. There's nothing wrong with this, but it is a new tuple. You wind up with programs like this.
def region_filter( array_list, region_set ):
for a in array_list:
if a.region in region_set:
yield a
def array_map( array_list, someConstant ):
for a in array_list:
yield Array( *(a[:8] + (someTranformation( a.data, someConstant ),) )
def some_result( array_list, region, someConstant ):
for a in array_map( region_filter( array_list, region ), someConstant ):
yield a
You can build up transformations, reductions, mappings into more elaborate things.
The most important thing is creating only the dictionaries you need from the master list so you don't do any more filtering than is minimally necessary.
BTW. This can be mapped to a relational database trivially. It will be slower, but you can have multiple concurrent update operations. Except for multiple concurrent updates, a relational database doesn't offer any features above this.
Find a key inside a deeply nested dictionary
@Håvard's recursive solution is probably going to be OK... unless the level of nesting is too high, and then you get a RuntimeError: maximum recursion depth exceeded
. To remedy that, you can use the usual technique for recursion removal: keep your own stack of items to examine (as a list that's under your control). I.e.:
def find_key_nonrecursive(adict, key):
stack = [adict]
while stack:
d = stack.pop()
if key in d:
return d[key]
for k, v in d.iteritems():
if isinstance(v, dict):
stack.append(v)
The logic here is quite close to the recursive answer (except for checking for dict
in the right way;-), with the obvious exception that the recursive calls are replaced with a while
loop and .pop
and .append
operations on the explicit-stack list, stack
.
Related Topics
How to Track the Number of Times a Function Is Called
Robot Framework Using Python, Key Press Without Selecting Any Button or Element in the Page
Replacing Pandas or Numpy Nan With a None to Use With Mysqldb
How to Write a Lambda Function That Is Conditional on Two Variables (Columns) in Python
How to Install Tesseract for Python on Anaconda
How to Split a CSV File Row to Columns in Python
How to Skip Blank Line While Reading CSV File Using Python
Converting Json into Newline Delimited Json in Python
Python3: Remove a Substring Between Two Delimiting Char
Replace Empty Strings With None/Null Values in Dataframe
Python - How to Sort a List of Alpha and Numeric Values
Json.Loads() Decodes Only With Raw String Literal
Image.Open() Cannot Identify Image File - Python
How to Save a Numpy Array as a 16-Bit Image Using "Normal" (Enthought) Python
Permission System for Discord.Py Bot
Find Similar List Value Inside Dictionary
How to Run a Function Multiple Times and Return Different Result Python
Creating a List of Random Numbers Without Duplicates in Python