Convenient Way to Handle Deeply Nested Dictionary in Python

Convenient way to handle deeply nested dictionary in Python

nested_child_info = master_dictionary['sub_categories'][sub_cat_name]['attributes'][attribute_name]['special_type']['nested_children'][child_cat_name]
nested_child_info[color] = blue

nested_child_info is a reference, so changing its contents is going to change the contents of master_dictionary.

How to key into deeply nested dictionary and list structure

We started with:

d = {'d1':[1,2,{'d2':['this is tricky',{'tough':[1,2,['me']]}]}]}

Let's see how we can navigate through the data, one step at a time:

>>> d = {'d1':[1,2,{'d2':['this is tricky',{'tough':[1,2,['me']]}]}]}
>>> d['d1']
[1, 2, {'d2': ['this is tricky', {'tough': [1, 2, ['me']]}]}]
>>> d['d1'][2]
{'d2': ['this is tricky', {'tough': [1, 2, ['me']]}]}
>>> d['d1'][2]['d2']
['this is tricky', {'tough': [1, 2, ['me']]}]
>>> d['d1'][2]['d2'][1]
{'tough': [1, 2, ['me']]}
>>> d['d1'][2]['d2'][1]['tough']
[1, 2, ['me']]
>>> d['d1'][2]['d2'][1]['tough'][2]
['me']
>>> d['d1'][2]['d2'][1]['tough'][2][0]
'me'

At the end, we have the desired code for the original problem: d['d1'][2]['d2'][1]['tough'][2][0].

Flat is better than nested: how to deal with deeply nested dictionaries?

There exist several approaches to the problem:

  1. Flatten the structure. If you presume that processing of the flat table will work faster, it might make sense to flatten your structure into the plain list of class or struct instances, and then process the plain list.
  2. If the structure is uniform across levels (even if the names are different on each level), you can use simple recursion. You would have the function which will check if the item on certain level has children, and call itself with each of those children if they are present, or call the data processing function for the final level entries. If subcomponent names differ, you can have an array of such names that would say "on level 1 the items are called 'comp*', on level 2 - 'subcomp*' and so on.
  3. If levels require completely different handling, introduce separate functions for each level.

Python: Easily access deeply nested dict (get and set)

Attribute Tree

The problem with your first specification is that Python can't tell in __getitem__ if, at my_obj.a.b.c.d, you will next proceed farther down a nonexistent tree, in which case it needs to return an object with a __getitem__ method so you won't get an AttributeError thrown at you, or if you want a value, in which case it needs to return None.

I would argue that in every case you have above, you should expect it to throw a KeyError instead of returning None. The reason being that you can't tell if None means "no key" or "someone actually stored None at that location". For this behavior, all you have to do is take dotdictify, remove marker, and replace __getitem__ with:

def __getitem__(self, key):
return self[key]

Because what you really want is a dict with __getattr__ and __setattr__.

There may be a way to remove __getitem__ entirely and say something like __getattr__ = dict.__getitem__, but I think this may be over-optimization, and will be a problem if you later decide you want __getitem__ to create the tree as it goes like dotdictify originally does, in which case you would change it to:

def __getitem__(self, key):
if key not in self:
dict.__setitem__(self, key, dotdictify())
return dict.__getitem__(self, key)

I don't like the marker business in the original dotdictify.

Path Support

The second specification (override get() and set()) is that a normal dict has a get() that operates differently from what you describe and doesn't even have a set (though it has a setdefault() which is an inverse operation to get()). People expect get to take two parameters, the second being a default if the key isn't found.

If you want to extend __getitem__ and __setitem__ to handle dotted-key notation, you'll need to modify doctictify to:

class dotdictify(dict):
def __init__(self, value=None):
if value is None:
pass
elif isinstance(value, dict):
for key in value:
self.__setitem__(key, value[key])
else:
raise TypeError, 'expected dict'

def __setitem__(self, key, value):
if '.' in key:
myKey, restOfKey = key.split('.', 1)
target = self.setdefault(myKey, dotdictify())
if not isinstance(target, dotdictify):
raise KeyError, 'cannot set "%s" in "%s" (%s)' % (restOfKey, myKey, repr(target))
target[restOfKey] = value
else:
if isinstance(value, dict) and not isinstance(value, dotdictify):
value = dotdictify(value)
dict.__setitem__(self, key, value)

def __getitem__(self, key):
if '.' not in key:
return dict.__getitem__(self, key)
myKey, restOfKey = key.split('.', 1)
target = dict.__getitem__(self, myKey)
if not isinstance(target, dotdictify):
raise KeyError, 'cannot get "%s" in "%s" (%s)' % (restOfKey, myKey, repr(target))
return target[restOfKey]

def __contains__(self, key):
if '.' not in key:
return dict.__contains__(self, key)
myKey, restOfKey = key.split('.', 1)
target = dict.__getitem__(self, myKey)
if not isinstance(target, dotdictify):
return False
return restOfKey in target

def setdefault(self, key, default):
if key not in self:
self[key] = default
return self[key]

__setattr__ = __setitem__
__getattr__ = __getitem__

Test code:

>>> life = dotdictify({'bigBang': {'stars': {'planets': {}}}})
>>> life.bigBang.stars.planets
{}
>>> life.bigBang.stars.planets.earth = { 'singleCellLife' : {} }
>>> life.bigBang.stars.planets
{'earth': {'singleCellLife': {}}}
>>> life['bigBang.stars.planets.mars.landers.vikings'] = 2
>>> life.bigBang.stars.planets.mars.landers.vikings
2
>>> 'landers.vikings' in life.bigBang.stars.planets.mars
True
>>> life.get('bigBang.stars.planets.mars.landers.spirit', True)
True
>>> life.setdefault('bigBang.stars.planets.mars.landers.opportunity', True)
True
>>> 'landers.opportunity' in life.bigBang.stars.planets.mars
True
>>> life.bigBang.stars.planets.mars
{'landers': {'opportunity': True, 'vikings': 2}}

How to clean up deeply nested dictionary access?

I would start with unit test to ensure that after each refactoring iteration my code still actually works.

I would use meaningful data structures and methods, so the code is more self-describing. Sometimes you can find namedtuple very useful for that if you don't want to roll out a separate data-holder class.

Finally I would decompose this big and ugly if...for...else block into meaningful smaller chunks, something like this:

# instead of this original code...

for pname, stats_by_year in parsed_stats.items():
if pname in self.raw_players_with_stats[tour]:
#...
elif pname in self.new_player_urls[tour]:
self.new_stats[tour][pname] = stats_by_year

# you get something like this

for player_name, stats_by_year in parser_stats.iteritems():
if self.has_raw_player(player_name):
self.process_new_raw_player(player_name, stats_by_year)
elif self.is_player_new(player_name):
self.insert_new_stat_for_player( player_name, stats_by_year )

which is easier to read, test and understand

And, if you have some free time, I would invest it in reading Clean Code by Robert Martin. It will pay off surely!

Edit

Clean up lengthy and difficult to read one-liners like this

#...
self.new_stats[tour].setdefault(pname,{}).setdefault(y,{}).setdefault(cat,{})[prop] = val
#...

So it looks, for example, like this:

def insert_new_stat(self, tour, pname, y, cat, prop, val):
player_stat = self.new_stats[tour].setdefault(pname, {})
y_param = player_stat.setdefault(y, {}) # what is y??
category_stats = ...
prop_stats = ...
... = val

Your code certainly will be more lengthy and verbose, although Explicit is better than implicit

Parse nested dictionary conveniently

You can build a simple approach, using the built-in library, as below:

from functools import reduce, partial
from operator import getitem

nested_get = partial(reduce, getitem)


x = {'a': {'b': {'c': {'d': 1}}}}

item = nested_get(["a", "b", "c", "d"], x)
print(item)

Output

1

UPDATE

To emulate the behavior of dict.get, use:

def nested_get(path, d, default=None):
current = d
for key in path:
try:
current = current[key]
except KeyError:
return default
return current


x = {'a': {'b': {'c': {'d': 1}}}}

item = nested_get(["a", "b", "c", "d"], x)
print(item)
item = nested_get(["a", "b", "e", "d"], x)
print(item)

Output

1
None

python: what are efficient techniques to deal with deeply nested data in a flexible manner?

"I stored it in a deeply nested dictionary"

And, as you've seen, it doesn't work out well.

What's the alternative?

  1. Composite keys and a shallow dictionary. You have an 8-part key:
    ( individual, imaging session, Region imaged, timestamp of file, properties of file, regions of interest in image, format of data, channel of acquisition ) which maps
    to an array of values.

    { ('AS091209M02', '100113', 'R16', '1263399103', 'Responses', 'N01', 'Sequential', 'Ch1' ): array, 
    ...

    The issue with this is search.

  2. Proper class structures. Actually, a full-up Class definition may be overkill.

"The type of operations I perform is for instance to compute properties of the arrays
(listed under Ch1, Ch2), pick up arrays to make a new collection, for instance analyze
responses of N01 from region 16 (R16) of a given individual at different time points, etc."

Recommendation

First, use a namedtuple for your ultimate object.

Array = namedtuple( 'Array', 'individual, session, region, timestamp, properties, roi, format, channel, data' )

Or something like that. Build a simple list of these named tuple objects. You can then simply iterate over them.

Second, use many simple map-reduce operations on this master list of the array objects.

Filtering:

for a in theMasterArrrayList:
if a.region = 'R16' and interest = 'N01':
# do something on these items only.

Reducing by Common Key:

individual_dict = defaultdict(list)
for a in theMasterArrayList:
individual_dict[ a.individual ].append( a )

This will create a subset in the map that has exactly the items you want.

You can then do indiidual_dict['AS091209M02'] and have all of their data. You can do this for any (or all) of the available keys.

region_dict = defaultdict(list)
for a in theMasterArrayList:
region_dict[ a.region ].append( a )

This does not copy any data. It's fast and relatively compact in memory.

Mapping (or transforming) the array:

for a in theMasterArrayList:
someTransformationFunction( a.data )

If the array is itself a list, you're can update that list without breaking the tuple as a whole. If you need to create a new array from an existing array, you're creating a new tuple. There's nothing wrong with this, but it is a new tuple. You wind up with programs like this.

def region_filter( array_list, region_set ):
for a in array_list:
if a.region in region_set:
yield a

def array_map( array_list, someConstant ):
for a in array_list:
yield Array( *(a[:8] + (someTranformation( a.data, someConstant ),) )

def some_result( array_list, region, someConstant ):
for a in array_map( region_filter( array_list, region ), someConstant ):
yield a

You can build up transformations, reductions, mappings into more elaborate things.

The most important thing is creating only the dictionaries you need from the master list so you don't do any more filtering than is minimally necessary.

BTW. This can be mapped to a relational database trivially. It will be slower, but you can have multiple concurrent update operations. Except for multiple concurrent updates, a relational database doesn't offer any features above this.

Find a key inside a deeply nested dictionary

@Håvard's recursive solution is probably going to be OK... unless the level of nesting is too high, and then you get a RuntimeError: maximum recursion depth exceeded. To remedy that, you can use the usual technique for recursion removal: keep your own stack of items to examine (as a list that's under your control). I.e.:

def find_key_nonrecursive(adict, key):
stack = [adict]
while stack:
d = stack.pop()
if key in d:
return d[key]
for k, v in d.iteritems():
if isinstance(v, dict):
stack.append(v)

The logic here is quite close to the recursive answer (except for checking for dict in the right way;-), with the obvious exception that the recursive calls are replaced with a while loop and .pop and .append operations on the explicit-stack list, stack.



Related Topics



Leave a reply



Submit