Flatten Nested Dictionaries, Compressing Keys

Unnest dictionary with nested Dictionary or list of dictionaries

Given that:

  • you don't care that both {'a': {'b': 1, 'c': 2}} and {'a': [{'b': 1}, {'c': 2}]} will map to the same {'a.b': 1, 'a.c': 2}
  • there won't be anything other than dictionaries in the lists in the input data structure

This seems like a fairly clean solution:

def _compound_key_value(xs, prefix):
    if isinstance(xs, list):
        # a list adds nothing to the key: just recurse into each element
        for x in xs:
            yield from _compound_key_value(x, prefix)
    elif isinstance(xs, dict):
        for k, v in xs.items():
            for p, r in _compound_key_value(v, prefix):
                yield prefix + (k,) + p, r
    else:
        # a leaf value: yield it with the key path collected so far
        yield prefix, xs


def flatten_dict_list(dl):
    return {'.'.join(k): v for k, v in _compound_key_value(dl, ())}


print(flatten_dict_list({'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y': 10}}, {'test': 9999}, [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}))

Output

{'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'c.test': 9999, 'c.s': 2, 'c.t': 100, 'd': 3}

Note that it is recursive, like the solution you started out with, so I also assumed maximum recursion depth would not become an issue.
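If recursion depth does become a concern, the same traversal can be done with an explicit stack. The following is only a sketch under the same two assumptions as above; the name flatten_iterative and the sep parameter are made up for illustration and are not part of the original answer:

```python
def flatten_iterative(data, sep="."):
    """Flatten nested dicts/lists without recursion, using an explicit stack."""
    flat = {}
    stack = [((), data)]  # each entry is (key path so far, node to visit)
    while stack:
        path, node = stack.pop()
        if isinstance(node, list):
            # A list adds nothing to the key; push in reverse so that
            # elements are popped (visited) in their original order.
            stack.extend((path, item) for item in reversed(node))
        elif isinstance(node, dict):
            stack.extend(
                (path + (k,), v) for k, v in reversed(list(node.items()))
            )
        else:
            # Leaf value: later duplicates overwrite earlier ones,
            # just like in the recursive version.
            flat[sep.join(path)] = node
    return flat


sample = {'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y': 10}}, {'test': 9999},
                        [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}
print(flatten_iterative(sample))

# A dict nested well beyond the default recursion limit (typically 1000)
deep = {"leaf": 1}
for _ in range(3000):
    deep = {"k": deep}
print(len(flatten_iterative(deep)))
```

For the example input this produces the same dictionary as flatten_dict_list. Like the recursive version, it assumes all keys are strings.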

A few follow-up questions from the comments:

  1. Difference between yield and yield from?

    yield <value> makes a generator yield that value directly. As you've probably read by now, a generator only runs when a value is requested from it: it runs until it reaches a yield, hands over that value, and pauses until the next value is needed. yield from <some other generator> makes the generator request values from another generator and pass each one on as it arrives, one at a time, until that generator has nothing left. Only then does it continue with the rest of its own code, up to the next yield.

    In the solution, _compound_key_value(x, prefix) starts a new generator recursively, which will start yielding values, which are yielded one by one using yield from. If the code had been yield _compound_key_value(x, prefix) (without the from), it would have yielded the generator itself, instead of values from it - that can be useful as well, but not here.

    The same could be achieved with for a in _compound_key_value(x, prefix): yield a, except that would be slower, because yield from has the new generator yield directly to the caller of this generator, without intermediate steps. The yield from version is also easier to read.

    TL;DR: yield x yields x itself; yield from x only works if x is iterable (a generator, for example) and yields everything from x one at a time.

  2. Why are you passing an empty tuple as prefix?

    To avoid having to check if prefix has some value at all, it needs an initial value and I chose to pass the empty tuple instead of setting the default to the empty tuple in the signature like this:
    def _compound_key_value(xs, prefix=()):

    Either works, but I felt no default looked cleaner and since _compound_key_value is an internal function, not intended for direct use outside functions like flatten_dict_list, the requirement to pass an empty tuple when calling it seemed reasonable.

    TL;DR: as a default, prefix=() would have also worked.

  3. What is this doing? (k, )

    It is part of one statement, yield prefix + (k,) + p, r, which yields a tuple of two values, the first being prefix + (k,) + p. prefix is the function parameter, which expects a tuple, and p is also a tuple, since it is the first half of the pair yielded by the recursive call. If you add three tuples together, the result is a new tuple with all the parts combined in order. So (k,) takes a key obtained from xs.items() and puts it in a tuple by itself, so that it can be added to the other tuples. The combined tuple is then yielded as the first half of a pair, with r as the second half.

    TL;DR: (k,) makes a new tuple, with a single element k.
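To make answers 1 and 3 concrete, here is a small stand-alone sketch. The helper names (inner, with_yield, and so on) are invented for this illustration and do not appear in the solution:

```python
def inner():
    yield 1
    yield 2


def with_yield():
    # Yields the generator object itself, as one single item
    yield inner()


def with_yield_from():
    # Delegates to inner() and yields each of its values in turn
    yield from inner()


def with_loop():
    # Same values as with_yield_from(), written as an explicit loop
    for value in inner():
        yield value


def from_list():
    # yield from accepts any iterable, not only generators
    yield from [10, 20]


print(list(with_yield_from()))   # [1, 2]
print(list(with_loop()))         # [1, 2]
print(len(list(with_yield())))   # 1
print(list(from_list()))         # [10, 20]

# (k,) is a one-element tuple, so it can be concatenated with other tuples
prefix, k, p = (), "c", ("b", "x")
print(prefix + (k,) + p)            # ('c', 'b', 'x')
print(".".join(prefix + (k,) + p))  # c.b.x
```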

Flatten nested Python dictionaries, compressing keys, and recursing into sub-lists with dicts

You were really close: if a value is a list, a single line is all that is needed to get a recursive version of flatten:

items.append((new_key, map(flatten, v)))  # for python 2.x
# or
items.append((new_key, list(map(flatten, v)))) # for python 3.x

So, you simply recursively call the function on each element.

Here is what flatten would then look like:

from collections.abc import MutableMapping


def flatten(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = '{0}{1}{2}'.format(parent_key, sep, k) if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            # apply itself to each element of the list - that's it!
            # (Python 3 form; on Python 2.x a plain map(flatten, v) is enough)
            items.append((new_key, list(map(flatten, v))))
        else:
            items.append((new_key, v))
    return dict(items)

This solution can cope with lists nested at arbitrary depth inside the dictionaries, as long as every list element is itself a dictionary.
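Two details of the flatten above are easy to miss: map(flatten, v) calls flatten with the default separator, and it fails on list elements that are not dictionaries. The variant below is a hedged sketch of one way to handle both; the name flatten_keep_lists is hypothetical, and leaving non-dictionary elements untouched is an assumption about the desired behaviour, not something stated in the question:

```python
from collections.abc import MutableMapping


def flatten_keep_lists(d, parent_key="", sep="_"):
    items = []
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten_keep_lists(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            # Flatten dict elements with the same separator;
            # leave any other element as it is.
            items.append((new_key, [
                flatten_keep_lists(x, sep=sep) if isinstance(x, MutableMapping) else x
                for x in v
            ]))
        else:
            items.append((new_key, v))
    return dict(items)


nested = {"a": 1, "b": {"c": 2}, "d": [{"e": {"f": 3}}, 4]}
print(flatten_keep_lists(nested, sep="."))
# {'a': 1, 'b.c': 2, 'd': [{'e.f': 3}, 4]}
```

Note that the list stays a list: each dictionary inside it is flattened on its own, and its keys are not prefixed with the key of the list.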

Flatten nested dictionary and overwrite values

Not sure how robust this is, but I guess this is what you are looking for (credit to https://stackoverflow.com/a/6027615/5417511):

import collections.abc

d = {
    'abc': 1,
    'foo': 2,
    'cba': {'abc': 3, 'baz': {
        'foo': 4
    }}
}

def flatten(d):
    items = []
    for k, v in d.items():
        # the collections.MutableMapping alias was removed in Python 3.10
        if isinstance(v, collections.abc.MutableMapping):
            items.extend(flatten(v).items())
        else:
            items.append((k, v))
    # later (deeper) duplicates of a key overwrite earlier ones
    return dict(items)

d.update(flatten(d))
print(d)

Output

{'abc': 3, 'foo': 4, 'cba': {'abc': 3, 'baz': {'foo': 4}}}

Flatten the data frame column of list containing nested dictionaries in a unique way shown

Try:

import pandas as pd

# df is the data frame from the question: col2 holds a list of nested dicts
data = []
for col1, row in zip(df["col1"], df["col2"]):
    for d in row:
        for k, v in d.items():
            for kk, vv in v.items():
                data.append({"col1": col1, "col2": k, "col3": kk, "col4": vv})

df = pd.DataFrame(data)
print(df)

Prints:

    col1        col2     col3      col4
0  path1  sheetname1  value11  length11
1  path1  sheetname1  value12  length12
2  path1  sheetname1  value13  length13
3  path1  sheetname2  value21  length21
4  path1  sheetname2  value22   lenth22
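The loop assumes a particular shape for each cell of col2. The sketch below spells that shape out with plain Python lists instead of a data frame, so it runs without pandas. The sample data is hypothetical, inferred from the printed table rather than taken from the question:

```python
# Hypothetical input: each col2 cell is a list of
# {sheet_name: {value: length}} dictionaries.
col1_values = ["path1"]
col2_values = [[
    {"sheetname1": {"value11": "length11",
                    "value12": "length12",
                    "value13": "length13"}},
    {"sheetname2": {"value21": "length21",
                    "value22": "length22"}},
]]

rows = []
for col1, cell in zip(col1_values, col2_values):
    for sheet in cell:                                # each dict in the list
        for sheet_name, values in sheet.items():      # outer key -> col2
            for value, length in values.items():      # inner pair -> col3, col4
                rows.append({"col1": col1, "col2": sheet_name,
                             "col3": value, "col4": length})

for row in rows:
    print(row)
```

Each entry in rows becomes one row of the result, so passing rows to pd.DataFrame, as in the answer, gives the long-format frame.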

Flatten Python Dict and only change key when not unique

I recommend using an "accumulator" dict as an input parameter instead of a list. This makes it cheap to check whether a key already exists.

def flat_dict(d, acc=None, parent_key=None, sep="_"):
    out = dict() if acc is None else acc
    for k, v in d.items():
        if type(v) is dict:
            # pass sep along so nested levels use the same separator
            flat_dict(v, out, parent_key=str(k), sep=sep)
        else:
            # only rename the key if it is already taken
            if k in out:
                k = parent_key + sep + str(k)
            out[k] = v

    return out

If all your keys are already strings, you can of course drop the str casts.
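A short usage sketch, with a made-up config dictionary, shows that only the clashing key gets a prefix. The function is repeated so that the example runs on its own:

```python
def flat_dict(d, acc=None, parent_key=None, sep="_"):
    out = {} if acc is None else acc
    for k, v in d.items():
        if type(v) is dict:
            flat_dict(v, out, parent_key=str(k), sep=sep)
        else:
            if k in out:
                # key already taken: prepend the parent key
                k = parent_key + sep + str(k)
            out[k] = v
    return out


# Hypothetical example data
config = {"name": "root", "db": {"name": "main", "port": 5432}}
print(flat_dict(config))
# {'name': 'root', 'db_name': 'main', 'port': 5432}
```

The result depends on traversal order. A top-level key that repeats a key already stored from a nested dictionary has no parent_key to prepend (it is None there), so with this sketch that case raises a TypeError; it assumes the unprefixed occurrence comes first.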

How to flatten a nested dictionary?

You want to traverse the dictionary, building the current key and accumulating the flat dictionary. For example:

def flatten(current, key, result):
    if isinstance(current, dict):
        for k in current:
            new_key = "{0}.{1}".format(key, k) if len(key) > 0 else k
            flatten(current[k], new_key, result)
    else:
        result[key] = current
    return result

result = flatten(my_dict, '', {})

Using it:

print(flatten(_dict1, '', {}))

{'this.example.too': 3, 'example': 'fish', 'this.is.another.value': 4, 'this.is.an.example': 2}
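For a version that runs on its own, the sketch below repeats the function and adds a small wrapper so callers do not have to pass the empty key and the result dictionary themselves. The wrapper name flatten_dict is hypothetical, and the sample input is reconstructed from the output shown above, so treat it as an assumption about the original data:

```python
def flatten(current, key, result):
    if isinstance(current, dict):
        for k in current:
            new_key = "{0}.{1}".format(key, k) if len(key) > 0 else k
            flatten(current[k], new_key, result)
    else:
        result[key] = current
    return result


def flatten_dict(d):
    # Hypothetical convenience wrapper: starts with an empty key
    # and a fresh result dict on every call.
    return flatten(d, "", {})


# Sample input reconstructed from the printed output (an assumption)
sample = {
    "this": {
        "example": {"too": 3},
        "is": {"another": {"value": 4}, "an": {"example": 2}},
    },
    "example": "fish",
}
print(flatten_dict(sample))
```

Creating the result dictionary inside the wrapper, rather than as a default argument of flatten, avoids sharing one dictionary between calls.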

