Unnest dictionary with nested Dictionary or list of dictionaries
Given that:
- you don't care that both {'a': {'b': 1, 'c': 2}} and {'a': [{'b': 1}, {'c': 2}]} will map to the same {'a.b': 1, 'a.c': 2}
- there won't be anything other than dictionaries in the lists in the input data structure
This seems like a fairly clean solution:
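def _compound_key_value(xs, prefix):
    if isinstance(xs, list):
        # a list adds no key of its own: flatten each element under the same prefix
        for x in xs:
            yield from _compound_key_value(x, prefix)
    elif isinstance(xs, dict):
        # a dict contributes its key to every path yielded from its values
        for k, v in xs.items():
            for p, r in _compound_key_value(v, prefix):
                yield prefix + (k,) + p, r
    else:
        # any other value is a leaf: yield the key path built so far and the value
        yield prefix, xs

def flatten_dict_list(dl):
    return {'.'.join(k): v for k, v in _compound_key_value(dl, ())}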
print(flatten_dict_list({'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y' : 10}}, {'test': 9999}, [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}))
Output
{'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'c.test': 9999, 'c.s': 2, 'c.t': 100, 'd': 3}
Note that it is recursive, like the solution you started out with, so I also assumed maximum recursion depth would not become an issue.
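If the recursion-depth assumption might not hold for your data, here is a minimal sketch of the same flattening done iteratively with an explicit stack instead of recursion. The name flatten_dict_list_iterative and its sep parameter are my own additions, not part of the solution above. It visits items in the same depth-first order, so it should produce the same result as flatten_dict_list.

```python
def flatten_dict_list_iterative(dl, sep='.'):
    """Non-recursive sketch of flatten_dict_list using an explicit stack."""
    result = {}
    # Each stack entry is (key path so far as a tuple, value still to process).
    stack = [((), dl)]
    while stack:
        prefix, xs = stack.pop()
        if isinstance(xs, list):
            # Push in reverse so the first element is popped (processed) first.
            for x in reversed(xs):
                stack.append((prefix, x))
        elif isinstance(xs, dict):
            for k, v in reversed(list(xs.items())):
                stack.append((prefix + (k,), v))
        else:
            # Leaf value: later values for the same path overwrite earlier ones.
            result[sep.join(prefix)] = xs
    return result


# Usage: nesting far deeper than CPython's default recursion limit (1000)
deep = 0
for _ in range(5000):
    deep = {'k': deep}
flat = flatten_dict_list_iterative(deep)
print(len(flat))  # 1
```

Like the recursive version, it assumes all keys are strings, since they are joined with sep.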
A few follow-up questions from the comments:
Difference between yield and yield from?
yield <value> directly makes a generator yield that value. As you've probably read by now, when a value is requested from it, a generator runs until it has a value to yield, which it then yields before pausing; it continues running when the next value is needed, and so on.
yield from <some other generator> makes the generator request values from another generator and immediately yield each one, one at a time, until that generator is exhausted; only then does it continue with the rest of its own code, up to the next yield.
In the solution, _compound_key_value(x, prefix) recursively starts a new generator, whose values are then yielded one by one using yield from. If the code had been yield _compound_key_value(x, prefix) (without the from), it would have yielded the generator object itself instead of the values from it; that can be useful as well, but not here.
The same could be achieved with for a in _compound_key_value(x, prefix): yield a, except that would be slower, because yield from has the new generator yield directly to the caller of this generator, without intermediate steps. yield from is also easier to read.
TL;DR: yield x yields x itself, while yield from x works with any iterable x (a generator, a list, ...) and yields everything from x, one at a time.
Why are you passing an empty tuple as prefix?
To avoid having to check whether prefix has a value at all, it needs an initial value. I chose to pass the empty tuple instead of setting it as the default in the signature, like this: def _compound_key_value(xs, prefix=()):
Either works, but I felt no default looked cleaner, and since _compound_key_value is an internal function, not intended for direct use outside functions like flatten_dict_list, the requirement to pass an empty tuple when calling it seemed reasonable.
TL;DR: prefix=() as a default would also have worked.
What is this doing? (k,)
It is part of one statement: yield prefix + (k,) + p, r
This yields a tuple of two values, the first being prefix + (k,) + p. prefix is the function parameter, which expects a tuple, and p is also a tuple, since it is the first half of each tuple yielded by the recursive call. Adding tuples together produces a new tuple with all the parts combined in order, so (k,) takes a key as obtained from xs.items() and puts it in a tuple by itself. That way it can be added to the other tuples and yielded as the first half of a tuple, with r as the second half.
TL;DR: (k,) makes a new tuple with a single element, k.
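A small illustration of that tuple arithmetic; the values of k, prefix and p are made up for the example.

```python
k = 'b'
prefix = ('a',)
p = ('x', 'y')

single = (k,)       # the trailing comma makes a one-element tuple
not_a_tuple = (k)   # parentheses alone only group: this is just the string 'b'

combined = prefix + (k,) + p  # tuples concatenate in order
print(single, type(not_a_tuple).__name__, combined)
# ('b',) str ('a', 'b', 'x', 'y')
```

Writing prefix + k instead would raise a TypeError, because a tuple can only be concatenated with another tuple.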
Flatten nested Python dictionaries, compressing keys, and recursing into sub-lists with dicts
You were really close: if a value is a list, a single line is all you need to get a recursive version of flatten:
items.append((new_key, map(flatten, v))) # for python 2.x
# or
items.append((new_key, list(map(flatten, v)))) # for python 3.x
So, you simply recursively call the function on each element.
Here is what flatten would then look like:
from collections.abc import MutableMapping

def flatten(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = '{0}{1}{2}'.format(parent_key, sep, k) if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            # apply itself to each element of the list, that's it!
            # (a real list in Python 3, and sep is passed along; each element must be a dict)
            items.append((new_key, [flatten(x, sep=sep) for x in v]))
        else:
            items.append((new_key, v))
    return dict(items)
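This solution can cope with arbitrarily deep nesting of dicts inside lists inside dicts, as long as every list element is itself a dict; a list directly inside a list, or a plain value such as a number in a list, raises an AttributeError, because flatten calls .items() on it.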
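If your lists can also hold plain values or further lists, one possible variant (my own sketch, not part of the answer) routes list handling through a small helper. The helper name _flatten_list is hypothetical; it flattens dict elements, recurses into nested lists, and passes anything else through unchanged.

```python
from collections.abc import MutableMapping


def _flatten_list(v, sep):
    # Hypothetical helper: flatten dicts, recurse into nested lists,
    # and keep any other element (numbers, strings, ...) as it is.
    out = []
    for x in v:
        if isinstance(x, MutableMapping):
            out.append(flatten(x, sep=sep))
        elif isinstance(x, list):
            out.append(_flatten_list(x, sep))
        else:
            out.append(x)
    return out


def flatten(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = f'{parent_key}{sep}{k}' if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            items.append((new_key, _flatten_list(v, sep)))
        else:
            items.append((new_key, v))
    return dict(items)


data = {'a': {'b': [1, {'c': {'d': 2}}, [{'e': {'f': 3}}, 'x']]}}
print(flatten(data))
# {'a_b': [1, {'c_d': 2}, [{'e_f': 3}, 'x']]}
```

As in the original, dicts inside a list are flattened on their own, so their keys do not carry the outer prefix.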
Flatten nested dictionary and overwrite values
Not sure how robust this is, but I guess this is what you are looking for (credit to https://stackoverflow.com/a/6027615/5417511):
from collections.abc import MutableMapping

d = {
    'abc': 1,
    'foo': 2,
    'cba': {'abc': 3, 'baz': {
        'foo': 4
    }}
}

def flatten(d):
    items = []
    for k, v in d.items():
        # collections.MutableMapping was removed in Python 3.10; it lives in collections.abc
        if isinstance(v, MutableMapping):
            items.extend(flatten(v).items())
        else:
            items.append((k, v))
    return dict(items)

d.update(flatten(d))
print(d)
{'abc': 3, 'foo': 4, 'cba': {'abc': 3, 'baz': {'foo': 4}}}
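If you would rather not modify d in place, a minimal sketch builds a new dict with dict unpacking (Python 3.5+) instead of calling d.update. It is only a shallow copy: the nested 'cba' dict is shared with the original.

```python
from collections.abc import MutableMapping


def flatten(d):
    # Same idea as above: parent keys are dropped, later values win.
    items = []
    for k, v in d.items():
        if isinstance(v, MutableMapping):
            items.extend(flatten(v).items())
        else:
            items.append((k, v))
    return dict(items)


d = {'abc': 1, 'foo': 2, 'cba': {'abc': 3, 'baz': {'foo': 4}}}
merged = {**d, **flatten(d)}  # new top-level dict; d itself is left unchanged
print(merged)    # {'abc': 3, 'foo': 4, 'cba': {'abc': 3, 'baz': {'foo': 4}}}
print(d['abc'])  # 1
```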
Flatten the data frame column of list containing nested dictionaries in a unique way shown
Try:
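import pandas as pd

# df is the DataFrame from the question; each col2 cell holds a list of
# {sheetname: {value: length}} dicts
data = []
for col1, row in zip(df["col1"], df["col2"]):
    for d in row:
        for k, v in d.items():
            for kk, vv in v.items():
                data.append({"col1": col1, "col2": k, "col3": kk, "col4": vv})

df = pd.DataFrame(data)
print(df)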
Prints:
col1 col2 col3 col4
0 path1 sheetname1 value11 length11
1 path1 sheetname1 value12 length12
2 path1 sheetname1 value13 length13
3 path1 sheetname2 value21 length21
4 path1 sheetname2 value22 lenth22
Flatten Python Dict and only change key when not unique
I recommend using an "accumulator" dict as an input parameter instead of a list. This enables efficient lookup of whether a key already exists.
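def flat_dict(d, acc=None, parent_key=None, sep="_"):
    out = dict() if acc is None else acc
    for k, v in d.items():
        if type(v) is dict:
            # recurse into the sub-dict, writing into the same accumulator
            flat_dict(v, out, parent_key=str(k), sep=sep)
        else:
            if k in out:
                # key already taken: prefix it with its parent key
                k = parent_key + sep + str(k)
            out[k] = v
    return out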
If all your keys are already strings, you can of course drop the str
casts.
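A usage sketch of flat_dict on a made-up input, showing that only colliding keys get their parent key prepended. The function is repeated so the block runs on its own, with sep passed through the recursive call.

```python
def flat_dict(d, acc=None, parent_key=None, sep="_"):
    out = dict() if acc is None else acc
    for k, v in d.items():
        if type(v) is dict:
            # recurse into the sub-dict, writing into the same accumulator
            flat_dict(v, out, parent_key=str(k), sep=sep)
        else:
            if k in out:
                # key already taken: prefix it with its parent key
                k = parent_key + sep + str(k)
            out[k] = v
    return out


data = {'a': 1, 'b': {'a': 2, 'c': 3}, 'd': {'c': 4}}
print(flat_dict(data))
# {'a': 1, 'b_a': 2, 'c': 3, 'd_c': 4}
```

Note that parent_key is only the immediate parent, not the full path, and a top-level key that collides with an earlier nested key (e.g. {'x': {'y': 1}, 'y': 2}) reaches None + sep and raises a TypeError.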
How to flatten a nested dictionary?
You want to traverse the dictionary, building the current key and accumulating the flat dictionary. For example:
def flatten(current, key, result):
    if isinstance(current, dict):
        for k in current:
            new_key = "{0}.{1}".format(key, k) if len(key) > 0 else k
            flatten(current[k], new_key, result)
    else:
        result[key] = current
    return result
result = flatten(my_dict, '', {})
Using it:
print(flatten(_dict1, '', {}))
{'this.example.too': 3, 'example': 'fish', 'this.is.another.value': 4, 'this.is.an.example': 2}