Flatten Nested Dictionaries, Compressing Keys

Unnest a dictionary with nested dictionaries or lists of dictionaries

Given that:

you don't care that both {'a': {'b': 1, 'c': 2}} and {'a': [{'b': 1}, {'c': 2}]} will map to the same {'a.b': 1, 'a.c': 2}
there won't be anything other than dictionaries in the lists in the input data structure

This seems like a fairly clean solution:

def _compound_key_value(xs, prefix):
    if isinstance(xs, list):
        for x in xs:
            yield from _compound_key_value(x, prefix)
    elif isinstance(xs, dict):
        for k, v in xs.items():
            # recurse with an empty prefix, so p is only the path below k
            for p, r in _compound_key_value(v, ()):
                yield prefix + (k,) + p, r
    else:
        yield prefix, xs


def flatten_dict_list(dl):
    return {'.'.join(k): v for k, v in _compound_key_value(dl, ())}


print(flatten_dict_list({'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y': 10}}, {'test': 9999}, [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}))

Output

{'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'c.test': 9999, 'c.s': 2, 'c.t': 100, 'd': 3}

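Note that it is recursive, like the solution you started out with, so I have also assumed that the maximum recursion depth will not become an issue. The 'd' entry in the example shows what happens when the second assumption is broken: each plain value in the list is written to the same key, so only the last one (3) survives.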
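If recursion depth could become a problem, the same traversal can be driven by an explicit stack instead of the call stack. The following is a sketch under that assumption; flatten_dict_list_iterative and its sep parameter are hypothetical names, not part of the solution above. Children are pushed in reverse so they are popped, and therefore written, in the same order as in the recursive version:

```python
def flatten_dict_list_iterative(dl, sep='.'):
    """Flatten nested dicts/lists like flatten_dict_list, without recursion."""
    result = {}
    # Each stack entry pairs a key path (a tuple) with the value found there
    stack = [((), dl)]
    while stack:
        prefix, xs = stack.pop()
        if isinstance(xs, list):
            # Push in reverse so elements are popped in their original order;
            # list elements share the key path of the list itself
            for x in reversed(xs):
                stack.append((prefix, x))
        elif isinstance(xs, dict):
            for k, v in reversed(list(xs.items())):
                stack.append((prefix + (k,), v))
        else:
            # A leaf: later values for the same path overwrite earlier ones
            result[sep.join(prefix)] = xs
    return result


print(flatten_dict_list_iterative({'a': 1, 'c': [{'a': 2, 'b': {'x': 5, 'y': 10}}, {'test': 9999}, [{'s': 2}, {'t': 100}]], 'd': [1, 2, 3]}))
# {'a': 1, 'c.a': 2, 'c.b.x': 5, 'c.b.y': 10, 'c.test': 9999, 'c.s': 2, 'c.t': 100, 'd': 3}
```

Each level copies its prefix tuple, so the work grows with the square of the nesting depth; for typical data this does not matter, and the recursion limit no longer applies.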

A few follow-up questions from the comments:

Difference between yield and yield from?

yield <value> makes a generator produce that value directly. As you've probably read by now, a generator only runs when a value is requested from it: it runs until it reaches a yield, hands over that value and pauses, then resumes when the next value is needed, and so on. yield from <iterable> delegates to another iterable (typically another generator): it requests values from it and passes each one on to the caller, one at a time, until the inner iterable is exhausted, and only then continues with the rest of the code, up to the next yield.

In the solution, _compound_key_value(x, prefix) starts a new generator recursively, which will start yielding values, which are yielded one by one using yield from. If the code had been yield _compound_key_value(x, prefix) (without the from), it would have yielded the generator itself, instead of values from it - that can be useful as well, but not here.

The same values could be produced with for a in _compound_key_value(x, prefix): yield a. However, yield from is easier to read, and it is typically a bit faster because the interpreter handles the delegation instead of a Python-level loop. It also forwards send() and throw() calls to the inner generator, which a plain loop does not, although that doesn't matter here.

TL;DR: yield x yields x itself; yield from x works with any iterable x (such as another generator) and yields everything from x, one item at a time.
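A small sketch contrasting the three forms discussed above; inner, with_yield, with_yield_from, with_loop and from_list are hypothetical names used only for illustration:

```python
def inner():
    yield 1
    yield 2


def with_yield():
    # yields the generator object itself, as one single value
    yield inner()


def with_yield_from():
    # delegates: yields each value produced by inner(), one at a time
    yield from inner()


def with_loop():
    # same values as yield from, produced through an explicit loop
    for a in inner():
        yield a


def from_list():
    # yield from also accepts any iterable, not only generators
    yield from [3, 4]


print(list(with_yield_from()))  # [1, 2]
print(list(with_loop()))        # [1, 2]
wrapped = list(with_yield())
print(len(wrapped), type(wrapped[0]).__name__)  # 1 generator
print(list(from_list()))        # [3, 4]
```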

Why are you passing an empty tuple as prefix?

To avoid having to check whether prefix has a value at all, it needs an initial value. I chose to pass the empty tuple explicitly rather than making it the default in the signature, like this:
def _compound_key_value(xs, prefix=()):

Either works. I felt that leaving out the default looked cleaner, and since _compound_key_value is an internal helper, not intended for direct use outside functions like flatten_dict_list, requiring callers to pass an empty tuple seemed reasonable.

TL;DR: as a default, prefix=() would have also worked.

What is this doing? (k,)

It is part of one statement, yield prefix + (k,) + p, r, which yields a tuple of two values. The first is prefix + (k,) + p: prefix is the function parameter, which expects a tuple, and p is also a tuple, since it's the first half of each pair yielded by the recursive call. Adding tuples together produces a new tuple with all the parts combined in order, so (k,) takes a key obtained from xs.items() and wraps it in a tuple of its own, so that it can be concatenated with the other two. The result is the first half of the yielded pair, with r as the second half.

TL;DR: (k,) makes a new tuple, with a single element k.
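A quick illustration of why the trailing comma matters; the values of k, prefix and p here are made up for the example:

```python
k = 'b'
print((k))    # b       -- parentheses alone do not create a tuple
print((k,))   # ('b',)  -- the comma does

prefix, p = ('a',), ('c', 'd')
print(prefix + (k,) + p)  # ('a', 'b', 'c', 'd')

try:
    prefix + k  # a tuple cannot be concatenated with a plain str
except TypeError as e:
    print("TypeError:", e)
```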

Flatten nested Python dictionaries, compressing keys, and recursing into sub-lists with dicts

You were really close: if a value is a list, a single line is all you need to get a recursive version of flatten:

items.append((new_key, map(flatten, v)))  # for python 2.x
# or
items.append((new_key, list(map(flatten, v))))  # for python 3.x

So, you simply recursively call the function on each element.

Here is what flatten would then look like:

from collections.abc import MutableMapping

def flatten(d, parent_key='', sep='_'):
    items = []
    for k, v in d.items():
        new_key = '{0}{1}{2}'.format(parent_key, sep, k) if parent_key else k
        if isinstance(v, MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        elif isinstance(v, list):
            # apply itself to each element of the list - that's it!
            # (passing sep along keeps the separator consistent)
            items.append((new_key, [flatten(x, sep=sep) for x in v]))
        else:
            items.append((new_key, v))
    return dict(items)

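This solution copes with lists of dictionaries nested to any depth (dicts inside lists inside dicts, and so on). It does assume that every list element is itself a dictionary: a list of plain values, or a list nested directly inside another list, would make flatten call .items() on something that isn't a dict and raise an AttributeError.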
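If your lists can also contain plain values or other lists, one option is to let the recursive function accept any object rather than only dicts. This is a sketch under that assumption, using the hypothetical name flatten_any; as in the version above, each list element starts its own key path:

```python
from collections.abc import MutableMapping


def flatten_any(obj, parent_key='', sep='_'):
    """Flatten dicts, map over lists element-wise, return anything else as-is."""
    if isinstance(obj, MutableMapping):
        items = {}
        for k, v in obj.items():
            new_key = '{0}{1}{2}'.format(parent_key, sep, k) if parent_key else k
            flat = flatten_any(v, new_key, sep)
            if isinstance(v, MutableMapping):
                # a nested dict contributes its already-prefixed keys
                items.update(flat)
            else:
                # lists and plain values stay under their own key
                items[new_key] = flat
        return items
    if isinstance(obj, list):
        # lists keep their positions; each element starts a fresh key path
        return [flatten_any(x, sep=sep) for x in obj]
    return obj


print(flatten_any({'a': {'b': 1}, 'c': [{'d': {'e': 2}}, [{'f': {'g': 3}}], 4]}))
# {'a_b': 1, 'c': [{'d_e': 2}, [{'f_g': 3}], 4]}
```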

Flatten nested dictionary and overwrite values

Not sure how robust this is, but I guess this is what you are looking for (credit to https://stackoverflow.com/a/6027615/5417511):

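# collections.MutableMapping was removed in Python 3.10; import it from collections.abc
from collections.abc import MutableMapping

d = {
    'abc': 1,
    'foo': 2,
    'cba': {'abc': 3, 'baz': {
        'foo': 4
    }}
}

def flatten(d):
    items = []
    for k, v in d.items():
        if isinstance(v, MutableMapping):
            # recurse and keep only the leaf keys, dropping the parent key
            items.extend(flatten(v).items())
        else:
            items.append((k, v))
    # dict() keeps the last value seen for a repeated key
    return dict(items)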

d.update(flatten(d))
print(d)

Output:

{'abc': 3, 'foo': 4, 'cba': {'abc': 3, 'baz': {'foo': 4}}}
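Because dict(items) keeps the last value seen for each key, which value wins depends on iteration order, not on nesting depth. A small check with hypothetical inputs that differ only in key order (flatten is repeated so the snippet runs on its own):

```python
from collections.abc import MutableMapping


def flatten(d):
    items = []
    for k, v in d.items():
        if isinstance(v, MutableMapping):
            items.extend(flatten(v).items())
        else:
            items.append((k, v))
    return dict(items)


# Hypothetical inputs: same data, different key order
nested_first = {'cba': {'abc': 3}, 'abc': 1}
nested_last = {'abc': 1, 'cba': {'abc': 3}}

print(flatten(nested_first))  # {'abc': 1}
print(flatten(nested_last))   # {'abc': 3}
```

So if nested values must always overwrite top-level ones, the order of keys in the input matters (dicts preserve insertion order since Python 3.7).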

Flatten a DataFrame column of lists containing nested dictionaries into the layout shown

Try:

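import pandas as pd

data = []
# col2 is expected to hold, per row, a list of {sheet name: {value: length}} dicts
for col1, row in zip(df["col1"], df["col2"]):
    for d in row:
        for k, v in d.items():
            for kk, vv in v.items():
                # one output row per innermost key/value pair
                data.append({"col1": col1, "col2": k, "col3": kk, "col4": vv})

df = pd.DataFrame(data)
print(df)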

Prints:

    col1        col2     col3      col4
0  path1  sheetname1  value11  length11
1  path1  sheetname1  value12  length12
2  path1  sheetname1  value13  length13
3  path1  sheetname2  value21  length21
4  path1  sheetname2  value22   lenth22
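The loop assumes each cell of col2 is a list of dicts, each mapping a sheet name to a dict of value/length pairs. Since the question's df isn't shown here, the input below is a hypothetical reconstruction of that shape, and the row-building loop is factored into a plain function (explode_rows, a hypothetical name) so the row structure can be checked without pandas:

```python
def explode_rows(col1_values, col2_values):
    """Build one flat row dict per innermost key/value pair (same loop as above)."""
    data = []
    for col1, row in zip(col1_values, col2_values):
        for d in row:                    # each cell is a list of dicts
            for k, v in d.items():       # k: sheet name, v: inner dict
                for kk, vv in v.items():
                    data.append({"col1": col1, "col2": k, "col3": kk, "col4": vv})
    return data


# Hypothetical input standing in for df["col1"] and df["col2"]
col1_values = ["path1"]
col2_values = [[
    {"sheetname1": {"value11": "length11", "value12": "length12"}},
    {"sheetname2": {"value21": "length21"}},
]]

for r in explode_rows(col1_values, col2_values):
    print(r)
```

Passing the returned list to pd.DataFrame gives one column per dict key, as in the table above.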

Flatten Python Dict and only change key when not unique

I recommend using an "accumulator" dict as the input parameter instead of a list. This allows an efficient lookup of whether a key already exists.

def flat_dict(d, acc=None, parent_key=None, sep="_"):
    out = dict() if acc is None else acc
    for k, v in d.items():
        if type(v) is dict:
            # pass sep along so nested levels use the same separator
            flat_dict(v, out, parent_key=str(k), sep=sep)
        else:
            if k in out:
                k = parent_key + sep + str(k)
            out[k] = v

    return out

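If all your keys are already strings, you can of course drop the str casts. Note also that parent_key is None at the top level, so a top-level key that clashes with one already added from an earlier nested dict makes parent_key + sep raise a TypeError; you may want to handle that case explicitly.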
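A runnable usage sketch, repeating flat_dict with sep passed through on the recursive call; the inputs are hypothetical. The second call shows that a renamed key is prefixed with its immediate parent only, not with the full path:

```python
def flat_dict(d, acc=None, parent_key=None, sep="_"):
    out = dict() if acc is None else acc
    for k, v in d.items():
        if type(v) is dict:
            # pass sep along so every level uses the same separator
            flat_dict(v, out, parent_key=str(k), sep=sep)
        else:
            if k in out:
                # clash: prefix with the immediate parent key only
                k = parent_key + sep + str(k)
            out[k] = v
    return out


print(flat_dict({'a': 1, 'b': {'a': 2, 'c': 3}}))          # {'a': 1, 'b_a': 2, 'c': 3}
print(flat_dict({'a': 1, 'b': {'c': {'a': 2}}}, sep='.'))  # {'a': 1, 'c.a': 2}
```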

How to flatten a nested dictionary?

You want to traverse the dictionary, building the current key and accumulating the flat dictionary. For example:

def flatten(current, key, result):
    if isinstance(current, dict):
        for k in current:
            # extend the dotted key path; top-level keys get no prefix
            new_key = "{0}.{1}".format(key, k) if key else k
            flatten(current[k], new_key, result)
    else:
        # a leaf value: store it under the full dotted key
        result[key] = current
    return result

result = flatten(my_dict, '', {})

Using it:

print(flatten(_dict1, '', {}))

{'this.example.too': 3, 'example': 'fish', 'this.is.another.value': 4, 'this.is.an.example': 2}
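_dict1 comes from the question and isn't shown here; the dict below is a hypothetical input with the same shape as the printed result, for trying the function out. On Python 3.7+ keys come out in insertion order, so the order can differ from the listing above:

```python
def flatten(current, key, result):
    if isinstance(current, dict):
        for k in current:
            new_key = "{0}.{1}".format(key, k) if key else k
            flatten(current[k], new_key, result)
    else:
        result[key] = current
    return result


# Hypothetical input with the same shape as the output shown above
sample = {
    'this': {
        'is': {'an': {'example': 2}, 'another': {'value': 4}},
        'example': {'too': 3},
    },
    'example': 'fish',
}

print(flatten(sample, '', {}))
# {'this.is.an.example': 2, 'this.is.another.value': 4, 'this.example.too': 3, 'example': 'fish'}
```

Pass a fresh {} on every call: the function fills and returns the dict you give it, so reusing one dict across calls would mix their results.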