How to flatten a nested JSON recursively, with flatten_json
How to flatten a JSON or dict is a common question, to which there are many answers.
This answer focuses on using flatten_json to recursively flatten a nested dict or JSON.
Assumptions:
- This answer assumes you already have the JSON or dict loaded into some variable (e.g. from a file or an API). In this case we will use data.
How data is loaded into flatten_json:
- It accepts a dict, as shown by the function's type hint.
The most common forms of data:
- Just a dict, {}: flatten_json(data)
- A list of dicts, [{}, {}, {}]: [flatten_json(x) for x in data]
- JSON with top-level keys, where the values repeat, {1: {}, 2: {}, 3: {}}: [flatten_json(data[key]) for key in data]
- Other, {'key': [{}, {}, {}]}: [flatten_json(x) for x in data['key']]
Practical Examples:
- I typically flatten data into a pandas.DataFrame for further analysis.
- Load pandas with import pandas as pd.
- flatten_json returns a dict, which can also be saved directly using the csv module.
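If you'd rather not install the flatten_json package, its core behavior can be sketched in a few lines of recursion. This is a minimal sketch, not the real library: the actual package also supports custom separators and options such as root_keys_to_ignore.

```python
def flatten_json(nested, sep='_'):
    """Minimal sketch of flatten_json: recursively flattens nested dicts
    and lists into one flat dict, joining compound keys with `sep`.
    Note that empty dicts/lists contribute no keys, which matches the
    outputs shown in the examples below."""
    out = {}

    def _flatten(obj, prefix=''):
        if isinstance(obj, dict):
            for key, value in obj.items():
                _flatten(value, f'{prefix}{key}{sep}')
        elif isinstance(obj, list):
            for i, value in enumerate(obj):
                _flatten(value, f'{prefix}{i}{sep}')
        else:
            out[prefix[:-len(sep)]] = obj  # drop the trailing separator

    _flatten(nested)
    return out

print(flatten_json({'a': {'b': 1}, 'c': [10, 20]}))
# {'a_b': 1, 'c_0': 10, 'c_1': 20}
```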
Data 1:
{
"id": 1,
"class": "c1",
"owner": "myself",
"metadata": {
"m1": {
"value": "m1_1",
"timestamp": "d1"
},
"m2": {
"value": "m1_2",
"timestamp": "d2"
},
"m3": {
"value": "m1_3",
"timestamp": "d3"
},
"m4": {
"value": "m1_4",
"timestamp": "d4"
}
},
"a1": {
"a11": [
]
},
"m1": {},
"comm1": "COMM1",
"comm2": "COMM21529089656387",
"share": "xxx",
"share1": "yyy",
"hub1": "h1",
"hub2": "h2",
"context": [
]
}
Flatten 1:
df = pd.DataFrame([flatten_json(data)])
id class owner metadata_m1_value metadata_m1_timestamp metadata_m2_value metadata_m2_timestamp metadata_m3_value metadata_m3_timestamp metadata_m4_value metadata_m4_timestamp comm1 comm2 share share1 hub1 hub2
1 c1 myself m1_1 d1 m1_2 d2 m1_3 d3 m1_4 d4 COMM1 COMM21529089656387 xxx yyy h1 h2
Data 2:
[{
'accuracy': 17,
'activity': [{
'activity': [{
'confidence': 100,
'type': 'STILL'
}
],
'timestampMs': '1542652'
}
],
'altitude': -10,
'latitudeE7': 3777321,
'longitudeE7': -122423125,
'timestampMs': '1542654',
'verticalAccuracy': 2
}, {
'accuracy': 17,
'activity': [{
'activity': [{
'confidence': 100,
'type': 'STILL'
}
],
'timestampMs': '1542652'
}
],
'altitude': -10,
'latitudeE7': 3777321,
'longitudeE7': -122423125,
'timestampMs': '1542654',
'verticalAccuracy': 2
}, {
'accuracy': 17,
'activity': [{
'activity': [{
'confidence': 100,
'type': 'STILL'
}
],
'timestampMs': '1542652'
}
],
'altitude': -10,
'latitudeE7': 3777321,
'longitudeE7': -122423125,
'timestampMs': '1542654',
'verticalAccuracy': 2
}
]
Flatten 2:
df = pd.DataFrame([flatten_json(x) for x in data])
accuracy activity_0_activity_0_confidence activity_0_activity_0_type activity_0_timestampMs altitude latitudeE7 longitudeE7 timestampMs verticalAccuracy
17 100 STILL 1542652 -10 3777321 -122423125 1542654 2
17 100 STILL 1542652 -10 3777321 -122423125 1542654 2
17 100 STILL 1542652 -10 3777321 -122423125 1542654 2
Data 3:
{
"1": {
"VENUE": "JOEBURG",
"COUNTRY": "HAE",
"ITW": "XAD",
"RACES": {
"1": {
"NO": 1,
"TIME": "12:35"
},
"2": {
"NO": 2,
"TIME": "13:10"
},
"3": {
"NO": 3,
"TIME": "13:40"
},
"4": {
"NO": 4,
"TIME": "14:10"
},
"5": {
"NO": 5,
"TIME": "14:55"
},
"6": {
"NO": 6,
"TIME": "15:30"
},
"7": {
"NO": 7,
"TIME": "16:05"
},
"8": {
"NO": 8,
"TIME": "16:40"
}
}
},
"2": {
"VENUE": "FOOBURG",
"COUNTRY": "ABA",
"ITW": "XAD",
"RACES": {
"1": {
"NO": 1,
"TIME": "12:35"
},
"2": {
"NO": 2,
"TIME": "13:10"
},
"3": {
"NO": 3,
"TIME": "13:40"
},
"4": {
"NO": 4,
"TIME": "14:10"
},
"5": {
"NO": 5,
"TIME": "14:55"
},
"6": {
"NO": 6,
"TIME": "15:30"
},
"7": {
"NO": 7,
"TIME": "16:05"
},
"8": {
"NO": 8,
"TIME": "16:40"
}
}
}
}
Flatten 3:
df = pd.DataFrame([flatten_json(data[key]) for key in data])
VENUE COUNTRY ITW RACES_1_NO RACES_1_TIME RACES_2_NO RACES_2_TIME RACES_3_NO RACES_3_TIME RACES_4_NO RACES_4_TIME RACES_5_NO RACES_5_TIME RACES_6_NO RACES_6_TIME RACES_7_NO RACES_7_TIME RACES_8_NO RACES_8_TIME
JOEBURG HAE XAD 1 12:35 2 13:10 3 13:40 4 14:10 5 14:55 6 15:30 7 16:05 8 16:40
FOOBURG ABA XAD 1 12:35 2 13:10 3 13:40 4 14:10 5 14:55 6 15:30 7 16:05 8 16:40
Other Examples:
- Python Pandas - Flatten Nested JSON
- handling nested json in pandas
- How to flatten a nested JSON from the NASA Weather Insight API in Python
Flatten a nested JSON?
flatten_json is now available as a library, so you can simply use it. In this case it gives you 160 columns:
from flatten_json import flatten
dic_flattened = (flatten(d, '.') for d in test_json['result'])
df = pd.DataFrame(dic_flattened)
df.shape
(5, 160)
How to flatten multilevel/nested JSON?
I used the following function (details can be found here):
def flatten_data(y):
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
This unfortunately flattens the whole JSON: if you have a multi-level JSON (many nested dictionaries), it might flatten everything into a single row with a huge number of columns.
What I used, in the end, was json_normalize(), specifying the structure that I required. A nice example of how to do it that way can be found here.
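As a sketch of that approach (the data and field names here are made up for illustration), pd.json_normalize lets you name exactly which record path to expand and which metadata to keep, instead of flattening everything:

```python
import pandas as pd

# Hypothetical data: expand only the 'races' records, keep 'venue' as metadata.
data = [
    {'venue': 'JOEBURG', 'races': [{'no': 1, 'time': '12:35'}, {'no': 2, 'time': '13:10'}]},
    {'venue': 'FOOBURG', 'races': [{'no': 1, 'time': '12:35'}]},
]

# One output row per race; 'venue' is repeated alongside each record.
df = pd.json_normalize(data, record_path='races', meta=['venue'])
print(df)
#    no   time    venue
# 0   1  12:35  JOEBURG
# 1   2  13:10  JOEBURG
# 2   1  12:35  FOOBURG
```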
flatten_json recursive flattening function for lists
I solved it using recursion; here's my code:
import json
import pandas as pd
import flatten_json as fj
keys = {'data', 'level1', 'level2', 'level3'}
with open('test_lh.json') as f:
data = json.load(f)
levels = ['data.level1.level2.level3', 'data.level1.level2', 'data.level1', 'data']
recs_dict = {}
def do_step(data_dict, level, depth, path):
recs = []
for x in data_dict[level]:
if depth < len(path.split('.'))-1:
do_step(x, path.split('.')[depth+1], depth+1, path)
else:
dic = fj.flatten(x, root_keys_to_ignore=keys)
recs.append(dic)
recs_dict[level] = recs
for path in levels:
do_step(data, path.split('.')[0], 0, path)
for key, value in recs_dict.items():
print(key)
df = pd.DataFrame(recs_dict[key])
print(df)
And here's the output:
level3
identifiers_0_type identifiers_0_scheme identifiers_0_value identifiers_1_type identifiers_1_scheme identifiers_1_value name type
0 abc def 123 abc def 123 abs level3
1 abc def 123 abc def 123 abs level3
level2
identifiers_0_type identifiers_0_scheme identifiers_0_value identifiers_1_type identifiers_1_scheme identifiers_1_value name type
0 abc def 123 abc def 123 abs level2
1 abc def 123 abc def 123 abs abd
level1
identifiers_0_type identifiers_0_scheme identifiers_0_value identifiers_1_type identifiers_1_scheme identifiers_1_value name type
0 abc def 123 abc def 123 asd level1
data
identifiers_0_type identifiers_0_scheme identifiers_0_value identifiers_1_type identifiers_1_scheme identifiers_1_value name type
0 abc def 123 abc def 123 qwer abd
Flatten Nested JSON in Python
The error you got indicates that some of your values are actually dictionaries within an array.
Assuming you want to flatten your JSON file to retrieve the keys mediaType, queueId, and count, these can be retrieved with the following sample code:
import json
with open(path_to_json_file, 'r') as f:
json_dict = json.load(f)
for result in json_dict.get("results"):
media_type = result.get("group").get("mediaType")
queue_id = result.get("group").get("queueId")
n_offered = result.get("data")[0].get("metrics")[0].get("count")
If your data and metrics keys have multiple indices, you will have to use a for loop to retrieve every count value accordingly.
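That nested loop might look like this (a sketch, assuming the same results/group/data/metrics structure as above; the sample values are made up):

```python
# Hypothetical data in the structure the answer describes.
json_dict = {
    'results': [
        {'group': {'mediaType': 'voice', 'queueId': 'q1'},
         'data': [{'metrics': [{'metric': 'nOffered', 'count': 5},
                               {'metric': 'nAnswered', 'count': 3}]}]},
    ],
}

counts = []
for result in json_dict.get('results', []):
    media_type = result.get('group', {}).get('mediaType')
    queue_id = result.get('group', {}).get('queueId')
    # Loop over every data entry and every metric, not just index 0.
    for datum in result.get('data', []):
        for metric in datum.get('metrics', []):
            counts.append((media_type, queue_id, metric.get('count')))

print(counts)
# [('voice', 'q1', 5), ('voice', 'q1', 3)]
```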
Flatten a triple-nested JSON into a dataframe
It is similar to what you have in your Edit, but with slightly shorter syntax and better performance.
If you have NaN in the DataFrame, older versions of Pandas could fail on json_normalize.
This solution should work with Pandas 1.3+.
df = pd.json_normalize(products)
df = df.explode('properties.features')
df = pd.concat([df.drop('properties.features', axis=1).reset_index(drop=True),
pd.json_normalize(df['properties.features']).add_prefix('properties.features.')], axis=1)
df = df.explode('properties.features.features')
df = pd.concat([df.drop('properties.features.features', axis=1).reset_index(drop=True),
pd.json_normalize(df['properties.features.features']).add_prefix('properties.features.features.')], axis=1)
Performance with 1000 products:
Code in Edit: 4.85 s ± 218 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This solution: 58.3 ms ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
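The explode-then-normalize step used above can be demonstrated on a toy column (the 'items' field here is made up for illustration):

```python
import pandas as pd

# Hypothetical data: each row carries a list of dicts in 'items'.
df = pd.DataFrame({'name': ['p1', 'p2'],
                   'items': [[{'size': 'S'}, {'size': 'M'}], [{'size': 'L'}]]})

# One row per list element, then expand each dict into prefixed columns.
df = df.explode('items')
df = pd.concat([df.drop('items', axis=1).reset_index(drop=True),
                pd.json_normalize(df['items']).add_prefix('items.')], axis=1)
print(df)
#   name items.size
# 0   p1          S
# 1   p1          M
# 2   p2          L
```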
Fastest way to flatten / un-flatten nested JavaScript objects
Here's my much shorter implementation:
Object.unflatten = function(data) {
"use strict";
if (Object(data) !== data || Array.isArray(data))
return data;
var regex = /\.?([^.\[\]]+)|\[(\d+)\]/g,
resultholder = {};
for (var p in data) {
var cur = resultholder,
prop = "",
m;
while (m = regex.exec(p)) {
cur = cur[prop] || (cur[prop] = (m[2] ? [] : {}));
prop = m[2] || m[1];
}
cur[prop] = data[p];
}
return resultholder[""] || resultholder;
};
flatten
hasn't changed much (and I'm not sure whether you really need those isEmpty
cases):
Object.flatten = function(data) {
var result = {};
function recurse (cur, prop) {
if (Object(cur) !== cur) {
result[prop] = cur;
} else if (Array.isArray(cur)) {
for(var i=0, l=cur.length; i<l; i++)
recurse(cur[i], prop + "[" + i + "]");
if (l == 0)
result[prop] = [];
} else {
var isEmpty = true;
for (var p in cur) {
isEmpty = false;
recurse(cur[p], prop ? prop+"."+p : p);
}
if (isEmpty && prop)
result[prop] = {};
}
}
recurse(data, "");
return result;
}
Together, they run your benchmark in about the half of the time (Opera 12.16: ~900ms instead of ~ 1900ms, Chrome 29: ~800ms instead of ~1600ms).
Note: This and most other solutions answered here focus on speed, are susceptible to prototype pollution, and should not be used on untrusted objects.