How to Flatten a Pandas Dataframe with Some Columns as JSON

How to flatten a pandas dataframe with some columns as json?

Here's a solution that uses json_normalize() together with custom functions that convert the data into a format json_normalize understands.

import ast
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated

def only_dict(d):
    '''
    Convert json string representation of dictionary to a python dict
    '''
    return ast.literal_eval(d)

def list_of_dicts(ld):
    '''
    Parse a json string holding a list of dicts and map each dict's
    second value to its first value (relies on the key order of each dict)
    '''
    return {list(d.values())[1]: list(d.values())[0] for d in ast.literal_eval(ld)}

A = json_normalize(df['columnA'].apply(only_dict).tolist()).add_prefix('columnA.')
B = json_normalize(df['columnB'].apply(list_of_dicts).tolist()).add_prefix('columnB.pos.')

Finally, join the DFs on the common index to get:

df[['id', 'name']].join([A, B])

EDIT: As @MartijnPieters notes in the comments, if you know the data source is JSON, the recommended way to decode the strings is json.loads(), which is much faster than ast.literal_eval().
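
As a rough sketch of that suggestion, the same approach can be written with json.loads(). The sample frame below is hypothetical (the original data is not shown here), and the key names "pos" and "value" are assumptions made for illustration; looking values up by key name also avoids depending on the order of the keys.

```python
import json
import pandas as pd

# Hypothetical sample data: both columns hold JSON strings.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["a", "b"],
    "columnA": ['{"dist": "600", "time": "0:12.10"}',
                '{"dist": "500", "time": "0:10.05"}'],
    "columnB": ['[{"pos": "1st", "value": "500"}, {"pos": "2nd", "value": "300"}]',
                '[{"pos": "1st", "value": "400"}, {"pos": "2nd", "value": "200"}]'],
})

def list_of_dicts(s):
    # Decode the JSON list and map each "pos" to its "value"
    # (assumed key names), by name rather than by position.
    return {d["pos"]: d["value"] for d in json.loads(s)}

A = pd.json_normalize(df["columnA"].apply(json.loads).tolist()).add_prefix("columnA.")
B = pd.json_normalize(df["columnB"].apply(list_of_dicts).tolist()).add_prefix("columnB.pos.")

result = df[["id", "name"]].join([A, B])
print(result)
```

Unlike ast.literal_eval(), json.loads() also accepts the JSON literals true, false and null.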

Flatten JSON Columns in Dataframe

You can use pd.json_normalize, which should be simpler.

>>> df
    ID                                         PROPERTIES                                    FORMSUBMISSIONS
0  123  {'firstname': {'value': 'FAKE'}, 'lastmodified...  [{'contact-associated-by': ['FAKE'], 'conversi...

>>> df = df.explode('FORMSUBMISSIONS')  # list to dict
>>> df
    ID                                         PROPERTIES                                    FORMSUBMISSIONS
0  123  {'firstname': {'value': 'FAKE'}, 'lastmodified...  {'contact-associated-by': ['FAKE'], 'conversio...

Now you can run json_normalize on the FORMSUBMISSIONS column. To preserve the other columns, I use pd.concat:

>>> df = pd.concat([df, pd.json_normalize(df['FORMSUBMISSIONS'])], axis=1).drop('FORMSUBMISSIONS', axis=1)

>>> df
    ID                                         PROPERTIES contact-associated-by conversion-id form-id form-type meta-data portal-id timestamp title
0  123  {'firstname': {'value': 'FAKE'}, 'lastmodified...                [FAKE]          FAKE    FAKE      FAKE        []      FAKE      FAKE  FAKE

You can do the same thing on the PROPERTIES column:

df = pd.concat([df, pd.json_normalize(df.PROPERTIES)], axis=1).drop('PROPERTIES', axis=1)
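
When a list holds more than one element, explode leaves duplicate index values, while json_normalize returns a fresh 0..n-1 index, so the two frames no longer line up for pd.concat. A minimal sketch of one way to handle this is to reset the index first. The helper name flatten_column and the sample data are hypothetical, and the sketch assumes every cell in the column is a dict (no missing values or empty lists).

```python
import pandas as pd

# Hypothetical sample data shaped like the frame above.
df = pd.DataFrame({
    "ID": [123, 456],
    "PROPERTIES": [{"firstname": {"value": "A"}}, {"firstname": {"value": "B"}}],
    "FORMSUBMISSIONS": [
        [{"form-id": "f1", "title": "t1"}, {"form-id": "f2", "title": "t2"}],
        [{"form-id": "f3", "title": "t3"}],
    ],
})

def flatten_column(frame, column):
    # Hypothetical helper: expand a column of dicts into flat columns.
    # Resetting the index aligns the frame with json_normalize's RangeIndex.
    frame = frame.reset_index(drop=True)
    flat = pd.json_normalize(frame[column].tolist())
    return pd.concat([frame.drop(columns=column), flat], axis=1)

df = df.explode("FORMSUBMISSIONS")          # one row per submission
df = flatten_column(df, "FORMSUBMISSIONS")  # dict keys become columns
df = flatten_column(df, "PROPERTIES")       # nested keys become 'firstname.value'
print(df)
```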

Flatten nested JSON columns in Pandas

Take the values of each dict and use explode to turn each element into its own row (the index value is duplicated for rows that come from the same original row). Then expand the nested dicts (the values of the outer dict) into columns. Finally, join the original dataframe with the new dataframe.

>>> df

  stock       Name                                             Annual
0     x      Tesla  {'0': {'date': '2020', 'dateFormatted': '2020-...
1     y     Google  {'0': {'date': '2020', 'dateFormatted': '2020-...
2     z  Big Apple                                                 {}

data = df['Annual'].apply(lambda x: x.values()) \
                   .explode() \
                   .apply(pd.Series)

df = df.join(data).drop(columns='Annual')

Output result:

>>> df

  stock       Name  date dateFormatted  sharesMln        shares
0     x      Tesla  2020    2020-12-31  3856.2405  3.856240e+09
0     x      Tesla  2019    2019-12-31  3856.2405  3.856240e+09
1     y     Google  2020    2020-12-31  2526.4506  2.526451e+09
1     y     Google  2019    2019-12-31  2526.4506  2.526451e+09
1     y     Google  2018    2018-12-31  2578.0992  2.578099e+09
2     z  Big Apple   NaN           NaN        NaN           NaN
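
For a version that can be run end to end, here is a sketch of the same idea with the intermediate rows built explicitly in a list comprehension instead of explode and apply(pd.Series). The sample data is hypothetical, loosely modeled on the frame above, and the helper column name "_row" is an assumption used only to carry the original index.

```python
import pandas as pd

# Hypothetical sample data: 'Annual' holds a dict of dicts, possibly empty.
df = pd.DataFrame({
    "stock": ["x", "y", "z"],
    "Name": ["Tesla", "Google", "Big Apple"],
    "Annual": [
        {"0": {"date": "2020", "sharesMln": 3856.2405},
         "1": {"date": "2019", "sharesMln": 3856.2405}},
        {"0": {"date": "2020", "sharesMln": 2526.4506}},
        {},
    ],
})

# One record per inner dict, tagged with the index of the row it came from.
records = [
    {"_row": idx, **entry}
    for idx, annual in df["Annual"].items()
    for entry in annual.values()
]
data = pd.DataFrame(records).set_index("_row").rename_axis(None)

# The left join keeps rows whose dict was empty; their new columns are NaN.
flat = df.drop(columns="Annual").join(data)
print(flat)
```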

Flatten JSON columns in a dataframe with lists

The idea is to use a dictionary comprehension that flattens the column for each row, keyed by the index value i, so that after concat the result can be joined back to the original DataFrame:

import json
import pandas as pd

x = '''{"sections": 
[{
    "id": "12ab", 
    "items": [
        {"id": "34cd", 
        "isValid": true, 
        "questionaire": {"title": "blah blah", "question": "Date of Purchase"}
        },
        {"id": "56ef", 
        "isValid": true, 
        "questionaire": {"title": "something useless", "question": "Date of Billing"}
        }
    ]
}],
"ignore": "yes"}'''

df = pd.DataFrame({'id':['1','2'], 'name':['xyz', 'abc'], 
                    'location':['new york', 'wien'], 'flatten':[x,x]})

# create a default RangeIndex so the index values can serve as join keys
df = df.reset_index(drop=True)

d = {i: pd.json_normalize(json.loads(x)['sections'],
                          'items', ['id'], 
                          record_prefix='child_')[['id','child_id','child_questionaire.question']]
                             .rename(columns={'child_questionaire.question':'question'})
     for i, x in df.pop('flatten').items()}

df_norm = df.rename(columns={'id':'Masterid'}).join(pd.concat(d).reset_index(level=1, drop=True))

print(df_norm)
  Masterid name  location    id child_id          question
0        1  xyz  new york  12ab     34cd  Date of Purchase
0        1  xyz  new york  12ab     56ef   Date of Billing
1        2  abc      wien  12ab     34cd  Date of Purchase
1        2  abc      wien  12ab     56ef   Date of Billing
