Split/Explode a Column of Dictionaries into Separate Columns With Pandas

How to split a pandas column with a list of dicts into separate columns for each key

The column values are lists of dicts.
- Use pandas.explode() to give each dict in a list its own row.
- Convert the column of dicts to a dataframe with pandas.json_normalize(). The keys become column headers and the values become observations. Then .join() this back to df.
- Use .drop() to remove the now unneeded column.
If the column contains lists of dicts stored as strings (e.g. "[{key: value}]"), refer to this solution in Splitting dictionary/list inside a Pandas Column into Separate Columns, and first use:
- df.col2 = df.col2.apply(literal_eval), with from ast import literal_eval.

import pandas as pd

# create sample dataframe
df = pd.DataFrame({'col1': ['x', 'y'], 'col2': [[{"target": "NAge", "segment": "21 and older"}, {"target": "MinAge", "segment": "21"}, {"target": "Retargeting", "segment": "people who may be similar to their customers"}, {"target": "Region", "segment": "the United States"}], [{"target": "NAge", "segment": "18 and older"}, {"target": "Location Type", "segment": "HOME"}, {"target": "Interest", "segment": "Hispanic culture"}, {"target": "Interest", "segment": "Republican Party (United States)"}, {"target": "Location Granularity", "segment": "country"}, {"target": "Country", "segment": "the United States"}, {"target": "MinAge", "segment": 18}]]})

# display(df)
  col1                                                                                                                                                                                                                                                                                                                                                                                 col2
0    x                                                                                                                                                   [{'target': 'NAge', 'segment': '21 and older'}, {'target': 'MinAge', 'segment': '21'}, {'target': 'Retargeting', 'segment': 'people who may be similar to their customers'}, {'target': 'Region', 'segment': 'the United States'}]
1    y  [{'target': 'NAge', 'segment': '18 and older'}, {'target': 'Location Type', 'segment': 'HOME'}, {'target': 'Interest', 'segment': 'Hispanic culture'}, {'target': 'Interest', 'segment': 'Republican Party (United States)'}, {'target': 'Location Granularity', 'segment': 'country'}, {'target': 'Country', 'segment': 'the United States'}, {'target': 'MinAge', 'segment': 18}]

# use explode to give each dict in a list a separate row
df = df.explode('col2').reset_index(drop=True)

# normalize the column of dicts, join back to the remaining dataframe columns, and drop the unneeded column
df = df.join(pd.json_normalize(df.col2)).drop(columns=['col2'])

# display(df)

   col1                target                                       segment
0     x                  NAge                                  21 and older
1     x                MinAge                                            21
2     x           Retargeting  people who may be similar to their customers
3     x                Region                             the United States
4     y                  NAge                                  18 and older
5     y         Location Type                                          HOME
6     y              Interest                              Hispanic culture
7     y              Interest              Republican Party (United States)
8     y  Location Granularity                                       country
9     y               Country                             the United States
10    y                MinAge                                            18
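For the case mentioned above where col2 holds strings rather than real lists, here is a minimal sketch of the full pipeline. The two-row sample frame is made up for illustration. literal_eval turns each string back into a list of dicts, and the rest is the same explode, json_normalize, join and drop sequence.

```python
from ast import literal_eval

import pandas as pd

# hypothetical sample: col2 holds the *string* form of a list of dicts,
# which is what you typically get after a round trip through CSV
df = pd.DataFrame({
    'col1': ['x', 'y'],
    'col2': [
        "[{'target': 'NAge', 'segment': '21 and older'}, {'target': 'MinAge', 'segment': '21'}]",
        "[{'target': 'Country', 'segment': 'the United States'}]",
    ],
})

# parse each string into a real Python list of dicts
df['col2'] = df['col2'].apply(literal_eval)

# one row per dict, with a fresh 0..n-1 index so the join below lines up
df = df.explode('col2').reset_index(drop=True)

# one column per dict key, joined back on the index; then drop the dict column
df = df.join(pd.json_normalize(df['col2'].tolist())).drop(columns=['col2'])
print(df)
```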

Get count

If the goal is to get the count for each 'target' and associated 'segment':

counts = df.groupby(['target', 'segment']).count()
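A variant of the count above, sketched on a small made-up frame with the same columns. groupby(...).count() returns one non-null count per remaining column (here only col1), while groupby(...).size() counts rows per (target, segment) pair directly, which is usually what "count per pair" means.

```python
import pandas as pd

# hypothetical exploded frame with the columns produced above
df = pd.DataFrame({
    'col1': ['x', 'x', 'y', 'y', 'y'],
    'target': ['NAge', 'Region', 'NAge', 'Interest', 'Region'],
    'segment': ['21 and older', 'the United States', '18 and older',
                'Hispanic culture', 'the United States'],
})

# number of rows for every (target, segment) combination, as a flat frame
counts = df.groupby(['target', 'segment']).size().reset_index(name='counts')
print(counts)
```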

Updated

This update applies the same steps to the full file, where the targets column holds the lists of dicts as strings:

import pandas as pd
from ast import literal_eval

# load the file
df = pd.read_csv('en-US.csv')

# replace NaNs with '[]', otherwise literal_eval will error
df.targets = df.targets.fillna('[]')

# replace null with None, otherwise literal_eval will error
df.targets = df.targets.str.replace('null', 'None')

# convert the strings to lists of dicts
df.targets = df.targets.apply(literal_eval)

# use explode to give each dict in a list a separate row
df = df.explode('targets').reset_index(drop=True)

# fillna with {} is required for json_normalize
df.targets = df.targets.fillna({i: {} for i in df.index})

# normalize the column of dicts into one column per key
normalized = pd.json_normalize(df.targets)

# get the counts
counts = normalized.groupby(['target', 'segment']).segment.count().reset_index(name='counts')
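Since en-US.csv is not available here, this is a self-contained sketch of the same steps run against a tiny in-memory CSV (the id column and rows are made up). Two small swaps compared with the code above, both assumptions about the data: null is replaced only as a whole word (regex \bnull\b) so text such as "nullable" is left alone, and the NaN rows left by explode are turned into empty dicts with an isinstance check instead of the fillna({i: {} ...}) trick.

```python
import io
from ast import literal_eval

import pandas as pd

# hypothetical stand-in for en-US.csv: row 2 has no targets (NaN) and row 1
# contains a JSON-style null, which literal_eval cannot parse
csv_text = '''id,targets
1,"[{'target': 'NAge', 'segment': '21 and older'}, {'target': 'Region', 'segment': null}]"
2,
3,"[{'target': 'NAge', 'segment': '18 and older'}]"
'''
df = pd.read_csv(io.StringIO(csv_text))

df['targets'] = (
    df['targets']
    .fillna('[]')                                    # missing rows become empty lists
    .str.replace(r'\bnull\b', 'None', regex=True)    # JSON null -> Python None
    .apply(literal_eval)                             # strings -> lists of dicts
)

# one row per dict; empty lists turn into NaN rows
df = df.explode('targets').reset_index(drop=True)

# give json_normalize a dict in every row
df['targets'] = df['targets'].apply(lambda t: t if isinstance(t, dict) else {})

normalized = pd.json_normalize(df['targets'].tolist())

# groupby drops rows whose target or segment is missing (NaN/None)
counts = (normalized.groupby(['target', 'segment'])
          .segment.count().reset_index(name='counts'))
print(counts)
```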

Split dictionary into individual columns in a df

I think it is better to use:

df = pd.DataFrame(df['tests'].values.tolist(), index=df.index)
print(df)
     Mon  Tues  Wed
SO4    6     6    7
CH3    0     8   10

But if you really need it (note that the key order inside each string follows each dict's own order, which was not guaranteed before Python 3.7, so you may get a different column order):

df = df['tests'].astype(str).str.strip('{}').str.split(', ', expand=True)
print(df)
            0          1          2
SO4  'Mon': 6   'Wed': 7  'Tues': 6
CH3  'Mon': 0  'Wed': 10  'Tues': 8
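The first approach replaces df with just the expanded columns. If the frame has other columns you want to keep, a small sketch is to pop the dict column and join the expanded frame back on the same index. The unit column below is hypothetical, added only to show that it survives.

```python
import pandas as pd

# hypothetical frame shaped like the question's, plus an extra 'unit' column
df = pd.DataFrame(
    {'tests': [{'Mon': 6, 'Tues': 6, 'Wed': 7}, {'Mon': 0, 'Tues': 8, 'Wed': 10}],
     'unit': ['mg', 'mg']},
    index=['SO4', 'CH3'],
)

# pop() removes 'tests' from df and returns it; each dict becomes one row with
# one column per key, aligned on the original index, then joined back
expanded = pd.DataFrame(df.pop('tests').tolist(), index=df.index)
df = df.join(expanded)
print(df)
```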

Convert a dataframe column of dictionaries with lists into separate columns with pandas

Try with apply and explode:

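# repeat each price key once per shop listed under it; non-dict values become one-element lists
df['price'] = [[i for i in d.keys() for x in d[i]] if isinstance(d, dict) else [d] for d in df['price'].tolist()]
# explode each remaining column, keeping item_id as the index, then restore it as a column
df = df.set_index('item_id').apply(pd.Series.explode, axis=0).reset_index()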

And now:

print(df)

Would give:

   item_id shop_id price
0        1      S1    10
1        1      S2    10
2        1      S3    20
3        1      S4    30
4        2      S2    50
5        3      S3   NaN
6        4      S1    10
7        4      S2    10
8        4      S3    10
9        4      S4    25
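On pandas 1.3 or later, DataFrame.explode accepts a list of columns, so both list columns can be exploded together without apply. The input below is a hypothetical reconstruction from the output table (the question's frame is not shown here): price maps each price to the shops charging it, and a plain number or NaN stands for a single price. Each row's lists must have the same length, and the sketch assumes the dict lists shops in the same order as shop_id.

```python
import numpy as np
import pandas as pd

# hypothetical input reconstructed from the output above
df = pd.DataFrame({
    'item_id': [1, 2, 3, 4],
    'shop_id': [['S1', 'S2', 'S3', 'S4'], ['S2'], ['S3'], ['S1', 'S2', 'S3', 'S4']],
    'price': [{10: ['S1', 'S2'], 20: ['S3'], 30: ['S4']}, 50, np.nan,
              {10: ['S1', 'S2', 'S3'], 25: ['S4']}],
})

# repeat each price once per shop that uses it; scalars become one-element lists
df['price'] = [[p for p in d for _ in d[p]] if isinstance(d, dict) else [d]
               for d in df['price']]

# explode both list columns in lockstep (pandas >= 1.3) with a fresh 0..n-1 index
df = df.explode(['shop_id', 'price'], ignore_index=True)
print(df)
```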

Convert a list of nested dictionary WITH STRING OBJECT into pandas Dataframe

If I understand the question correctly, this should work:

import json
import requests
import pandas as pd

req = requests.get('https://office.ieltsvietop.vn/api/get_data/history')
req_json = req.json()

df = pd.DataFrame(json.loads(r['history_value']) for r in req_json)

The resulting df should look like this:

     request_id ketoan_id lop_id  ... danhsachcho chinhanh_old chinhanh
0            11      2470    551  ...         NaN          NaN      NaN
1            13      2474    551  ...         NaN          NaN      NaN
2            12      2468    564  ...         NaN          NaN      NaN
3            15      2338    442  ...         NaN          NaN      NaN
4            31      2463    239  ...         NaN          NaN      NaN
...         ...       ...    ...  ...         ...          ...      ...
5256       4699      4357    NaN  ...         NaN          NaN      NaN
5257       4695      3787    NaN  ...         NaN          NaN      NaN
5258       4679      4716    NaN  ...         NaN          NaN      NaN
5259       4694      4114    596  ...         NaN          NaN      NaN
5260       4705      4839    601  ...         NaN          NaN      NaN

[5261 rows x 20 columns]

Then select the needed columns ngaybaoluu, ngayhoclai and lydo with:

df = df[['ngaybaoluu', 'ngayhoclai', 'lydo']]

The final df is:

      ngaybaoluu  ngayhoclai                                               lydo
0            NaN         NaN                   Bạn phù hợp với trình độ của lớp
1            NaN         NaN  Bạn cần lấy target để ra trường và phục vụ côn...
2            NaN         NaN              Học viên đăng kí học Speaking-express
3            NaN         NaN            Vt3 có lớp phù hợp với trình độ của bạn
4            NaN         NaN                                                NaN
...          ...         ...                                                ...
5256  22-06-2022  01-08-2022  Học viên tập trung ôn thi THPTQG. Học viên đã ...
5257  21-06-2022  21-08-2022  Học viên chưa sắp xếp được lịch học lại . Học ...
5258  21-06-2022  15-07-2022  Học viên đi  tập quân sự. Học viên đã hiểu rõ ...
5259         NaN  22-06-2022                                                NaN
5260         NaN         NaN                                                NaN

[5261 rows x 3 columns]

Be aware that many of the columns contain null values. This only means that the corresponding records in the URL's response do not contain those fields, so it is expected. If you want to fill these null values, look into .fillna().
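To try the parsing step without calling the URL, here is a minimal offline sketch. The req_json list is a made-up stand-in for req.json(): each record stores another JSON document as a string in history_value, and the field values are invented. One swap: reindex(columns=...) is used instead of df[[...]] so a column that is missing from every record comes back as all-NaN instead of raising a KeyError.

```python
import json

import pandas as pd

# hypothetical stand-in for req.json(): the inner JSON is stored as a string
req_json = [
    {'id': 1, 'history_value': '{"request_id": 11, "lop_id": 551, "lydo": "reason A"}'},
    {'id': 2, 'history_value': '{"request_id": 13, "ngayhoclai": "22-06-2022"}'},
]

# decode the inner string of every record; keys absent from a record become NaN
df = pd.DataFrame([json.loads(r['history_value']) for r in req_json])

# keep only the columns of interest, tolerating ones that never appear
df = df.reindex(columns=['ngaybaoluu', 'ngayhoclai', 'lydo'])
print(df)
```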

Explode nested list of dictionaries into Pandas columns

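import re
import pandas as pd

# for each record, strip the "*label*" prefix from every block's text,
# then drop blocks 0 and 5, which are not needed in the result
d_new = (pd.DataFrame([[re.sub(r".*[*]\W+", "", val['text']['text'])
                        for val in dat['blocks']] for dat in raw_data_2])
         .drop([0, 5], axis=1))

d_new.columns = ['heard_by', 'direction', 'destination', 'new_customer']

d_new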
 
        heard_by direction destination new_customer
0         Friend     North    New York          Yes
1  Online Search     North       Miami           No

You can then add these columns to your original data.
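The question's raw_data_2 is not shown here, so this sketch builds a hypothetical version whose shape is inferred from the code: each record has a blocks list of six entries, each with a text.text string, and the useful ones look like "*Label*\nvalue". The make_message helper and the original frame with its id column are made up for illustration; pd.concat(axis=1) is one way to put the parsed columns next to the original rows, assuming both share a 0..n-1 index.

```python
import re

import pandas as pd


def make_message(heard_by, direction, destination, new_customer):
    # hypothetical helper that mimics the assumed {'blocks': [{'text': {'text': ...}}]} shape
    texts = ['New request',
             f'*How did you hear about us?*\n{heard_by}',
             f'*Direction*\n{direction}',
             f'*Destination*\n{destination}',
             f'*New customer?*\n{new_customer}',
             'Sent from the web form']
    return {'blocks': [{'text': {'text': t}} for t in texts]}


raw_data_2 = [make_message('Friend', 'North', 'New York', 'Yes'),
              make_message('Online Search', 'North', 'Miami', 'No')]

# '.' does not match '\n', so the pattern removes "*Label*" plus the newline after it
d_new = (pd.DataFrame([[re.sub(r".*[*]\W+", "", val['text']['text'])
                        for val in dat['blocks']] for dat in raw_data_2])
         .drop([0, 5], axis=1))
d_new.columns = ['heard_by', 'direction', 'destination', 'new_customer']

# hypothetical original data, one row per record, combined column-wise
original = pd.DataFrame({'id': [101, 102]})
combined = pd.concat([original, d_new], axis=1)
print(combined)
```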


How to split multiple dictionaries in row into new rows using Pandas

You can use explode:

tmp = df.explode('Rules').reset_index(drop=True)
df = pd.concat([tmp, pd.json_normalize(tmp['Rules'])], axis=1).drop('Rules', axis=1)

Output:

>>> df
   SetID     SetName RulesID  RuleName
0      0  Standard_1      10  name_abc
1      0  Standard_1      11  name_xyz
2      1  Standard_2      12  name_arg

One-liner version of the above:

df.explode('Rules').reset_index(drop=True).pipe(lambda x: pd.concat([x, pd.json_normalize(x['Rules'])], axis=1)).drop('Rules', axis=1)
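A self-contained, runnable version of the one-liner, with a hypothetical input reconstructed from the output above. Inside pipe the lambda must use its own argument x (the exploded frame); ignore_index=True on explode gives the same fresh 0..n-1 index that json_normalize produces, so the concat lines up row by row.

```python
import pandas as pd

# hypothetical input reconstructed from the output shown above
df = pd.DataFrame({
    'SetID': [0, 1],
    'SetName': ['Standard_1', 'Standard_2'],
    'Rules': [
        [{'RulesID': 10, 'RuleName': 'name_abc'}, {'RulesID': 11, 'RuleName': 'name_xyz'}],
        [{'RulesID': 12, 'RuleName': 'name_arg'}],
    ],
})

out = (
    df.explode('Rules', ignore_index=True)          # one row per rule dict
      .pipe(lambda x: pd.concat([x, pd.json_normalize(x['Rules'].tolist())], axis=1))
      .drop(columns='Rules')                        # the raw dicts are no longer needed
)
print(out)
```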
