Convert Pandas Dataframe to a Nested Dict

Construct pandas DataFrame from items in nested dictionary

A pandas MultiIndex consists of a list of tuples. So the most natural approach would be to reshape your input dict so that its keys are tuples corresponding to the multi-index values you require. Then you can construct your dataframe with pd.DataFrame.from_dict, passing the option orient='index':

import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# flatten the two outer key levels into (user_id, category) tuples
pd.DataFrame.from_dict({(i, j): user_dict[i][j]
                        for i in user_dict.keys()
                        for j in user_dict[i].keys()},
                       orient='index')

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
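To make the MultiIndex explicit, the same frame can also be built from a list of row dicts plus an index created with pd.MultiIndex.from_tuples. This is only a sketch of a variant, not the from_dict call above, and the level names user_id and category are assumptions chosen for illustration.

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# one (user_id, category) tuple per row, in the same order as the row dicts
keys = [(uid, cat) for uid, cats in user_dict.items() for cat in cats]
rows = [user_dict[uid][cat] for uid, cat in keys]

# the level names are hypothetical; the original index levels are unnamed
index = pd.MultiIndex.from_tuples(keys, names=['user_id', 'category'])
df = pd.DataFrame(rows, index=index)

# select every category for one user, or a single cell
user_12 = df.loc[12]
cell = df.loc[(15, 'Category 2'), 'att_1']
```

Naming the levels is optional, but it makes later calls such as reset_index or groupby(level=...) easier to read.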

An alternative approach would be to build your dataframe up by concatenating the component dataframes:

user_ids = []
frames = []

# dict.iteritems() exists only in Python 2; use items() in Python 3
for user_id, d in user_dict.items():
    user_ids.append(user_id)
    frames.append(pd.DataFrame.from_dict(d, orient='index'))

pd.concat(frames, keys=user_ids)

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar

Convert pandas DataFrame to a nested dict

I don't understand why there isn't a B2 in your dict. I'm also not sure what you want to happen in the case of repeated column values (every one except the last, I mean.) Assuming the first is an oversight, we could use recursion:

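def recur_dictify(frame):
    # base case: one column left, so return its value(s)
    if len(frame.columns) == 1:
        if frame.values.size == 1: return frame.values[0][0]
        return frame.values.squeeze()
    # group on the first column and recurse on the remaining ones
    grouped = frame.groupby(frame.columns[0])
    # .ix was removed in pandas 1.0; .iloc is the positional equivalent
    d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
    return d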

which produces

>>> df
  name  v1   v2  v3
0    A  A1  A11   1
1    A  A2  A12   2
2    B  B1  B12   3
3    C  C1  C11   4
4    B  B2  B21   5
5    A  A2  A21   6
>>> pprint.pprint(recur_dictify(df))
{'A': {'A1': {'A11': 1}, 'A2': {'A12': 2, 'A21': 6}},
 'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}},
 'C': {'C1': {'C11': 4}}}

It might be simpler to use a non-pandas approach, though:

def retro_dictify(frame):
    d = {}
    for row in frame.values:
        here = d
        for elem in row[:-2]:
            if elem not in here:
                here[elem] = {}
            here = here[elem]
        here[row[-2]] = row[-1]
    return d
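The plain-Python version can be checked against the sample frame printed earlier. The sketch below rebuilds that frame by hand, with the values copied from the printed df, and runs retro_dictify on it; the variable name nested is just for illustration.

```python
import pandas as pd

def retro_dictify(frame):
    d = {}
    for row in frame.values:
        here = d
        # every column except the last two becomes a nesting level
        for elem in row[:-2]:
            if elem not in here:
                here[elem] = {}
            here = here[elem]
        # the second-to-last column is the innermost key, the last is the value
        here[row[-2]] = row[-1]
    return d

# sample frame rebuilt from the printed df above
df = pd.DataFrame({
    'name': ['A', 'A', 'B', 'C', 'B', 'A'],
    'v1': ['A1', 'A2', 'B1', 'C1', 'B2', 'A2'],
    'v2': ['A11', 'A12', 'B12', 'C11', 'B21', 'A21'],
    'v3': [1, 2, 3, 4, 5, 6],
})

nested = retro_dictify(df)
```

If two rows share all of their key columns, the later row silently overwrites the earlier one, so repeated keys need to be handled before calling it.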

How to convert pandas dataframe to nested dictionary

I think you were very close.

Use groupby and to_dict:

# the parentheses let the method chain span several lines
df = (df.groupby('Name')[['Chain','Food','Healthy']]
        .apply(lambda x: x.set_index('Chain').to_dict(orient='index'))
        .to_dict())

print (df)
{'George': {'KFC': {'Healthy': False, 'Food': 'chicken'},
            'McDonalds': {'Healthy': False, 'Food': 'burger'}},
 'John': {'McDonalds': {'Healthy': True, 'Food': 'salad'},
          'Wendys': {'Healthy': False, 'Food': 'burger'}}}
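For a reproducible version, here is a sketch that swaps the apply call for a dict comprehension over the groups, which avoids relying on how groupby.apply wraps dict results. The input frame is an assumption, reconstructed from the printed output, since the question's original data is not shown here.

```python
import pandas as pd

# hypothetical input, reconstructed from the nested dict printed above
df = pd.DataFrame({
    'Name': ['George', 'George', 'John', 'John'],
    'Chain': ['KFC', 'McDonalds', 'McDonalds', 'Wendys'],
    'Food': ['chicken', 'burger', 'salad', 'burger'],
    'Healthy': [False, False, True, False],
})

# outer key: Name; middle key: Chain; inner dict: the remaining columns
nested = {
    name: group.set_index('Chain')[['Food', 'Healthy']].to_dict(orient='index')
    for name, group in df.groupby('Name')
}
```

Note that set_index('Chain') assumes each chain appears at most once per name; with duplicates, to_dict(orient='index') raises an error because the index is no longer unique.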

Pandas: transforming dataframe to nested dictionary

You can group your dataframe by all columns except price, then create your dictionaries in a loop:

import json

# if more than one price for one product in a chain, then calculate mean:
grouped_df = df.groupby(['Month_Year', 'City_Name', 'Chain_Name', 'Product_Name']).agg('mean')

result = dict()
nested_dict = dict()

# each row is (index tuple, mean price); walk down one level per index key
for index, value in grouped_df.itertuples():
    for i, key in enumerate(index):
        if i == 0:
            if key not in result:
                result[key] = {}
            nested_dict = result[key]
        elif i == len(index) - 1:
            nested_dict[key] = value
        else:
            if key not in nested_dict:
                nested_dict[key] = {}
            nested_dict = nested_dict[key]

print(json.dumps(result, indent=4))

With your df changed as follows, so that it shows both the nesting and the mean calculation:

  Month_Year City_Name Chain_Name Product_Name  Product_Price
0    11-2021    London       Aldi        Pasta           2.33
1    11-2021    London       Aldi        Pasta           2.35
2    11-2021    London       Aldi       Olives           3.99
3    11-2021   Bristol       Spar      Bananas           1.45
4    10-2021    London      Tesco       Olives           4.12
5    10-2021   Cardiff       Spar        Pasta           2.25

You get the output:

{
    "10-2021": {
        "Cardiff": {
            "Spar": {
                "Pasta": 2.25
            }
        },
        "London": {
            "Tesco": {
                "Olives": 4.12
            }
        }
    },
    "11-2021": {
        "Bristol": {
            "Spar": {
                "Bananas": 1.45
            }
        },
        "London": {
            "Aldi": {
                "Olives": 3.99,
                "Pasta": 2.34
            }
        }
    }
}
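The same walk over the index levels can be written more compactly with dict.setdefault. The following is a sketch of that variant on the sample data above, not the loop from the answer; it selects the Product_Price column explicitly so that iterating the grouped Series yields (index tuple, mean) pairs.

```python
import pandas as pd

df = pd.DataFrame({
    'Month_Year': ['11-2021', '11-2021', '11-2021', '11-2021', '10-2021', '10-2021'],
    'City_Name': ['London', 'London', 'London', 'Bristol', 'London', 'Cardiff'],
    'Chain_Name': ['Aldi', 'Aldi', 'Aldi', 'Spar', 'Tesco', 'Spar'],
    'Product_Name': ['Pasta', 'Pasta', 'Olives', 'Bananas', 'Olives', 'Pasta'],
    'Product_Price': [2.33, 2.35, 3.99, 1.45, 4.12, 2.25],
})

keys = ['Month_Year', 'City_Name', 'Chain_Name', 'Product_Name']
# Series with a 4-level MultiIndex; duplicate products are averaged
mean_price = df.groupby(keys)['Product_Price'].mean()

result = {}
for index, value in mean_price.items():
    node = result
    # create (or reuse) one nested dict per index level except the last
    for key in index[:-1]:
        node = node.setdefault(key, {})
    # the last level is the leaf key; store a plain float so json can dump it
    node[index[-1]] = float(value)
```

Because the number of levels comes from the index itself, the loop works unchanged if a grouping column is added to or removed from keys.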

How to convert a nested dict, to a pandas dataframe

Loading a JSON/dict:

  • Use pd.json_normalize to expand the dict.
import pandas as pd

data = {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}

df = pd.json_normalize(data)

# display(df)
        id data.name data.lastname  data.office.num data.office.department
0  3241234     carol       netflik             3543                  trigy
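Two optional parameters of pd.json_normalize are often useful here: sep changes the separator used in the flattened column names, and max_level limits how deep the flattening goes. The sketch below applies both to the same data dict; the variable names are only for illustration.

```python
import pandas as pd

data = {'id': 3241234,
        'data': {'name': 'carol', 'lastname': 'netflik',
                 'office': {'num': 3543, 'department': 'trigy'}}}

# use '_' instead of '.' when joining nested keys into column names
flat_underscore = pd.json_normalize(data, sep='_')

# flatten only one level; the inner 'office' dict stays in a single column
one_level = pd.json_normalize(data, max_level=1)
office = one_level.loc[0, 'data.office']
```

Underscore-separated names can be convenient when the columns are later accessed as attributes or passed to tools that treat dots specially.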

If the dataframe has a column of dicts

  • Also see this answer to the SO question: Split / Explode a column of dictionaries into separate columns with pandas
# dataframe with column of dicts
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})

# display(df)
col2 col
0 1 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
1 2 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
2 3 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}

# normalize the column of dicts
normalized = pd.json_normalize(df['col'])

# join the normalized column to df
df = df.join(normalized).drop(columns=['col'])

# display(df)
   col2       id data.name data.lastname  data.office.num data.office.department
0     1  3241234     carol       netflik             3543                  trigy
1     2  3241234     carol       netflik             3543                  trigy
2     3  3241234     carol       netflik             3543                  trigy

If the dataframe has a column of lists with dicts

  • The dicts need to be removed from the lists with .explode
data = [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]

df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})

# display(df)
col2 col
0 1 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
1 2 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
2 3 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]

# explode the lists
df = df.explode('col', ignore_index=True)

# remove and normalize the column of dicts
normalized = pd.json_normalize(df.pop('col'))

# join the normalized column to df
df = df.join(normalized)
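To see why ignore_index=True matters, here is a small sketch where the lists hold different numbers of dicts. The sample records are hypothetical and much shorter than the ones above; after explode, each dict gets its own row and the other columns are repeated.

```python
import pandas as pd

# hypothetical data: the first row holds two dicts, the second row one
df = pd.DataFrame({
    'col2': [1, 2],
    'col': [
        [{'id': 1, 'data': {'name': 'a'}}, {'id': 2, 'data': {'name': 'b'}}],
        [{'id': 3, 'data': {'name': 'c'}}],
    ],
})

# one row per dict; ignore_index gives a fresh 0..n-1 index for the join
df = df.explode('col', ignore_index=True)

# remove the column of dicts and flatten it into its own frame
normalized = pd.json_normalize(df.pop('col').tolist())

# both frames share the same 0..n-1 index, so join aligns row by row
out = df.join(normalized)
```

Without ignore_index=True, the exploded rows would keep duplicate index labels, and the join against the freshly numbered normalized frame would no longer line up one to one.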

How to convert a nested dictionary with lists to a dataframe in this format

You can use stack and explode:

import pandas as pd

nested_dict = {'Girl': {'June': [45, 32], 'Samantha': [14, 34, 65]},
               'Boy': {'Brad': [12, 54, 12], 'Chad': [12]}}

df = pd.DataFrame.from_dict(nested_dict, orient='index')
print(df.stack().explode())

Output:

Girl  June        45
      June        32
      Samantha    14
      Samantha    34
      Samantha    65
Boy   Brad        12
      Brad        54
      Brad        12
      Chad        12
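If a flat table with ordinary columns is preferred over a MultiIndex Series, the same rows can be produced without stack, using a plain comprehension over the dict. This is a sketch of an alternative, and the column names group, name and value are assumptions.

```python
import pandas as pd

nested_dict = {'Girl': {'June': [45, 32], 'Samantha': [14, 34, 65]},
               'Boy': {'Brad': [12, 54, 12], 'Chad': [12]}}

# one (group, name, value) record per list element
records = [
    (group, name, value)
    for group, people in nested_dict.items()
    for name, values in people.items()
    for value in values
]

# the column names are hypothetical labels for the three nesting levels
tidy = pd.DataFrame(records, columns=['group', 'name', 'value'])
```

Because no intermediate frame with missing cells is created, this version does not depend on how stack treats NaN values.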

Pandas dataframe groups to nested dict

Assuming you actually want a nested dictionary like this (note the extra braces):

{76129: {1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883},
         1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508},
         1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}},
 ... etc.
}

Here is a step-by-step approach.

First, create a dataframe with the desired (PERSON_ID, YEAR) multi-index:

frame_sorted = frame.set_index(['PERSON_ID', 'YEAR']).sort_index()
print(frame_sorted)

Output:

                      IN       OUT
PERSON_ID YEAR
76129     1951  3.775517  6.028186
          1952  3.679453  1.768597
          1953  3.530302  0.409578
... etc.

Then, create the nested dict using a nested dictionary comprehension:

person_ids = frame_sorted.index.levels[0]
data_dict = {person: {idx: data.to_dict() for idx, data in frame_sorted.loc[person].iterrows()}
             for person in person_ids}
print(data_dict)

Output:

{76129: {1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883},
         1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508},
         1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}},
 ...etc.
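A variant of the same idea groups on the first index level and lets to_dict(orient='index') build the inner dicts, which avoids the row-by-row iterrows call. The sketch below is self-contained; the rows for person 76129 are copied from the output above, while person 99999 and its values are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame({
    'PERSON_ID': [76129, 76129, 76129, 99999, 99999],  # 99999 is hypothetical
    'YEAR': [1951, 1952, 1953, 1951, 1952],
    'IN': [3.77551684175021, 3.67945267132245, 3.53030183426851, 1.0, 2.0],
    'OUT': [6.02818626979883, 1.7685974058508, 0.409577500579766, 0.5, 0.25],
})
frame_sorted = frame.set_index(['PERSON_ID', 'YEAR']).sort_index()

# per person: drop the PERSON_ID level so YEAR becomes the key of the inner dict
data_dict = {
    int(person): group.droplevel('PERSON_ID').to_dict(orient='index')
    for person, group in frame_sorted.groupby(level='PERSON_ID')
}
```

This relies on each (PERSON_ID, YEAR) pair being unique, since to_dict(orient='index') raises an error on a duplicated index.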

Converting a pandas dataframe to a nested dict in Python using groupby

First, group by device (level 1) and keep every column except device. Then set variable as the index (level 2) and convert the remaining columns to a dict (level 3). Finally, convert the resulting Series to a dict.

import json

d = df.groupby("device")[["variable", "size", "manual", "method", "nrow", "ncol"]] \
      .apply(lambda x: x.set_index("variable").to_dict(orient="index")) \
      .to_dict()
print(json.dumps(d, indent=4, sort_keys=True))

{
    "123456": {
        "a": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        },
        "b": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        }
    },
    "7891011": {
        "a": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        },
        "b": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        }
    }
}

Flatten and Convert a List of Nested Dictionaries to a Pandas Dataframe

Let's flatten the nested dicts into a list of records, then create a new dataframe:

pd.DataFrame({'date': k, **v} for d in dct for k, v in d.items())


         date      A       B
0  2022-03-31  12323  123123
1  2021-03-31     12     123
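The one-liner assumes that dct is a list of single-key dicts, each mapping a date to a dict of values. That input is not shown here, so the sketch below uses a hypothetical dct inferred from the output above and builds the records as an explicit list first.

```python
import pandas as pd

# hypothetical input, inferred from the output shown above
dct = [
    {'2022-03-31': {'A': 12323, 'B': 123123}},
    {'2021-03-31': {'A': 12, 'B': 123}},
]

# merge each date key into its inner dict to get one flat record per date
records = [{'date': k, **v} for d in dct for k, v in d.items()]

df = pd.DataFrame(records)
```

Keeping records as a separate list makes it easy to inspect the flattened rows before they are handed to pandas.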

