Construct pandas DataFrame from items in nested dictionary
A pandas MultiIndex consists of a list of tuples. So the most natural approach would be to reshape your input dict so that its keys are tuples corresponding to the multi-index values you require. Then you can just construct your dataframe using pd.DataFrame.from_dict, using the option orient='index':
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Keys are (user_id, category) tuples, which become the two index levels
pd.DataFrame.from_dict({(i, j): user_dict[i][j]
                        for i in user_dict.keys()
                        for j in user_dict[i].keys()},
                       orient='index')
               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
An alternative approach would be to build your dataframe up by concatenating the component dataframes:
user_ids = []
frames = []
# dict.items() replaces the Python 2-only iteritems()
for user_id, d in user_dict.items():
    user_ids.append(user_id)
    frames.append(pd.DataFrame.from_dict(d, orient='index'))

pd.concat(frames, keys=user_ids)
               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
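As a quick sanity check, the two approaches above can be run side by side. This is only a sketch: the level names user_id and category are hypothetical labels added for readability, and the concat variant passes a dict instead of separate frames and keys lists.

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Approach 1: flatten to {(user_id, category): attributes}
by_tuples = pd.DataFrame.from_dict(
    {(user_id, category): attrs
     for user_id, categories in user_dict.items()
     for category, attrs in categories.items()},
    orient='index')

# Approach 2: one frame per user, concatenated under the user ids
by_concat = pd.concat(
    {user_id: pd.DataFrame.from_dict(categories, orient='index')
     for user_id, categories in user_dict.items()})

# Hypothetical level names, only to make the index self-describing
by_tuples.index.names = ['user_id', 'category']
by_concat.index.names = ['user_id', 'category']

# Compare the content row by row via plain dicts
same = by_tuples.to_dict(orient='index') == by_concat.to_dict(orient='index')
```

Individual rows can then be looked up with a tuple, for example by_tuples.loc[(12, 'Category 1')].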
Convert pandas DataFrame to a nested dict
I don't understand why there isn't a B2 in your dict. I'm also not sure what you want to happen in the case of repeated column values (every one except the last, I mean). Assuming the first is an oversight, we could use recursion:
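def recur_dictify(frame):
    # Base case: one column left, so return its value(s)
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    # Group on the first column and recurse on the remaining ones
    grouped = frame.groupby(frame.columns[0])
    # .iloc replaces the removed .ix indexer
    d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
    return d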
which produces
>>> df
  name  v1   v2  v3
0    A  A1  A11   1
1    A  A2  A12   2
2    B  B1  B12   3
3    C  C1  C11   4
4    B  B2  B21   5
5    A  A2  A21   6
>>> pprint.pprint(recur_dictify(df))
{'A': {'A1': {'A11': 1}, 'A2': {'A12': 2, 'A21': 6}},
 'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}},
 'C': {'C1': {'C11': 4}}}
It might be simpler to use a non-pandas approach, though:
def retro_dictify(frame):
    d = {}
    for row in frame.values:
        here = d
        for elem in row[:-2]:
            if elem not in here:
                here[elem] = {}
            here = here[elem]
        here[row[-2]] = row[-1]
    return d
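For reference, here is a self-contained run of the non-pandas version on the same frame as above. Note one consequence of the assignment in the last step of the loop: if two rows share the same path of keys, the later row overwrites the earlier one.

```python
import pandas as pd


def retro_dictify(frame):
    d = {}
    for row in frame.values:
        here = d
        # Walk (and create) one dict level per leading column
        for elem in row[:-2]:
            if elem not in here:
                here[elem] = {}
            here = here[elem]
        # Second-to-last column is the innermost key, last column the value
        here[row[-2]] = row[-1]
    return d


df = pd.DataFrame({'name': ['A', 'A', 'B', 'C', 'B', 'A'],
                   'v1': ['A1', 'A2', 'B1', 'C1', 'B2', 'A2'],
                   'v2': ['A11', 'A12', 'B12', 'C11', 'B21', 'A21'],
                   'v3': [1, 2, 3, 4, 5, 6]})

nested = retro_dictify(df)
```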
How to convert pandas dataframe to nested dictionary
I think you were very close.
Use groupby and to_dict:
# Parentheses allow the method chain to span several lines
d = (df.groupby('Name')[['Chain', 'Food', 'Healthy']]
       .apply(lambda x: x.set_index('Chain').to_dict(orient='index'))
       .to_dict())
print(d)
{'George': {'KFC': {'Healthy': False, 'Food': 'chicken'},
            'McDonalds': {'Healthy': False, 'Food': 'burger'}},
 'John': {'McDonalds': {'Healthy': True, 'Food': 'salad'},
          'Wendys': {'Healthy': False, 'Food': 'burger'}}}
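To try this end to end, a small input frame is needed. The one below is an assumption, reconstructed from the printed result rather than taken from the original question.

```python
import pandas as pd

# Assumed input, reconstructed from the output shown above
df = pd.DataFrame({
    'Name': ['George', 'George', 'John', 'John'],
    'Chain': ['KFC', 'McDonalds', 'McDonalds', 'Wendys'],
    'Food': ['chicken', 'burger', 'salad', 'burger'],
    'Healthy': [False, False, True, False],
})

# Level 1: Name, level 2: Chain, level 3: the remaining columns
nested = (df.groupby('Name')[['Chain', 'Food', 'Healthy']]
            .apply(lambda x: x.set_index('Chain').to_dict(orient='index'))
            .to_dict())
```

Each group is turned into its own {Chain: {column: value}} dict, and the final to_dict() keys those dicts by Name.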
Pandas: transforming dataframe to nested dictionary
You can group your dataframe by all columns except price, then create your dictionaries in a loop:
import json

# If a product has more than one price in a chain, take the mean
grouped_df = df.groupby(['Month_Year', 'City_Name', 'Chain_Name', 'Product_Name']).agg('mean')
result = dict()
nested_dict = dict()
for index, value in grouped_df.itertuples():
    for i, key in enumerate(index):
        if i == 0:
            # First level: start again from the top of the result dict
            if key not in result:
                result[key] = {}
            nested_dict = result[key]
        elif i == len(index) - 1:
            # Last level: store the (mean) price
            nested_dict[key] = value
        else:
            if key not in nested_dict:
                nested_dict[key] = {}
            nested_dict = nested_dict[key]
print(json.dumps(result, indent=4))
With your df changed as follows, so that it shows both the nesting and the mean calculation:
  Month_Year City_Name Chain_Name Product_Name  Product_Price
0    11-2021    London       Aldi        Pasta           2.33
1    11-2021    London       Aldi        Pasta           2.35
2    11-2021    London       Aldi       Olives           3.99
3    11-2021   Bristol       Spar      Bananas           1.45
4    10-2021    London      Tesco       Olives           4.12
5    10-2021   Cardiff       Spar        Pasta           2.25
You get the output:
{
    "10-2021": {
        "Cardiff": {
            "Spar": {
                "Pasta": 2.25
            }
        },
        "London": {
            "Tesco": {
                "Olives": 4.12
            }
        }
    },
    "11-2021": {
        "Bristol": {
            "Spar": {
                "Bananas": 1.45
            }
        },
        "London": {
            "Aldi": {
                "Olives": 3.99,
                "Pasta": 2.34
            }
        }
    }
}
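The same nesting can be written more compactly with dict.setdefault. This is a variant sketch rather than the loop above: it groups into a Series of mean prices and walks each index tuple, using the sample data from the table.

```python
import pandas as pd

df = pd.DataFrame({
    'Month_Year': ['11-2021', '11-2021', '11-2021', '11-2021', '10-2021', '10-2021'],
    'City_Name': ['London', 'London', 'London', 'Bristol', 'London', 'Cardiff'],
    'Chain_Name': ['Aldi', 'Aldi', 'Aldi', 'Spar', 'Tesco', 'Spar'],
    'Product_Name': ['Pasta', 'Pasta', 'Olives', 'Bananas', 'Olives', 'Pasta'],
    'Product_Price': [2.33, 2.35, 3.99, 1.45, 4.12, 2.25],
})

keys = ['Month_Year', 'City_Name', 'Chain_Name', 'Product_Name']
# One mean price per (month, city, chain, product) combination
mean_prices = df.groupby(keys)['Product_Price'].mean()

result = {}
for index, price in mean_prices.items():
    node = result
    # Descend through all but the last key, creating dicts as needed
    for key in index[:-1]:
        node = node.setdefault(key, {})
    # The last key (product name) maps to the mean price
    node[index[-1]] = float(price)
```

Because the loop only relies on the length of each index tuple, it should work unchanged if a grouping column is added to or removed from keys.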
How to convert a nested dict, to a pandas dataframe
Loading a JSON/dict:
- Use pd.json_normalize to expand the dict.
import pandas as pd
data = {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
df = pd.json_normalize(data)
# display(df)
id data.name data.lastname data.office.num data.office.department
0 3241234 carol netflik 3543 trigy
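Two optional parameters of pd.json_normalize are worth knowing here: sep changes the separator used in the flattened column names, and max_level limits how deep the flattening goes. A short sketch on the same dict:

```python
import pandas as pd

data = {'id': 3241234,
        'data': {'name': 'carol', 'lastname': 'netflik',
                 'office': {'num': 3543, 'department': 'trigy'}}}

# Join nested keys with '_' instead of the default '.'
flat_underscore = pd.json_normalize(data, sep='_')

# Flatten only one level; deeper dicts stay as dict objects in the cell
shallow = pd.json_normalize(data, max_level=1)

underscore_columns = list(flat_underscore.columns)
shallow_columns = list(shallow.columns)
```

With max_level=1 the office entry is left as a dict in a single data.office column, which can be handy when only the top of the structure needs to become columns.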
If the dataframe has column of dicts
- See also the answers to the Stack Overflow question "Split / Explode a column of dictionaries into separate columns with pandas".
# dataframe with column of dicts
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})
# display(df)
col2 col
0 1 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
1 2 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
2 3 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
# normalize the column of dicts
normalized = pd.json_normalize(df['col'])
# join the normalized column to df
df = df.join(normalized).drop(columns=['col'])
# display(df)
col2 id data.name data.lastname data.office.num data.office.department
0 1 3241234 carol netflik 3543 trigy
1 2 3241234 carol netflik 3543 trigy
2 3 3241234 carol netflik 3543 trigy
If the dataframe has a column of lists with dicts
- The dicts need to be removed from the lists with .explode
data = [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})
# display(df)
col2 col
0 1 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
1 2 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
2 3 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
# explode the lists
df = df.explode('col', ignore_index=True)
# remove and normalize the column of dicts
normalized = pd.json_normalize(df.pop('col'))
# join the normalized column to df
df = df.join(normalized)
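The effect of .explode is easier to see when each list holds more than one dict. The records below are hypothetical and exist only to illustrate the row multiplication; the list is passed to json_normalize via tolist() as a conservative choice.

```python
import pandas as pd

# Hypothetical records: each cell of 'col' holds a list of two nested dicts
records = [
    {'id': 1, 'data': {'name': 'carol', 'office': {'num': 3543}}},
    {'id': 2, 'data': {'name': 'dave', 'office': {'num': 1200}}},
]
df = pd.DataFrame({'col2': [1, 2], 'col': [records, records]})

# One row per list element; ignore_index gives a fresh 0..n-1 index
df = df.explode('col', ignore_index=True)

# Remove the column of dicts and flatten it into separate columns
normalized = pd.json_normalize(df.pop('col').tolist())

# Both frames share the same 0..n-1 index, so a plain join lines them up
df = df.join(normalized)
```

The two input rows become four, with col2 repeated for every dict that came out of the same list.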
How to convert a nested dictionary with lists to a dataframe in this format
You can use stack and explode:
import pandas as pd
nested_dict = { 'Girl': {'June': [45, 32], 'Samantha': [14, 34, 65]},
'Boy': {'Brad': [12, 54, 12], 'Chad': [12]}}
df = pd.DataFrame.from_dict(nested_dict, orient='index')
print(df.stack().explode())
Output:
Girl  June        45
      June        32
      Samantha    14
      Samantha    34
      Samantha    65
Boy   Brad        12
      Brad        54
      Brad        12
      Chad        12
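If a flat table is more convenient than a MultiIndex Series, the result can be turned into regular columns. In this sketch the column names group, name and score are hypothetical, and the explicit dropna() is an assumption-driven safeguard, since some pandas versions may keep the missing name/group combinations after stack().

```python
import pandas as pd

nested_dict = {'Girl': {'June': [45, 32], 'Samantha': [14, 34, 65]},
               'Boy': {'Brad': [12, 54, 12], 'Chad': [12]}}

wide = pd.DataFrame.from_dict(nested_dict, orient='index')

tidy = (wide.stack()                        # (group, name) -> list of scores
            .dropna()                       # drop name/group pairs that do not exist
            .explode()                      # one row per score
            .rename_axis(['group', 'name'])
            .reset_index(name='score'))     # index levels become columns
```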
Pandas dataframe groups to nested dict
Assuming you actually want a nested dictionary like this (note the extra braces):
{76129: {1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883},
1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508},
1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}},
... etc.
}
Here is a step-by-step approach.
First, create a dataframe with the desired (PERSON_ID, YEAR) multi-index:
frame_sorted = frame.set_index(['PERSON_ID', 'YEAR']).sort_index()
print(frame_sorted)
Output:
                       IN       OUT
PERSON_ID YEAR
76129     1951   3.775517  6.028186
          1952   3.679453  1.768597
          1953   3.530302  0.409578
... etc.
Then, create the nested dict using a nested dictionary comprehension:
person_ids = frame_sorted.index.levels[0]
data_dict = {person: {idx: data.to_dict() for idx, data in frame_sorted.loc[person].iterrows()}
             for person in person_ids}
print(data_dict)
Output:
{76129: {1951: {'IN': 3.77551684175021, 'OUT': 6.02818626979883},
1952: {'IN': 3.67945267132245, 'OUT': 1.7685974058508},
1953: {'IN': 3.53030183426851, 'OUT': 0.409577500579766}},
...etc.
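A shorter variant, sketched here as an alternative to the steps above, skips the MultiIndex and groups directly on PERSON_ID. The frame contains only the three rows visible in the output; the rest of the original data is not shown and is not reproduced.

```python
import pandas as pd

# Only the rows visible in the output above
frame = pd.DataFrame({
    'PERSON_ID': [76129, 76129, 76129],
    'YEAR': [1951, 1952, 1953],
    'IN': [3.77551684175021, 3.67945267132245, 3.53030183426851],
    'OUT': [6.02818626979883, 1.7685974058508, 0.409577500579766],
})

# Outer key: person, inner key: year, innermost dict: the IN/OUT values
data_dict = {
    person: group.set_index('YEAR')[['IN', 'OUT']].to_dict(orient='index')
    for person, group in frame.groupby('PERSON_ID')
}
```

This assumes each (PERSON_ID, YEAR) pair occurs once; with duplicates, to_dict(orient='index') would not have unique keys to work with.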
Converting a pandas dataframe to a nested dict in Python using groupby
First group by device (level 1) and keep all columns except device, then set variable as index (level 2), and finally convert all columns to dict (level 3). At the end, convert the whole dataframe to a dict.
import json
d = df.groupby("device")[["variable", "size", "manual", "method", "nrow", "ncol"]] \
      .apply(lambda x: x.set_index("variable").to_dict(orient="index")) \
      .to_dict()
print(json.dumps(d, indent=4, sort_keys=True))
{
    "123456": {
        "a": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        },
        "b": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        }
    },
    "7891011": {
        "a": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        },
        "b": {
            "manual": false,
            "method": "beta",
            "ncol": null,
            "nrow": null,
            "size": "80"
        }
    }
}
Flattened and Convert list of Nested Dictionary to Pandas Dataframe
Let's flatten the nested dicts into a list of records, then create a new dataframe:
pd.DataFrame({'date': k, **v} for d in dct for k, v in d.items())
date A B
0 2022-03-31 12323 123123
1 2021-03-31 12 123
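The one-liner assumes dct is a list of single-key dicts of the form {date: {column: value}}. The input below is an assumption inferred from the output above, not the data from the original question.

```python
import pandas as pd

# Assumed input shape, inferred from the output shown above
dct = [{'2022-03-31': {'A': 12323, 'B': 123123}},
       {'2021-03-31': {'A': 12, 'B': 123}}]

# Each (date, values) pair becomes one flat record: {'date': ..., 'A': ..., 'B': ...}
df = pd.DataFrame({'date': k, **v} for d in dct for k, v in d.items())

records = df.to_dict(orient='records')
```

The ** unpacking merges the inner dict into the record, so any extra keys in the inner dicts would simply become extra columns.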