Construct pandas DataFrame from items in nested dictionary
A pandas MultiIndex consists of a list of tuples, so the most natural approach is to reshape your input dict so that its keys are tuples corresponding to the multi-index values you require. Then you can construct your dataframe using pd.DataFrame.from_dict with the option orient='index':
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# keys are (user_id, category) tuples, which become the two index levels
pd.DataFrame.from_dict({(i, j): user_dict[i][j]
                        for i in user_dict.keys()
                        for j in user_dict[i].keys()},
                       orient='index')

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
An alternative approach would be to build your dataframe up by concatenating the component dataframes:
user_ids = []
frames = []

# dict.iteritems() only exists in Python 2; use items() in Python 3
for user_id, d in user_dict.items():
    user_ids.append(user_id)
    frames.append(pd.DataFrame.from_dict(d, orient='index'))

pd.concat(frames, keys=user_ids)

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
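The loop above can also be collapsed, since pd.concat accepts a dict and uses its keys as the outer index level. A minimal sketch, assuming the same user_dict as above and Python 3 (the name combined is only for illustration):

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# One frame per user id; the dict keys become the outer MultiIndex level.
combined = pd.concat(
    {user_id: pd.DataFrame.from_dict(d, orient='index')
     for user_id, d in user_dict.items()}
)
print(combined.loc[(15, 'Category 2'), 'att_2'])
```

This avoids keeping two parallel lists in sync, at the cost of building all the frames before concatenating.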
Construct a pandas DataFrame from items in a nested dictionary with lists as inner values
You could do:
df = pd.DataFrame(
    (
        [subkey, key] + value
        for key, records in annot_dict.items()
        for record in records
        for subkey, value in record.items()
    ),
    columns=[
        'subunit_ID', 'gene_ID', 'start_index', 'end_index', 'strand', 'biotype', 'desc'
    ]
)
Result for
annot_dict = {
    'ID_string1': [
        {'ID_string1': ['attr11a', 'attr11b', 'attr11c', 'attr11d', 'attr11e']},
        {'string12' : ['attr12a', 'attr12b', 'attr12c', 'attr12d', 'attr12e']},
        {'string13' : ['attr13a', 'attr13b', 'attr13c', 'attr13d', 'attr13e']},
    ],
    'ID_string2': [
        {'ID_string2': ['attr21a', 'attr21b', 'attr21c', 'attr21d', 'attr21e']},
        {'string22' : ['attr22a', 'attr22b', 'attr22c', 'attr22d', 'attr22e']},
        {'string23' : ['attr23a', 'attr23b', 'attr23c', 'attr23d', 'attr23e']},
    ]
}
is
subunit_ID gene_ID start_index end_index strand biotype desc
0 ID_string1 ID_string1 attr11a attr11b attr11c attr11d attr11e
1 string12 ID_string1 attr12a attr12b attr12c attr12d attr12e
2 string13 ID_string1 attr13a attr13b attr13c attr13d attr13e
3 ID_string2 ID_string2 attr21a attr21b attr21c attr21d attr21e
4 string22 ID_string2 attr22a attr22b attr22c attr22d attr22e
5 string23 ID_string2 attr23a attr23b attr23c attr23d attr23e
How to convert a nested dictionary with lists to a dataframe in this format
You can use stack and explode:
import pandas as pd
nested_dict = { 'Girl': {'June': [45, 32], 'Samantha': [14, 34, 65]},
'Boy': {'Brad': [12, 54, 12], 'Chad': [12]}}
df = pd.DataFrame.from_dict(nested_dict, orient='index')
print(df.stack().explode())
Output:
Girl June 45
June 32
Samantha 14
Samantha 34
Samantha 65
Boy Brad 12
Brad 54
Brad 12
Chad 12
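If a flat table is easier to work with than a MultiIndex Series, the same chain can be extended. A sketch, assuming the nested_dict above; the column names group, name and score are made up for illustration, and the dropna() call is there only in case the stack implementation in your pandas version keeps the NaN padding that from_dict introduces:

```python
import pandas as pd

nested_dict = {'Girl': {'June': [45, 32], 'Samantha': [14, 34, 65]},
               'Boy': {'Brad': [12, 54, 12], 'Chad': [12]}}

tidy = (
    pd.DataFrame.from_dict(nested_dict, orient='index')
    .stack()                           # (group, name) -> list of scores
    .dropna()                          # drop cells for names missing in a group
    .explode()                         # one row per score
    .rename_axis(['group', 'name'])    # name the two index levels
    .reset_index(name='score')         # turn the index levels into columns
)
print(tidy)
```

Note that explode leaves the score column with object dtype; call tidy['score'].astype(int) if you need numeric values.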
How to create a pandas dataframe from a nested dictionary with lists of dictionaries?
One option would be to merge the lists of dicts into a single dict, then build the frame with DataFrame.from_dict:
import pandas as pd
from collections import ChainMap
dictionary = {'user1': [{'product1': 10}, {'product2': 15}, {'product3': 20}],
'user2': [{'product1': 13}, {'product2': 8}, {'product3': 50}]}
df = pd.DataFrame.from_dict(
{k: dict(ChainMap(*v)) for k, v in dictionary.items()},
orient='index'
)
df:
product3 product2 product1
user1 20 15 10
user2 50 8 13
Optional alphanumeric sort with natsort:
from natsort import natsorted
df = df.reindex(columns=natsorted(df.columns))
product1 product2 product3
user1 10 15 20
user2 13 8 50
For reference, the comprehension merges each user's list into a single dict:
{k: dict(ChainMap(*v)) for k, v in dictionary.items()}
{'user1': {'product3': 20, 'product2': 15, 'product1': 10},
'user2': {'product3': 50, 'product2': 8, 'product1': 13}}
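ChainMap iterates its maps from last to first, which is why the columns come out as product3, product2, product1. If you would rather keep the original order without natsort, a plain comprehension can do the merge. A sketch, assuming the same dictionary as above; note that here later dicts win on duplicate keys, the opposite of ChainMap, where the first mapping wins:

```python
import pandas as pd

dictionary = {'user1': [{'product1': 10}, {'product2': 15}, {'product3': 20}],
              'user2': [{'product1': 13}, {'product2': 8}, {'product3': 50}]}

# Flatten each user's list of single-item dicts into one dict, in list order.
merged = {user: {k: v for d in dicts for k, v in d.items()}
          for user, dicts in dictionary.items()}

df = pd.DataFrame.from_dict(merged, orient='index')
print(df)
```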
Python Pandas: Convert nested dictionary to dataframe
Try DataFrame.from_dict() with the keyword argument orient set to 'index'.
Example -
In [20]: d = {1 : {'tp': 26, 'fp': 112},
....: 2 : {'tp': 26, 'fp': 91},
....: 3 : {'tp': 23, 'fp': 74}}
In [24]: df =pd.DataFrame.from_dict(d,orient='index')
In [25]: df
Out[25]:
tp fp
1 26 112
2 26 91
3 23 74
If you also want to set the name of the index column, use df.index.name. Example -
In [30]: df.index.name = 't'
In [31]: df
Out[31]:
tp fp
t
1 26 112
2 26 91
3 23 74
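The two steps can also be chained with rename_axis, which returns a new frame instead of mutating df. A small sketch, assuming the same d as in the example:

```python
import pandas as pd

d = {1: {'tp': 26, 'fp': 112},
     2: {'tp': 26, 'fp': 91},
     3: {'tp': 23, 'fp': 74}}

# Build the frame and name its index in one expression.
df = pd.DataFrame.from_dict(d, orient='index').rename_axis('t')
print(df.index.name)
```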
How to convert a nested dict, to a pandas dataframe
Loading a JSON/dict:
- Use pd.json_normalize to expand the dict.
import pandas as pd
data = {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
df = pd.json_normalize(data)
# display(df)
id data.name data.lastname data.office.num data.office.department
0 3241234 carol netflik 3543 trigy
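json_normalize also takes sep and max_level arguments, which control the column separator and how deep the flattening goes. A brief sketch, assuming the same data dict; the names underscored and shallow are only for illustration:

```python
import pandas as pd

data = {'id': 3241234,
        'data': {'name': 'carol', 'lastname': 'netflik',
                 'office': {'num': 3543, 'department': 'trigy'}}}

# Join nested keys with '_' instead of the default '.'.
underscored = pd.json_normalize(data, sep='_')

# Flatten only one level; the inner 'office' dict stays as a dict in one cell.
shallow = pd.json_normalize(data, max_level=1)

print(list(underscored.columns))
print(list(shallow.columns))
```

An underscore separator is handy when the columns will later be accessed as attributes or written to a system that dislikes dots in names.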
If the dataframe has a column of dicts
- Also see this answer to the SO question: Split / Explode a column of dictionaries into separate columns with pandas
# dataframe with column of dicts
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})
# display(df)
col2 col
0 1 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
1 2 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
2 3 {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
# normalize the column of dicts
normalized = pd.json_normalize(df['col'])
# join the normalized column to df
df = df.join(normalized).drop(columns=['col'])
# display(df)
col2 id data.name data.lastname data.office.num data.office.department
0 1 3241234 carol netflik 3543 trigy
1 2 3241234 carol netflik 3543 trigy
2 3 3241234 carol netflik 3543 trigy
If the dataframe has a column of lists with dicts
- The dicts need to be removed from the lists with .explode
data = [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})
# display(df)
col2 col
0 1 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
1 2 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
2 3 [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
# explode the lists
df = df.explode('col', ignore_index=True)
# remove and normalize the column of dicts
normalized = pd.json_normalize(df.pop('col'))
# join the normalized column to df
df = df.join(normalized)
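The ignore_index=True argument matters once a list holds more than one dict: explode then produces several rows per original row, and the join only lines up if the index has been renumbered. A sketch with made-up data (the ids and names below are hypothetical, not from the example above):

```python
import pandas as pd

# Hypothetical data: the first row holds two dicts, the second holds one.
df = pd.DataFrame({
    'col2': [1, 2],
    'col': [[{'id': 1, 'data': {'name': 'a'}}, {'id': 2, 'data': {'name': 'b'}}],
            [{'id': 3, 'data': {'name': 'c'}}]],
})

# One row per dict, with a fresh 0..n-1 index.
df = df.explode('col', ignore_index=True)

# Remove the dict column and flatten it; its rows share df's new index.
normalized = pd.json_normalize(df.pop('col'))
df = df.join(normalized)
print(df)
```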
Pandas Dataframe from nested dictionary of pandas dataframes
The idea is to create tuples from both keys and pass the resulting dict to concat. The third level of the MultiIndex is created from the index values of the original DataFrames; if necessary, you can remove it:
my_dict = {
'elem1':{'day1': pd.DataFrame(1, columns=['Col1', 'Col2'], index=[1,2]),
'day2': pd.DataFrame(2, columns=['Col1', 'Col2'], index=[1,2])
},
'elem2':{'day1': pd.DataFrame(3, columns=['Col1', 'Col2'], index=[1,2]),
'day2': pd.DataFrame(4, columns=['Col1', 'Col2'], index=[1,2]),
'day3': pd.DataFrame(5, columns=['Col1', 'Col2'], index=[1,2])
}
}
d = {(k1, k2): v2 for k1, v1 in my_dict.items() for k2, v2 in v1.items()}
print (d)
{('elem1', 'day1'): Col1 Col2
1 1 1
2 1 1, ('elem1', 'day2'): Col1 Col2
1 2 2
2 2 2, ('elem2', 'day1'): Col1 Col2
1 3 3
2 3 3, ('elem2', 'day2'): Col1 Col2
1 4 4
2 4 4, ('elem2', 'day3'): Col1 Col2
1 5 5
2 5 5}
df = pd.concat(d, sort=False)
print (df)
Col1 Col2
elem1 day1 1 1 1
2 1 1
day2 1 2 2
2 2 2
elem2 day1 1 3 3
2 3 3
day2 1 4 4
2 4 4
day3 1 5 5
2 5 5
df = pd.concat(d, sort=False).reset_index(level=2, drop=True)
print (df)
Col1 Col2
elem1 day1 1 1
day1 1 1
day2 2 2
day2 2 2
elem2 day1 3 3
day1 3 3
day2 4 4
day2 4 4
day3 5 5
day3 5 5
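Naming the key levels makes later selections easier to read. A sketch, assuming the same my_dict as above; the level names elem and day are chosen here for illustration, and concat's names argument is used to label the two levels that come from the tuple keys:

```python
import pandas as pd

my_dict = {
    'elem1': {'day1': pd.DataFrame(1, columns=['Col1', 'Col2'], index=[1, 2]),
              'day2': pd.DataFrame(2, columns=['Col1', 'Col2'], index=[1, 2])},
    'elem2': {'day1': pd.DataFrame(3, columns=['Col1', 'Col2'], index=[1, 2]),
              'day2': pd.DataFrame(4, columns=['Col1', 'Col2'], index=[1, 2]),
              'day3': pd.DataFrame(5, columns=['Col1', 'Col2'], index=[1, 2])},
}

d = {(k1, k2): v2 for k1, v1 in my_dict.items() for k2, v2 in v1.items()}

# Label the two key levels; the third level keeps the original row index.
df = pd.concat(d, names=['elem', 'day'])

# Cross-section: every row for 'day1', across all elems.
day1 = df.xs('day1', level='day')
print(day1)
```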
Pandas: transforming dataframe to nested dictionary
You can group your dataframe by all columns except price, then create your dictionaries in a loop:
import json

# if more than one price for one product in a chain, then calculate mean:
grouped_df = df.groupby(['Month_Year', 'City_Name', 'Chain_Name', 'Product_Name']).agg('mean')

result = dict()
nested_dict = dict()
# each row is (index tuple, mean price); walk down the keys, creating dicts as needed
for index, value in grouped_df.itertuples():
    for i, key in enumerate(index):
        if i == 0:
            if key not in result:
                result[key] = {}
            nested_dict = result[key]
        elif i == len(index) - 1:
            nested_dict[key] = value
        else:
            if key not in nested_dict:
                nested_dict[key] = {}
            nested_dict = nested_dict[key]

print(json.dumps(result, indent=4))
With your df changed as follows, to show both the nesting and the mean calculation:
Month_Year City_Name Chain_Name Product_Name Product_Price
0 11-2021 London Aldi Pasta 2.33
1 11-2021 London Aldi Pasta 2.35
2 11-2021 London Aldi Olives 3.99
3 11-2021 Bristol Spar Bananas 1.45
4 10-2021 London Tesco Olives 4.12
5 10-2021 Cardiff Spar Pasta 2.25
You get the output:
{
"10-2021": {
"Cardiff": {
"Spar": {
"Pasta": 2.25
}
},
"London": {
"Tesco": {
"Olives": 4.12
}
}
},
"11-2021": {
"Bristol": {
"Spar": {
"Bananas": 1.45
}
},
"London": {
"Aldi": {
"Olives": 3.99,
"Pasta": 2.34
}
}
}
}
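The same nesting can be written recursively, grouping by one column at a time. This is an alternative sketch rather than the method above; the helper name nest is hypothetical, and the frame is rebuilt from the table shown:

```python
import pandas as pd

df = pd.DataFrame({
    'Month_Year': ['11-2021', '11-2021', '11-2021', '11-2021', '10-2021', '10-2021'],
    'City_Name': ['London', 'London', 'London', 'Bristol', 'London', 'Cardiff'],
    'Chain_Name': ['Aldi', 'Aldi', 'Aldi', 'Spar', 'Tesco', 'Spar'],
    'Product_Name': ['Pasta', 'Pasta', 'Olives', 'Bananas', 'Olives', 'Pasta'],
    'Product_Price': [2.33, 2.35, 3.99, 1.45, 4.12, 2.25],
})


def nest(frame, keys, value_col):
    """Group by the first key and recurse; average the value at the last key."""
    if len(keys) == 1:
        return frame.groupby(keys[0])[value_col].mean().to_dict()
    return {k: nest(g, keys[1:], value_col) for k, g in frame.groupby(keys[0])}


result = nest(df, ['Month_Year', 'City_Name', 'Chain_Name', 'Product_Name'],
              'Product_Price')
print(result['10-2021'])
```

The recursion calls groupby once per nesting level and group, so for large frames the single-groupby loop above is likely to be faster.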
Create a nested dictionary from a dataframe
The following Python code solves your problem:
import pandas as pd
d = {"field_name": ["foo", "foo", "foo", "bar", "bar"],
"values": ["key1", "key2", "key3", "key1", "key5"],
"description": ["value1", "value2", "value3", "value4", "value6"]}
df = pd.DataFrame(data=d)
print(df.values)
resultant_dict = {}
"""
df.values is like
[['foo' 'key1' 'value1']
['foo' 'key2' 'value2']
['foo' 'key3' 'value3']
['bar' 'key1' 'value4']
['bar' 'key5' 'value6']]
"""
for i in df.values:
    if i[0] in resultant_dict:
        resultant_dict[i[0]][i[1]] = i[2]
    else:
        resultant_dict[i[0]] = {i[1]: i[2]}
print(resultant_dict)
# Resultant Dict is {'foo': {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}, 'bar': {'key1': 'value4',
# 'key5': 'value6'}}
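A groupby-based variant gives the same result without indexing rows by position. A sketch, assuming the same df; the column is accessed as group["values"] because df.values would return the underlying array rather than the column:

```python
import pandas as pd

df = pd.DataFrame({
    "field_name": ["foo", "foo", "foo", "bar", "bar"],
    "values": ["key1", "key2", "key3", "key1", "key5"],
    "description": ["value1", "value2", "value3", "value4", "value6"],
})

# One inner dict per field_name; sort=False keeps first-seen group order.
resultant_dict = {
    field: dict(zip(group["values"], group["description"]))
    for field, group in df.groupby("field_name", sort=False)
}
print(resultant_dict)
```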
converting a nested dictionary to Pandas DataFrame
You can simply use:
df = pd.DataFrame(d['result']).T
Or:
df = pd.DataFrame.from_dict(d['result'], orient='index')
Output:
A B C D
2011-12-01 53 28 32 0
2012-01-01 51 35 49 0
2012-02-01 63 32 56 0
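The input d is not shown here, so the following is only a sketch with a hypothetical d, shaped to reproduce the output above. It illustrates that both calls give the same frame when the outer keys of d['result'] are the row labels:

```python
import pandas as pd

# Hypothetical input: the original `d` is not shown, so this shape is an assumption.
d = {'result': {'2011-12-01': {'A': 53, 'B': 28, 'C': 32, 'D': 0},
                '2012-01-01': {'A': 51, 'B': 35, 'C': 49, 'D': 0},
                '2012-02-01': {'A': 63, 'B': 32, 'C': 56, 'D': 0}}}

# Outer keys become columns first, then the transpose turns them into rows.
via_transpose = pd.DataFrame(d['result']).T

# orient='index' treats the outer keys as row labels directly.
via_from_dict = pd.DataFrame.from_dict(d['result'], orient='index')
print(via_from_dict)
```

In this sketch the index holds plain strings; pd.to_datetime(via_from_dict.index) converts them if a DatetimeIndex is needed.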