Construct Pandas Dataframe from Items in Nested Dictionary

A pandas MultiIndex is essentially a sequence of tuples, one per row. The most natural approach is therefore to reshape your input dict so that its keys are tuples holding the multi-index values you require. You can then construct the dataframe with pd.DataFrame.from_dict, passing orient='index':

import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

pd.DataFrame.from_dict({(i, j): user_dict[i][j]
                        for i in user_dict.keys()
                        for j in user_dict[i].keys()},
                       orient='index')

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
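Once the frame has a MultiIndex, naming its levels makes later selection easier to read. The following is a minimal sketch of that idea; the level names 'user_id' and 'category' are assumptions chosen for illustration and do not come from the original question.

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# Flatten to {(outer_key, inner_key): attributes}, as in the answer above.
flat = {(i, j): attrs
        for i, inner in user_dict.items()
        for j, attrs in inner.items()}

df = pd.DataFrame.from_dict(flat, orient='index')

# 'user_id' and 'category' are hypothetical level names for this sketch.
df = df.rename_axis(['user_id', 'category'])

# All rows for one outer key.
print(df.loc[12])

# One inner key across all outer keys.
print(df.xs('Category 2', level='category'))
```

Selecting with xs by level name avoids having to remember the position of each level.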

An alternative approach would be to build your dataframe up by concatenating the component dataframes:

user_ids = []
frames = []

# dict.iteritems() exists only in Python 2; use items() in Python 3
for user_id, d in user_dict.items():
    user_ids.append(user_id)
    frames.append(pd.DataFrame.from_dict(d, orient='index'))

pd.concat(frames, keys=user_ids)

               att_1     att_2
12 Category 1      1  whatever
   Category 2     23   another
15 Category 1     10       foo
   Category 2     30       bar
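Because pd.concat also accepts a mapping and uses its keys as the outer index level, the loop above can probably be shortened to a dict comprehension. This is a sketch rather than the answer's own code; the level names passed to names= are illustrative assumptions.

```python
import pandas as pd

user_dict = {12: {'Category 1': {'att_1': 1, 'att_2': 'whatever'},
                  'Category 2': {'att_1': 23, 'att_2': 'another'}},
             15: {'Category 1': {'att_1': 10, 'att_2': 'foo'},
                  'Category 2': {'att_1': 30, 'att_2': 'bar'}}}

# One small frame per outer key; the inner keys become that frame's index.
frames = {user_id: pd.DataFrame.from_dict(d, orient='index')
          for user_id, d in user_dict.items()}

# The dict keys become the outer MultiIndex level.
# 'user_id' and 'category' are hypothetical level names.
df = pd.concat(frames, names=['user_id', 'category'])
print(df)
```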

Construct a pandas DataFrame from items in a nested dictionary with lists as inner values

You could do:

df = pd.DataFrame(
    (
        [subkey, key] + value
        for key, records in annot_dict.items()
        for record in records
        for subkey, value in record.items()
    ),
    columns=[
        'subunit_ID', 'gene_ID', 'start_index', 'end_index', 'strand', 'biotype', 'desc'
    ]
)

Given this input:

annot_dict = {
    'ID_string1': [
        {'ID_string1': ['attr11a', 'attr11b', 'attr11c', 'attr11d', 'attr11e']},
        {'string12'  : ['attr12a', 'attr12b', 'attr12c', 'attr12d', 'attr12e']},
        {'string13'  : ['attr13a', 'attr13b', 'attr13c', 'attr13d', 'attr13e']},
    ],
    'ID_string2': [
        {'ID_string2': ['attr21a', 'attr21b', 'attr21c', 'attr21d', 'attr21e']},
        {'string22'  : ['attr22a', 'attr22b', 'attr22c', 'attr22d', 'attr22e']},
        {'string23'  : ['attr23a', 'attr23b', 'attr23c', 'attr23d', 'attr23e']},
    ]
}

the result is:

   subunit_ID     gene_ID start_index end_index   strand  biotype     desc
0  ID_string1  ID_string1     attr11a   attr11b  attr11c  attr11d  attr11e
1    string12  ID_string1     attr12a   attr12b  attr12c  attr12d  attr12e
2    string13  ID_string1     attr13a   attr13b  attr13c  attr13d  attr13e
3  ID_string2  ID_string2     attr21a   attr21b  attr21c  attr21d  attr21e
4    string22  ID_string2     attr22a   attr22b  attr22c  attr22d  attr22e
5    string23  ID_string2     attr23a   attr23b  attr23c  attr23d  attr23e

How to convert a nested dictionary with lists to a dataframe in this format

You can use stack and explode:

import pandas as pd

nested_dict = {'Girl': {'June': [45, 32], 'Samantha': [14, 34, 65]},
               'Boy': {'Brad': [12, 54, 12], 'Chad': [12]}}

df = pd.DataFrame.from_dict(nested_dict, orient='index')
print(df.stack().explode())

Output:

Girl  June        45
      June        32
      Samantha    14
      Samantha    34
      Samantha    65
Boy   Brad        12
      Brad        54
      Brad        12
      Chad        12
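If a flat table is preferred over a Series with a MultiIndex, the stacked result can be turned into ordinary columns. The sketch below assumes the column names 'group', 'name' and 'value', which are not part of the original question. The explicit dropna() is a precaution: depending on the pandas version, stack may or may not drop the missing cells on its own.

```python
import pandas as pd

nested_dict = {'Girl': {'June': [45, 32], 'Samantha': [14, 34, 65]},
               'Boy': {'Brad': [12, 54, 12], 'Chad': [12]}}

df = pd.DataFrame.from_dict(nested_dict, orient='index')

# stack: one row per (outer key, inner key); dropna removes the empty
# combinations such as ('Girl', 'Brad'); explode: one row per list element.
s = df.stack().dropna().explode()

# 'group', 'name' and 'value' are hypothetical column names for this sketch.
flat = s.rename_axis(['group', 'name']).reset_index(name='value')
print(flat)
```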

How to create a pandas dataframe from a nested dictionary with lists of dictionaries?

One option is to merge each list of dicts into a single dict, then build the DataFrame with DataFrame.from_dict:

import pandas as pd
from collections import ChainMap

dictionary = {'user1': [{'product1': 10}, {'product2': 15}, {'product3': 20}],
              'user2': [{'product1': 13}, {'product2': 8}, {'product3': 50}]}

df = pd.DataFrame.from_dict(
    {k: dict(ChainMap(*v)) for k, v in dictionary.items()},
    orient='index'
)

df:

       product3  product2  product1
user1        20        15        10
user2        50         8        13

Optional alphanumeric sort with natsort:

from natsort import natsorted

df = df.reindex(columns=natsorted(df.columns))

       product1  product2  product3
user1        10        15        20
user2        13         8        50

For reference, the merged dict passed to from_dict is:

{k: dict(ChainMap(*v)) for k, v in dictionary.items()}

{'user1': {'product3': 20, 'product2': 15, 'product1': 10},
 'user2': {'product3': 50, 'product2': 8, 'product1': 13}}
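ChainMap iterates its maps from last to first, which is why the columns above come out as product3, product2, product1. If the original order matters and you would rather not add natsort, a plain nested comprehension is one possible alternative. This is a sketch of a different merging technique, not the answer's method; note that if a key repeats within one list, the later dict wins here, whereas ChainMap keeps the earlier one.

```python
import pandas as pd

dictionary = {'user1': [{'product1': 10}, {'product2': 15}, {'product3': 20}],
              'user2': [{'product1': 13}, {'product2': 8}, {'product3': 50}]}

# Merge each list of one-item dicts into a single dict, keeping list order.
merged = {user: {name: qty for item in items for name, qty in item.items()}
          for user, items in dictionary.items()}

df = pd.DataFrame.from_dict(merged, orient='index')
print(df)
```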

Python Pandas: Convert nested dictionary to dataframe

Use DataFrame.from_dict() with the keyword argument orient='index'.

Example:

In [20]: d = {1 : {'tp': 26, 'fp': 112},
   ....:      2 : {'tp': 26, 'fp': 91},
   ....:      3 : {'tp': 23, 'fp': 74}}

In [24]: df = pd.DataFrame.from_dict(d, orient='index')

In [25]: df
Out[25]:
   tp   fp
1  26  112
2  26   91
3  23   74

If you also want to give the index a name, set df.index.name. Example:

In [30]: df.index.name = 't'

In [31]: df
Out[31]:
   tp   fp
t
1  26  112
2  26   91
3  23   74
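If the outer keys should end up as a regular column rather than as the index, the named index can be reset. A minimal sketch, reusing the name 't' from the example above:

```python
import pandas as pd

d = {1: {'tp': 26, 'fp': 112},
     2: {'tp': 26, 'fp': 91},
     3: {'tp': 23, 'fp': 74}}

# rename_axis names the index; reset_index then moves it into a column 't'.
df = pd.DataFrame.from_dict(d, orient='index').rename_axis('t').reset_index()
print(df)
```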

How to convert a nested dict, to a pandas dataframe

Loading a JSON/dict:

  • Use pd.json_normalize to expand the dict.
import pandas as pd

data = {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}

df = pd.json_normalize(data)

# display(df)
        id data.name data.lastname  data.office.num data.office.department
0  3241234     carol       netflik             3543                  trigy
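json_normalize also takes sep and max_level arguments, which control the column-name separator and how deep the flattening goes. The sketch below shows both on the same data; the variable names flat and shallow are only illustrative.

```python
import pandas as pd

data = {'id': 3241234,
        'data': {'name': 'carol', 'lastname': 'netflik',
                 'office': {'num': 3543, 'department': 'trigy'}}}

# Use '_' instead of '.' when joining nested keys into column names.
flat = pd.json_normalize(data, sep='_')
print(flat.columns.tolist())

# Flatten only one level; deeper dicts stay as dict values in their cell.
shallow = pd.json_normalize(data, max_level=1)
print(shallow.columns.tolist())
```

Limiting max_level can be useful when deeper levels are irregular and are better handled separately.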

If the dataframe has a column of dicts

  • See also the Stack Overflow question: Split / Explode a column of dictionaries into separate columns with pandas
# dataframe with column of dicts
df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})

# display(df)
   col2  col
0     1  {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
1     2  {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}
2     3  {'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}

# normalize the column of dicts
normalized = pd.json_normalize(df['col'])

# join the normalized column to df
df = df.join(normalized).drop(columns=['col'])

# display(df)
   col2       id data.name data.lastname  data.office.num data.office.department
0     1  3241234     carol       netflik             3543                  trigy
1     2  3241234     carol       netflik             3543                  trigy
2     3  3241234     carol       netflik             3543                  trigy

If the dataframe has a column of lists with dicts

  • Use .explode first to pull the dicts out of the lists, so each row holds a single dict.
data = [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]

df = pd.DataFrame({'col2': [1, 2, 3], 'col': [data, data, data]})

# display(df)
   col2  col
0     1  [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
1     2  [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]
2     3  [{'id': 3241234, 'data': {'name': 'carol', 'lastname': 'netflik', 'office': {'num': 3543, 'department': 'trigy'}}}]

# explode the lists
df = df.explode('col', ignore_index=True)

# remove and normalize the column of dicts
normalized = pd.json_normalize(df.pop('col'))

# join the normalized column to df
df = df.join(normalized)

Pandas Dataframe from nested dictionary of pandas dataframes

The idea is to build tuple keys from both levels of dict keys and pass the resulting dict to concat. The third level of the MultiIndex comes from the index values of the original DataFrames; you can remove it if it is not needed:

my_dict = {
    'elem1': {'day1': pd.DataFrame(1, columns=['Col1', 'Col2'], index=[1, 2]),
              'day2': pd.DataFrame(2, columns=['Col1', 'Col2'], index=[1, 2])
              },
    'elem2': {'day1': pd.DataFrame(3, columns=['Col1', 'Col2'], index=[1, 2]),
              'day2': pd.DataFrame(4, columns=['Col1', 'Col2'], index=[1, 2]),
              'day3': pd.DataFrame(5, columns=['Col1', 'Col2'], index=[1, 2])
              }
}


d = {(k1, k2): v2 for k1, v1 in my_dict.items() for k2, v2 in v1.items()}
print(d)
{('elem1', 'day1'):    Col1  Col2
1     1     1
2     1     1, ('elem1', 'day2'):    Col1  Col2
1     2     2
2     2     2, ('elem2', 'day1'):    Col1  Col2
1     3     3
2     3     3, ('elem2', 'day2'):    Col1  Col2
1     4     4
2     4     4, ('elem2', 'day3'):    Col1  Col2
1     5     5
2     5     5}

df = pd.concat(d, sort=False)
print(df)
              Col1  Col2
elem1 day1 1     1     1
           2     1     1
      day2 1     2     2
           2     2     2
elem2 day1 1     3     3
           2     3     3
      day2 1     4     4
           2     4     4
      day3 1     5     5
           2     5     5


df = pd.concat(d, sort=False).reset_index(level=2, drop=True)
print(df)
            Col1  Col2
elem1 day1     1     1
      day1     1     1
      day2     2     2
      day2     2     2
elem2 day1     3     3
      day1     3     3
      day2     4     4
      day2     4     4
      day3     5     5
      day3     5     5
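Keeping the third level and naming all three can make the combined frame easier to query afterwards. The following sketch assumes the level names 'elem', 'day' and 'row', which are hypothetical and not part of the original question.

```python
import pandas as pd

my_dict = {
    'elem1': {'day1': pd.DataFrame(1, columns=['Col1', 'Col2'], index=[1, 2]),
              'day2': pd.DataFrame(2, columns=['Col1', 'Col2'], index=[1, 2])},
    'elem2': {'day1': pd.DataFrame(3, columns=['Col1', 'Col2'], index=[1, 2]),
              'day2': pd.DataFrame(4, columns=['Col1', 'Col2'], index=[1, 2]),
              'day3': pd.DataFrame(5, columns=['Col1', 'Col2'], index=[1, 2])},
}

# Tuple keys become the first two MultiIndex levels.
d = {(k1, k2): v2 for k1, v1 in my_dict.items() for k2, v2 in v1.items()}

# 'elem', 'day' and 'row' are hypothetical level names for this sketch.
df = pd.concat(d, names=['elem', 'day', 'row'])

# Cross-section: every 'day1' frame, across all elements.
day1 = df.xs('day1', level='day')
print(day1)
```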

Pandas: transforming dataframe to nested dictionary

You can group your dataframe by all columns except price, then create your dictionaries in a loop:

import json

# if there is more than one price for a product in a chain, take the mean:
grouped_df = df.groupby(['Month_Year', 'City_Name', 'Chain_Name', 'Product_Name']).agg('mean')

result = dict()
nested_dict = dict()

# each row unpacks to (index tuple, price); walk down the index levels
for index, value in grouped_df.itertuples():
    for i, key in enumerate(index):
        if i == 0:
            # top level: start again from the root dict
            if key not in result:
                result[key] = {}
            nested_dict = result[key]
        elif i == len(index) - 1:
            # last level: store the price itself
            nested_dict[key] = value
        else:
            if key not in nested_dict:
                nested_dict[key] = {}
            nested_dict = nested_dict[key]

print(json.dumps(result, indent=4))

With your df changed as follows, so that it demonstrates both the nesting and the mean calculation:

  Month_Year City_Name Chain_Name Product_Name  Product_Price
0    11-2021    London       Aldi        Pasta           2.33
1    11-2021    London       Aldi        Pasta           2.35
2    11-2021    London       Aldi       Olives           3.99
3    11-2021   Bristol       Spar      Bananas           1.45
4    10-2021    London      Tesco       Olives           4.12
5    10-2021   Cardiff       Spar        Pasta           2.25

You get the output:

{
    "10-2021": {
        "Cardiff": {
            "Spar": {
                "Pasta": 2.25
            }
        },
        "London": {
            "Tesco": {
                "Olives": 4.12
            }
        }
    },
    "11-2021": {
        "Bristol": {
            "Spar": {
                "Bananas": 1.45
            }
        },
        "London": {
            "Aldi": {
                "Olives": 3.99,
                "Pasta": 2.34
            }
        }
    }
}
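The same nesting can be written a little more compactly with dict.setdefault, which creates an intermediate dict only when it is missing. This is a sketch of an alternative to the loop above, not the answer's own code; it rebuilds the sample frame shown earlier so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Month_Year': ['11-2021', '11-2021', '11-2021', '11-2021', '10-2021', '10-2021'],
    'City_Name': ['London', 'London', 'London', 'Bristol', 'London', 'Cardiff'],
    'Chain_Name': ['Aldi', 'Aldi', 'Aldi', 'Spar', 'Tesco', 'Spar'],
    'Product_Name': ['Pasta', 'Pasta', 'Olives', 'Bananas', 'Olives', 'Pasta'],
    'Product_Price': [2.33, 2.35, 3.99, 1.45, 4.12, 2.25],
})

keys = ['Month_Year', 'City_Name', 'Chain_Name', 'Product_Name']

# Series of mean prices, indexed by a 4-level MultiIndex.
prices = df.groupby(keys)['Product_Price'].mean()

result = {}
for path, price in prices.items():
    node = result
    # Descend through every level except the last, creating dicts as needed.
    for key in path[:-1]:
        node = node.setdefault(key, {})
    # The last level holds the value itself.
    node[path[-1]] = price

print(result)
```

Because the loop walks path[:-1] generically, it works for any number of grouping columns.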

Create a nested dictionary from a dataframe

The following Python code solves the problem:

import pandas as pd

d = {"field_name": ["foo", "foo", "foo", "bar", "bar"],
     "values": ["key1", "key2", "key3", "key1", "key5"],
     "description": ["value1", "value2", "value3", "value4", "value6"]}
df = pd.DataFrame(data=d)
print(df.values)

resultant_dict = {}
"""
df.values is like
[['foo' 'key1' 'value1']
 ['foo' 'key2' 'value2']
 ['foo' 'key3' 'value3']
 ['bar' 'key1' 'value4']
 ['bar' 'key5' 'value6']]
"""
for i in df.values:
    if i[0] in resultant_dict:
        resultant_dict[i[0]][i[1]] = i[2]
    else:
        resultant_dict[i[0]] = {i[1]: i[2]}

print(resultant_dict)

# Resultant Dict is {'foo': {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}, 'bar': {'key1': 'value4',
# 'key5': 'value6'}}
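A groupby-based variant may read more clearly when the frame has named columns, since it avoids positional indexing into df.values. This is a sketch of an alternative technique, not the answer's method; the variable name nested is illustrative.

```python
import pandas as pd

d = {"field_name": ["foo", "foo", "foo", "bar", "bar"],
     "values": ["key1", "key2", "key3", "key1", "key5"],
     "description": ["value1", "value2", "value3", "value4", "value6"]}
df = pd.DataFrame(data=d)

# One inner dict per field_name; sort=False keeps first-seen group order.
# group['values'] selects the column named 'values', not the .values attribute.
nested = {
    field: dict(zip(group['values'], group['description']))
    for field, group in df.groupby('field_name', sort=False)
}

print(nested)
```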

converting a nested dictionary to Pandas DataFrame

You can simply use:

df = pd.DataFrame(d['result']).T

Or:

df = pd.DataFrame.from_dict(d['result'], orient='index')

Output:

             A   B   C  D
2011-12-01  53  28  32  0
2012-01-01  51  35  49  0
2012-02-01  63  32  56  0
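The input dict d is not shown in this answer, so the sketch below uses a hypothetical d reconstructed from the printed output above; the real structure may differ. It also shows one optional extra step, converting the date-like string keys into a DatetimeIndex.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output table as an assumption.
d = {'result': {
    '2011-12-01': {'A': 53, 'B': 28, 'C': 32, 'D': 0},
    '2012-01-01': {'A': 51, 'B': 35, 'C': 49, 'D': 0},
    '2012-02-01': {'A': 63, 'B': 32, 'C': 56, 'D': 0},
}}

df = pd.DataFrame.from_dict(d['result'], orient='index')

# Optional: parse the string keys so the index supports date operations.
df.index = pd.to_datetime(df.index)
print(df)
```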

