Nested Dictionary to Multiindex Dataframe Where Dictionary Keys Are Column Labels

Nested dictionary to multiindex dataframe where dictionary keys are column labels

Pandas wants the MultiIndex values as tuples, not nested dicts. The simplest thing is to convert your dictionary to the right format before trying to pass it to DataFrame:

>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
('A', 'b'): [6, 7, 8, 9, 1],
('B', 'a'): [2, 3, 4, 5, 6],
('B', 'b'): [7, 8, 9, 1, 2]}
>>> pandas.DataFrame(reform)
A B
a b a b
0 1 6 2 7
1 2 7 3 8
2 3 8 4 9
3 4 9 5 1
4 5 1 6 2

[5 rows x 4 columns]

Nested dictionary to multiindex dataframe

IIUC, try:

my_dict = {"season":"summer", "threshold":70, "analysis" : {"max_temp":50, "min_temp":20}}

df_in = pd.json_normalize(my_dict)

df_in.columns = df_in.columns.str.split('.', expand=True)

df_in

Output:

   season threshold analysis         
NaN NaN max_temp min_temp
0 summer 70 50 20

How to make multiindex dataframe from a nested dictionary keys and lists of values?

Use ast.literal_eval to convert each string into a dictionary and build the index from there:

import pandas as pd
from ast import literal_eval

dictionary ={
"{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
"{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}

keys, data = zip(*dictionary.items())
index = pd.MultiIndex.from_frame(pd.DataFrame([literal_eval(i) for i in keys]))
res = pd.DataFrame(data=list(data), index=index)
print(res)

Output

                  0     1    2
a b c
12.0 0.8 bla1 200 0.0 0.0
bla2 37 44.0 0.6
1.8 bla3 100 2.0 1.0
bla4 400 3.0 1.0

multiindex dataframe from nested dictionary of sets

I think you should separate the dict into the indices and their data, then make a pd.MultiIndex and a pd.Series. I'm working off the example given here, but modified slightly for your dictionary.

index, series = zip(*[((i, j), d[i][j]) for i in d for j in d[i]])

index = pd.MultiIndex.from_tuples(index, names=['Captial', 'lowercase'])
series = pd.Series(series, index=index)

Pandas multi-index/header Dataframe to nested dictionary

In 2 steps:

  1. Pivot your dataframe and convert it to a flat dict.
  2. Convert tuple keys to a nested dict

Step 1

data = df.stack(level=[0, 1]).to_dict()
print(data)

# Output
{('-', '-', 'revenue', 'max'): 15.0,
('-', '-', 'revenue', 'min'): 15.0,
('-', '-', 'revenue', 'sum'): 15.0,
...
('Google', 'Country TOT', 'revenue', 'sum'): 0.0,
('Google', 'Country TOT', 'taxes', 'mean'): 23.25,
('Google', 'Country TOT', 'taxes', 'sum'): 628.0}

Step 2

d = {}
for t, v in data.items():
e = d.setdefault(t[0], {})
for k in t[1:-1]:
e = e.setdefault(k, {})
e[t[-1]] = v
print(d)

# Output
{'-': {'-': {'revenue': {'max': 15.0, 'min': 15.0, 'sum': 15.0},
'taxes': {'mean': 0.0, 'sum': 0.0}}},
'Facebook': {'-': {'revenue': {'max': 218.5, 'min': -120.0, 'sum': 1259.79},
'taxes': {'mean': 5.17, 'sum': 321.0}},
'City TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 4.0, 'sum': 4.0}},
'Country TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 4.0, 'sum': 4.0}},
'Sales': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 9.25, 'sum': 18.5}}},
'Google': {'%tax_1': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 0.89, 'sum': 3.58}},
'-': {'revenue': {'max': 2643.08, 'min': -100.0, 'sum': 3738.36},
'taxes': {'mean': 96.23, 'sum': 26369.57}},
'City TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 3.55, 'sum': 95.99}},
'Country TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 23.25, 'sum': 628.0}}}}

Pretty printing:

import json
print(json.dumps(d, indent=4))

# Output
{
"-": {
"-": {
"revenue": {
"max": 15.0,
"min": 15.0,
"sum": 15.0
},
"taxes": {
"mean": 0.0,
"sum": 0.0
}
}
},
"Facebook": {
"-": {
"revenue": {
"max": 218.5,
"min": -120.0,
"sum": 1259.79
},
"taxes": {
"mean": 5.17,
"sum": 321.0
}
},
"City TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 4.0,
"sum": 4.0
}
},
"Country TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 4.0,
"sum": 4.0
}
},
"Sales": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 9.25,
"sum": 18.5
}
}
},
"Google": {
"%tax_1": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 0.89,
"sum": 3.58
}
},
"-": {
"revenue": {
"max": 2643.08,
"min": -100.0,
"sum": 3738.36
},
"taxes": {
"mean": 96.23,
"sum": 26369.57
}
},
"City TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 3.55,
"sum": 95.99
}
},
"Country TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 23.25,
"sum": 628.0
}
}
}
}

Pandas Dataframe from nested dictionary of pandas dataframes

Idea is create tuples by both keys and pass to concat, third level of MultiIndex is created from index values of original DataFrames, if necessary you can remove it:

my_dict = {
'elem1':{'day1': pd.DataFrame(1, columns=['Col1', 'Col2'], index=[1,2]),
'day2': pd.DataFrame(2, columns=['Col1', 'Col2'], index=[1,2])
},
'elem2':{'day1': pd.DataFrame(3, columns=['Col1', 'Col2'], index=[1,2]),
'day2': pd.DataFrame(4, columns=['Col1', 'Col2'], index=[1,2]),
'day3': pd.DataFrame(5, columns=['Col1', 'Col2'], index=[1,2])
}
}


d = {(k1, k2): v2 for k1, v1 in my_dict.items() for k2, v2 in v1.items()}
print (d)
{('elem1', 'day1'): Col1 Col2
1 1 1
2 1 1, ('elem1', 'day2'): Col1 Col2
1 2 2
2 2 2, ('elem2', 'day1'): Col1 Col2
1 3 3
2 3 3, ('elem2', 'day2'): Col1 Col2
1 4 4
2 4 4, ('elem2', 'day3'): Col1 Col2
1 5 5
2 5 5}

df = pd.concat(d, sort=False)
print (df)
Col1 Col2
elem1 day1 1 1 1
2 1 1
day2 1 2 2
2 2 2
elem2 day1 1 3 3
2 3 3
day2 1 4 4
2 4 4
day3 1 5 5
2 5 5


df = pd.concat(d, sort=False).reset_index(level=2, drop=True)
print (df)
Col1 Col2
elem1 day1 1 1
day1 1 1
day2 2 2
day2 2 2
elem2 day1 3 3
day1 3 3
day2 4 4
day2 4 4
day3 5 5
day3 5 5

Creating Multiindex Panda Dataframe from nested dict

You want to 1) flatten your dictionary, 2) load it into a dataframe and 3) reformat the dataframe to your liking.

# 1. Flatten, eg using https://stackoverflow.com/questions/6027558/flatten-nested-dictionaries-compressing-keys
import collections
def flatten(dictionary, parent_key=False, separator='.'):
"""
Turn a nested dictionary into a flattened dictionary
:param dictionary: The dictionary to flatten
:param parent_key: The string to prepend to dictionary's keys
:param separator: The string used to separate flattened keys
:return: A flattened dictionary
"""

items = []
for key, value in dictionary.items():
new_key = str(parent_key) + separator + key if parent_key else key
if isinstance(value, collections.MutableMapping):
items.extend(flatten(value, new_key, separator).items())
elif isinstance(value, list):
for k, v in enumerate(value):
items.extend(flatten({str(k): v}, new_key).items())
else:
items.append((new_key, value))
return dict(items)

data_flat = flatten(data_dict)

# 2. Load it in pandas
df = pd.DataFrame.from_dict(data_flat, orient="index")

# 3. Reshape to your liking
df.index = pd.MultiIndex.from_tuples(df.index.str.split(".").map(tuple))
df = df.unstack(level=[0,1]).droplevel(axis=1, level=0)

How to build a MultiIndex Pandas DataFrame from a nested dictionary with lists

I think you are close, for MultiIndex is possible used MultiIndex.from_tuples method:

d = {(i,j): d[i][j] 
for i in d.keys()
for j in d[i].keys()}

mux = pd.MultiIndex.from_tuples(d.keys())
df = pd.DataFrame(list(d.values()), index=mux)
print (df)
0 1 2 3 4 5 6
key1 sub-key1 a b c d e None None
key2 sub-key2 1 2 3 5 8 9 10

Thanks, Zero for another solution:

df = pd.DataFrame.from_dict({(i,j): d[i][j] 
for i in d.keys()
for j in d[i].keys()},
orient='index')

df.index = pd.MultiIndex.from_tuples(df.index)
print (df)
0 1 2 3 4 5 6
key1 sub-key1 a b c d e None None
key2 sub-key2 1 2 3 5 8 9 10

Pandas: MultiIndex from Nested Dictionary

From your dictionary :

>>> import pandas as pd

>>> df = pd.DataFrame.from_dict(dictionary)
>>> df
A B Coords
0 1 2 {'X': [1, 2, 3], 'Y': [1, 2, 3], 'Z': [1, 2, 3]}
1 2 3 {'X': [2, 3], 'Y': [2, 3], 'Z': [2, 3]}

Then we can use pd.Series to extract the data in dict in the column Coords like so :

df_concat = pd.concat([df.drop(['Coords'], axis=1), df['Coords'].apply(pd.Series)], axis=1)
>>> df_concat
A B X Y Z
0 1 2 [1, 2, 3] [1, 2, 3] [1, 2, 3]
1 2 3 [2, 3] [2, 3] [2, 3]

To finish we use the explode method to get the list as rows and set the index on columns A and B to get the expected result :

>>> df_concat.explode(['X', 'Y', 'Z']).reset_index().set_index(['index', 'A', 'B'])
X Y Z
index A B
0 1 2 1 1 1
2 2 2 2
2 3 3 3
1 2 3 2 2 2
3 3 3 3

UPDATE :

If you are using a version of Pandas lower than 1.3.0, we can use the trick given by @MillerMrosek in this answer :

def explode(df, columns):
df['tmp']=df.apply(lambda row: list(zip(*[row[_clm] for _clm in columns])), axis=1)
df=df.explode('tmp')
df[columns]=pd.DataFrame(df['tmp'].tolist(), index=df.index)
df.drop(columns='tmp', inplace=True)
return df

explode(df_concat, ["X", "Y", "Z"]).reset_index().set_index(['index', 'A', 'B'])

Output :

                X   Y   Z
index A B
0 1 2 1 1 1
2 2 2 2
2 3 3 3
1 2 3 2 2 2
3 3 3 3


Related Topics



Leave a reply



Submit