Nested dictionary to multiindex dataframe where dictionary keys are column labels
Pandas wants the MultiIndex values as tuples, not nested dicts. The simplest thing is to convert your dictionary to the right format before trying to pass it to DataFrame:
>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
('A', 'b'): [6, 7, 8, 9, 1],
('B', 'a'): [2, 3, 4, 5, 6],
('B', 'b'): [7, 8, 9, 1, 2]}
>>> pandas.DataFrame(reform)
A B
a b a b
0 1 6 2 7
1 2 7 3 8
2 3 8 4 9
3 4 9 5 1
4 5 1 6 2
[5 rows x 4 columns]
Nested dictionary to multiindex dataframe
IIUC, try:
my_dict = {"season":"summer", "threshold":70, "analysis" : {"max_temp":50, "min_temp":20}}
df_in = pd.json_normalize(my_dict)
df_in.columns = df_in.columns.str.split('.', expand=True)
df_in
Output:
season threshold analysis
NaN NaN max_temp min_temp
0 summer 70 50 20
How to make multiindex dataframe from a nested dictionary keys and lists of values?
Use ast.literal_eval
to convert each string into a dictionary and build the index from there:
import pandas as pd
from ast import literal_eval
dictionary ={
"{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
"{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}
keys, data = zip(*dictionary.items())
index = pd.MultiIndex.from_frame(pd.DataFrame([literal_eval(i) for i in keys]))
res = pd.DataFrame(data=list(data), index=index)
print(res)
Output
0 1 2
a b c
12.0 0.8 bla1 200 0.0 0.0
bla2 37 44.0 0.6
1.8 bla3 100 2.0 1.0
bla4 400 3.0 1.0
multiindex dataframe from nested dictionary of sets
I think you should separate the dict into the indices and their data, then make a pd.MultiIndex
and a pd.Series
. I'm working off the example given here, but modified slightly for your dictionary.
index, series = zip(*[((i, j), d[i][j]) for i in d for j in d[i]])
index = pd.MultiIndex.from_tuples(index, names=['Captial', 'lowercase'])
series = pd.Series(series, index=index)
Pandas multi-index/header Dataframe to nested dictionary
In 2 steps:
- Pivot your dataframe and convert it to a flat dict.
- Convert tuple keys to a nested dict
Step 1
data = df.stack(level=[0, 1]).to_dict()
print(data)
# Output
{('-', '-', 'revenue', 'max'): 15.0,
('-', '-', 'revenue', 'min'): 15.0,
('-', '-', 'revenue', 'sum'): 15.0,
...
('Google', 'Country TOT', 'revenue', 'sum'): 0.0,
('Google', 'Country TOT', 'taxes', 'mean'): 23.25,
('Google', 'Country TOT', 'taxes', 'sum'): 628.0}
Step 2
d = {}
for t, v in data.items():
e = d.setdefault(t[0], {})
for k in t[1:-1]:
e = e.setdefault(k, {})
e[t[-1]] = v
print(d)
# Output
{'-': {'-': {'revenue': {'max': 15.0, 'min': 15.0, 'sum': 15.0},
'taxes': {'mean': 0.0, 'sum': 0.0}}},
'Facebook': {'-': {'revenue': {'max': 218.5, 'min': -120.0, 'sum': 1259.79},
'taxes': {'mean': 5.17, 'sum': 321.0}},
'City TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 4.0, 'sum': 4.0}},
'Country TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 4.0, 'sum': 4.0}},
'Sales': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 9.25, 'sum': 18.5}}},
'Google': {'%tax_1': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 0.89, 'sum': 3.58}},
'-': {'revenue': {'max': 2643.08, 'min': -100.0, 'sum': 3738.36},
'taxes': {'mean': 96.23, 'sum': 26369.57}},
'City TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 3.55, 'sum': 95.99}},
'Country TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
'taxes': {'mean': 23.25, 'sum': 628.0}}}}
Pretty printing:
import json
print(json.dumps(d, indent=4))
# Output
{
"-": {
"-": {
"revenue": {
"max": 15.0,
"min": 15.0,
"sum": 15.0
},
"taxes": {
"mean": 0.0,
"sum": 0.0
}
}
},
"Facebook": {
"-": {
"revenue": {
"max": 218.5,
"min": -120.0,
"sum": 1259.79
},
"taxes": {
"mean": 5.17,
"sum": 321.0
}
},
"City TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 4.0,
"sum": 4.0
}
},
"Country TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 4.0,
"sum": 4.0
}
},
"Sales": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 9.25,
"sum": 18.5
}
}
},
"Google": {
"%tax_1": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 0.89,
"sum": 3.58
}
},
"-": {
"revenue": {
"max": 2643.08,
"min": -100.0,
"sum": 3738.36
},
"taxes": {
"mean": 96.23,
"sum": 26369.57
}
},
"City TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 3.55,
"sum": 95.99
}
},
"Country TOT": {
"revenue": {
"max": 0.0,
"min": 0.0,
"sum": 0.0
},
"taxes": {
"mean": 23.25,
"sum": 628.0
}
}
}
}
Pandas Dataframe from nested dictionary of pandas dataframes
Idea is create tuples by both keys and pass to concat
, third level of MultiIndex
is created from index values of original DataFrame
s, if necessary you can remove it:
my_dict = {
'elem1':{'day1': pd.DataFrame(1, columns=['Col1', 'Col2'], index=[1,2]),
'day2': pd.DataFrame(2, columns=['Col1', 'Col2'], index=[1,2])
},
'elem2':{'day1': pd.DataFrame(3, columns=['Col1', 'Col2'], index=[1,2]),
'day2': pd.DataFrame(4, columns=['Col1', 'Col2'], index=[1,2]),
'day3': pd.DataFrame(5, columns=['Col1', 'Col2'], index=[1,2])
}
}
d = {(k1, k2): v2 for k1, v1 in my_dict.items() for k2, v2 in v1.items()}
print (d)
{('elem1', 'day1'): Col1 Col2
1 1 1
2 1 1, ('elem1', 'day2'): Col1 Col2
1 2 2
2 2 2, ('elem2', 'day1'): Col1 Col2
1 3 3
2 3 3, ('elem2', 'day2'): Col1 Col2
1 4 4
2 4 4, ('elem2', 'day3'): Col1 Col2
1 5 5
2 5 5}
df = pd.concat(d, sort=False)
print (df)
Col1 Col2
elem1 day1 1 1 1
2 1 1
day2 1 2 2
2 2 2
elem2 day1 1 3 3
2 3 3
day2 1 4 4
2 4 4
day3 1 5 5
2 5 5
df = pd.concat(d, sort=False).reset_index(level=2, drop=True)
print (df)
Col1 Col2
elem1 day1 1 1
day1 1 1
day2 2 2
day2 2 2
elem2 day1 3 3
day1 3 3
day2 4 4
day2 4 4
day3 5 5
day3 5 5
Creating Multiindex Panda Dataframe from nested dict
You want to 1) flatten your dictionary, 2) load it into a dataframe and 3) reformat the dataframe to your liking.
# 1. Flatten, eg using https://stackoverflow.com/questions/6027558/flatten-nested-dictionaries-compressing-keys
import collections
def flatten(dictionary, parent_key=False, separator='.'):
"""
Turn a nested dictionary into a flattened dictionary
:param dictionary: The dictionary to flatten
:param parent_key: The string to prepend to dictionary's keys
:param separator: The string used to separate flattened keys
:return: A flattened dictionary
"""
items = []
for key, value in dictionary.items():
new_key = str(parent_key) + separator + key if parent_key else key
if isinstance(value, collections.MutableMapping):
items.extend(flatten(value, new_key, separator).items())
elif isinstance(value, list):
for k, v in enumerate(value):
items.extend(flatten({str(k): v}, new_key).items())
else:
items.append((new_key, value))
return dict(items)
data_flat = flatten(data_dict)
# 2. Load it in pandas
df = pd.DataFrame.from_dict(data_flat, orient="index")
# 3. Reshape to your liking
df.index = pd.MultiIndex.from_tuples(df.index.str.split(".").map(tuple))
df = df.unstack(level=[0,1]).droplevel(axis=1, level=0)
How to build a MultiIndex Pandas DataFrame from a nested dictionary with lists
I think you are close, for MultiIndex
is possible used MultiIndex.from_tuples
method:
d = {(i,j): d[i][j]
for i in d.keys()
for j in d[i].keys()}
mux = pd.MultiIndex.from_tuples(d.keys())
df = pd.DataFrame(list(d.values()), index=mux)
print (df)
0 1 2 3 4 5 6
key1 sub-key1 a b c d e None None
key2 sub-key2 1 2 3 5 8 9 10
Thanks, Zero for another solution:
df = pd.DataFrame.from_dict({(i,j): d[i][j]
for i in d.keys()
for j in d[i].keys()},
orient='index')
df.index = pd.MultiIndex.from_tuples(df.index)
print (df)
0 1 2 3 4 5 6
key1 sub-key1 a b c d e None None
key2 sub-key2 1 2 3 5 8 9 10
Pandas: MultiIndex from Nested Dictionary
From your dictionary :
>>> import pandas as pd
>>> df = pd.DataFrame.from_dict(dictionary)
>>> df
A B Coords
0 1 2 {'X': [1, 2, 3], 'Y': [1, 2, 3], 'Z': [1, 2, 3]}
1 2 3 {'X': [2, 3], 'Y': [2, 3], 'Z': [2, 3]}
Then we can use pd.Series
to extract the data in dict in the column Coords
like so :
df_concat = pd.concat([df.drop(['Coords'], axis=1), df['Coords'].apply(pd.Series)], axis=1)
>>> df_concat
A B X Y Z
0 1 2 [1, 2, 3] [1, 2, 3] [1, 2, 3]
1 2 3 [2, 3] [2, 3] [2, 3]
To finish we use the explode
method to get the list as rows and set the index on columns A
and B
to get the expected result :
>>> df_concat.explode(['X', 'Y', 'Z']).reset_index().set_index(['index', 'A', 'B'])
X Y Z
index A B
0 1 2 1 1 1
2 2 2 2
2 3 3 3
1 2 3 2 2 2
3 3 3 3
UPDATE :
If you are using a version of Pandas lower than 1.3.0, we can use the trick given by @MillerMrosek in this answer :
def explode(df, columns):
df['tmp']=df.apply(lambda row: list(zip(*[row[_clm] for _clm in columns])), axis=1)
df=df.explode('tmp')
df[columns]=pd.DataFrame(df['tmp'].tolist(), index=df.index)
df.drop(columns='tmp', inplace=True)
return df
explode(df_concat, ["X", "Y", "Z"]).reset_index().set_index(['index', 'A', 'B'])
Output :
X Y Z
index A B
0 1 2 1 1 1
2 2 2 2
2 3 3 3
1 2 3 2 2 2
3 3 3 3
Related Topics
How to Use Pip to Install a Package from a Private Github Repository
Formatting Long Numbers as Strings in Python
Plotting a Fast Fourier Transform in Python
Correct Way to Implement a Custom Popup Tkinter Dialog Box
How to Loop Through All But the Last Item of a List
How to Improve the Label Placement in Scatter Plot
In Selenium Web Driver How to Choose the Correct Iframe
Removing Duplicates from Dictionary
Why Use Abstract Base Classes in Python
Pygame Image Transparency Confusion
Expected Conditions in Protractor
How to Convert an Array of Strings to an Array of Floats in Numpy
What Is the Correct Way to Set Python's Locale on Windows
How to Write Binary Data to Stdout in Python 3
What Is the Advantage of a List Comprehension Over a for Loop