Reverting from multiindex to single index dataframe in pandas
Pass level=[0,1] to reset just those levels:
dist_df = dist_df.reset_index(level=[0,1])
In [28]:
df.reset_index(level=[0,1])
Out[28]:
            YEAR  MONTH  NI
datetime
2000-01-01  2000      1 NaN
2000-01-02  2000      1 NaN
2000-01-03  2000      1 NaN
2000-01-04  2000      1 NaN
2000-01-05  2000      1 NaN
Alternatively, you can pass the level names:
df.reset_index(level=['YEAR','MONTH'])
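A minimal, self-contained sketch of the same idea follows. The frame is a hypothetical reconstruction (three index levels YEAR, MONTH, datetime and a single NI column), not the asker's actual data. It shows that resetting by position and by name gives the same result, with datetime left as the only index level.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a three-level index (YEAR, MONTH, datetime) and one column.
dates = pd.date_range("2000-01-01", periods=5, freq="D")
index = pd.MultiIndex.from_arrays(
    [dates.year, dates.month, dates],
    names=["YEAR", "MONTH", "datetime"],
)
df = pd.DataFrame({"NI": [np.nan] * 5}, index=index)

# Move the first two levels back into columns; 'datetime' stays as the index.
by_position = df.reset_index(level=[0, 1])
by_name = df.reset_index(level=["YEAR", "MONTH"])

print(by_position)
```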
Reduce multi-index/multi-level dataframe to single index, single level
Using set_axis, map, and join
df2.set_axis(df2.columns.map('_'.join), axis=1).add_suffix('_MPG')
      Car_Diesel_MPG  Car_Gas_MPG
Year
2000            14.7         20.5
2009            18.0         22.3
2017            22.2         50.9
groupby with a dict
m = {t: '_'.join(t) for t in df2.columns}
df2.groupby(m, axis=1).mean().add_suffix('_MPG')
      Car_Diesel_MPG  Car_Gas_MPG
Year
2000            14.7         20.5
2009            18.0         22.3
2017            22.2         50.9
Either of these can be followed by reset_index
m = {t: '_'.join(t) for t in df2.columns}
df2.groupby(m, axis=1).mean().add_suffix('_MPG').reset_index()
Year Car_Diesel_MPG Car_Gas_MPG
0 2000 14.7 20.5
1 2009 18.0 22.3
2 2017 22.2 50.9
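For anyone who wants to run this end to end, here is a small sketch. The construction of df2 is an assumption (a frame with two-level columns, reusing the values shown in the output above), and it flattens the columns with a plain list comprehension, a swapped-in alternative to the set_axis and groupby variants.

```python
import pandas as pd

# Hypothetical reconstruction of df2: two-level columns (vehicle type, fuel type).
df2 = pd.DataFrame(
    [[14.7, 20.5], [18.0, 22.3], [22.2, 50.9]],
    index=pd.Index([2000, 2009, 2017], name="Year"),
    columns=pd.MultiIndex.from_tuples([("Car", "Diesel"), ("Car", "Gas")]),
)

# Join each column tuple into one label, then move 'Year' into a column.
flat = df2.copy()
flat.columns = ["_".join(col) + "_MPG" for col in flat.columns]
flat = flat.reset_index()

print(flat)
```

Assigning to .columns directly does not depend on groupby(axis=1), which recent pandas releases deprecate.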
groupby instead of pivot_table
df.groupby(
['Year', df.VehicleType.str.cat(df.FuelType, sep='_').add('_MPG').values]
).MPG.sum().unstack().reset_index()
Year Car_Diesel_MPG Car_Gas_MPG
0 2000 14.7 20.5
1 2009 18.0 22.3
2 2017 22.2 50.9
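The groupby route starts from the long-format frame. A sketch under the assumption that the original df has Year, VehicleType, FuelType and MPG columns (the rows below are reconstructed from the output shown and are not the asker's data):

```python
import pandas as pd

# Hypothetical long-format input: one row per (Year, VehicleType, FuelType).
df = pd.DataFrame({
    "Year": [2000, 2000, 2009, 2009, 2017, 2017],
    "VehicleType": ["Car"] * 6,
    "FuelType": ["Diesel", "Gas"] * 3,
    "MPG": [14.7, 20.5, 18.0, 22.3, 22.2, 50.9],
})

# Build the final column label per row, group on (Year, label), then pivot
# the label level into columns with unstack.
label = (df["VehicleType"] + "_" + df["FuelType"] + "_MPG").to_numpy()
wide = df.groupby(["Year", label])["MPG"].sum().unstack().reset_index()

print(wide)
```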
Transforming Multiindex into single index after groupby() Pandas
We can use droplevel:
df_final.columns=df_final.columns.droplevel(0)
df_final.reset_index(inplace=True)
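To see where the extra column level comes from, here is a small sketch with made-up data (the key and value column names are assumptions): aggregating with a dict of lists produces two-level columns, and droplevel(0) removes the outer one.

```python
import pandas as pd

# Hypothetical input; 'key' and 'value' are illustrative names.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

# agg with a dict of lists yields MultiIndex columns:
# ('value', 'sum'), ('value', 'mean')
df_final = df.groupby("key").agg({"value": ["sum", "mean"]})

# Drop the outer level ('value') and turn the group key back into a column.
df_final.columns = df_final.columns.droplevel(0)
df_final = df_final.reset_index()

print(df_final)
```

If several columns are aggregated with the same functions, dropping level 0 leaves duplicate labels; joining the levels with '_' is probably the safer choice in that case.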
Pandas multi-index unstack to single row
Try set_index + unstack to reshape to wide format:
new_df = df.set_index(['name', 'index', 'f1', 'f2', 'f3']).unstack('index')
Or via pivot:
new_df = df.pivot(index=['name', 'f1', 'f2', 'f3'], columns='index')
Sort the MultiIndex with sort_index:
new_df = new_df.sort_index(axis=1, level=1)
Then reduce the MultiIndex via map + reset_index:
new_df.columns = new_df.columns.map(lambda s: '_'.join(map(str, s)))
new_df = new_df.reset_index()
new_df:
name f1 f2 f3 calc1_1 calc2_1 calc3_1 calc1_2 calc2_2 calc3_2 calc1_3 calc2_3 calc3_3
0 chicken white yellow feathers 0.04 1.18 -2.01 0.18 0.73 -1.21 NaN NaN NaN
1 fox red white fur 0.21 1.67 -0.34 0.76 2.20 -1.02 0.01 1.12 -0.22
2 grain yellow bag corn 0.89 1.65 -1.03 0.34 2.45 -0.45 0.87 1.11 -0.97
Complete Code:
import pandas as pd
df = pd.DataFrame({
'name': ['fox', 'fox', 'fox', 'chicken', 'chicken', 'grain', 'grain',
'grain'],
'index': [1, 2, 3, 1, 2, 1, 2, 3],
'f1': ['red', 'red', 'red', 'white', 'white', 'yellow', 'yellow', 'yellow'],
'f2': ['white', 'white', 'white', 'yellow', 'yellow', 'bag', 'bag', 'bag'],
'f3': ['fur', 'fur', 'fur', 'feathers', 'feathers', 'corn', 'corn', 'corn'],
'calc1': [0.21, 0.76, 0.01, 0.04, 0.18, 0.89, 0.34, 0.87],
'calc2': [1.67, 2.2, 1.12, 1.18, 0.73, 1.65, 2.45, 1.11],
'calc3': [-0.34, -1.02, -0.22, -2.01, -1.21, -1.03, -0.45, -0.97]
})
new_df = (
df.set_index(['name', 'index', 'f1', 'f2', 'f3'])
.unstack('index')
.sort_index(axis=1, level=1)
)
new_df.columns = new_df.columns.map(lambda s: '_'.join(map(str, s)))
new_df = new_df.reset_index()
How to remove levels from a multi-indexed dataframe?
df.reset_index(level=2, drop=True)
Out[29]:
     A
1 1  8
  3  9
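A runnable sketch of this, assuming a three-level index whose third level is to be discarded (the third-level values below are made up, since the original frame is not shown). DataFrame.droplevel gives the same result and may read more directly.

```python
import pandas as pd

# Hypothetical frame: the third index level (0, 1) is invented for illustration.
index = pd.MultiIndex.from_tuples([(1, 1, 0), (1, 3, 1)])
df = pd.DataFrame({"A": [8, 9]}, index=index)

# drop=True discards the level instead of inserting it as a column.
trimmed = df.reset_index(level=2, drop=True)

# Equivalent spelling.
same = df.droplevel(2)

print(trimmed)
```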
How do I reindex a MultiIndex with additional Rows for only one Index Level?
I eventually found a solution, though it is more complex than I would have liked. I wrapped it in a generic function:
import pandas as pd

def index_fill_missing(df, index_cols, fill_col, fill_value=0):
    """
    Finds all the unique values of the column 'fill_col' in df and
    returns a dataframe indexed by index_cols + fill_col in which a new
    row is added for every combination whose fill_col value did not
    previously exist in the dataframe.
    The additional values are set to the value of the parameter 'fill_value'.

    Parameters:
        df (pandas.DataFrame): the dataframe
        index_cols (list(str)): the list of column names to use in the index
        fill_col (str): the column name for which all values should appear
            in every single index.
        fill_value (any): the value to fill the metric columns in new rows.

    Returns:
        pandas.DataFrame: DataFrame with MultiIndex and additional rows.
    """
    # Get unique values for the fill_col.
    fill_val_list = df[fill_col].unique().tolist()
    # Create a dataframe with the reduced index and get a list of tuples
    # with the index values. This assumes index_cols names at least two
    # columns, so that every index entry is a tuple.
    df_i = df.set_index(index_cols)
    df_i_tup = df_i.index.unique().tolist()
    # Append the fill col values to each and every one of these index tuples.
    df_f_tup = []
    col_names = list(index_cols)
    col_names.append(fill_col)
    print(col_names)
    for tup in df_i_tup:
        for fill_val in fill_val_list:
            df_f_tup.append(tup + (fill_val,))
    # Create an index based on these tuples and reindex the dataframe.
    idx_f = pd.MultiIndex.from_tuples(df_f_tup, names=col_names)
    print(idx_f)
    # We can only reindex if there are no duplicate values,
    # hence the groupby with the sum function.
    df_g = df.groupby(by=col_names).sum()
    df_f = df_g.reindex(index=idx_f, fill_value=fill_value)
    return df_f
Creating the sample dataframe:
dates = ['2020-01-01', '2020-02-01',
         '2020-02-01',
         '2020-01-01', '2020-02-01']
brands = ['BA','BA','BB','BC','BC']
sources = ['SA', 'SA', 'SA', 'SB', 'SB']
volumes1 = [5, 10, 5, 5, 10]
volumes2 = [5, 10, 5, 5, 10]
df = pd.DataFrame(
list(zip(dates, brands, sources, volumes1, volumes2)),
columns=['month', 'brand', 'source', 'volume1', 'volume2']
)
df
Resulting Output:
month brand source volume1 volume2
0 2020-01-01 BA SA 5 5
1 2020-02-01 BA SA 10 10
2 2020-02-01 BB SA 5 5
3 2020-01-01 BC SB 5 5
4 2020-02-01 BC SB 10 10
And applying the function:
df2 = index_fill_missing(df, ['source', 'brand'], 'month')
df2
Resulting output:
                         volume1  volume2
source brand month
SA     BA    2020-01-01        5        5
             2020-02-01       10       10
       BB    2020-01-01        0        0
             2020-02-01        5        5
SB     BC    2020-01-01        5        5
             2020-02-01       10       10
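The core of the function is: build the full index from the existing (source, brand) pairs crossed with every month, aggregate away duplicates, then reindex. A compact sketch of just that idea, using the sample data from this answer with hard-coded column names (this is an illustration, not a replacement for the generic function):

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["2020-01-01", "2020-02-01", "2020-02-01", "2020-01-01", "2020-02-01"],
    "brand": ["BA", "BA", "BB", "BC", "BC"],
    "source": ["SA", "SA", "SA", "SB", "SB"],
    "volume1": [5, 10, 5, 5, 10],
    "volume2": [5, 10, 5, 5, 10],
})

# Existing (source, brand) pairs, each crossed with every month in the data.
pairs = df[["source", "brand"]].drop_duplicates().itertuples(index=False, name=None)
months = df["month"].unique()
full_index = pd.MultiIndex.from_tuples(
    [(source, brand, month) for source, brand in pairs for month in months],
    names=["source", "brand", "month"],
)

# Aggregate first so the index is unique, then add the missing rows.
filled = (
    df.groupby(["source", "brand", "month"])
    .sum()
    .reindex(full_index, fill_value=0)
)

print(filled)
```

Note that pd.MultiIndex.from_product over all three columns would not be equivalent: it would also create source/brand pairs that never occur in the data.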
concatenate multiindex into single index in pandas series
Use map with join:
s.index = s.index.map('_'.join)
An alternative is a list comprehension:
s.index = ['{}_{}'.format(i, j) for i, j in s.index]
print (s)
one_a 1.0
one_b 2.0
two_a 3.0
two_b 4.0
dtype: float64
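A self-contained sketch, with a series reconstructed from the output above. One caveat: '_'.join only works when every level holds strings. The second series (hypothetical, with an integer level) shows the map(str, ...) variant for that case.

```python
import pandas as pd

# Series reconstructed from the output shown above.
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.MultiIndex.from_product([["one", "two"], ["a", "b"]]),
)
s.index = s.index.map("_".join)

# Hypothetical series with a non-string level: convert each part to str first.
t = pd.Series([10, 20], index=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)]))
t.index = ["_".join(map(str, key)) for key in t.index]

print(s)
print(t)
```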
Modifying a subset of a pandas MultiIndex
One way is to reassign the index with pd.MultiIndex.from_tuples:
idx_to_change = {(0, 10), (9, 25)}
data.index = pd.MultiIndex.from_tuples([i if i not in idx_to_change else (i[0],i[1]+10) for i in data.index], names=("start","end"))
print (data)
          col1  col2
start end
0     20     a     1
12    20     b     1
9     35     a     2
24    32     d     2
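A runnable version of the same approach. The starting frame is an assumption, reconstructed by undoing the +10 shift on the two changed rows in the output above.

```python
import pandas as pd

# Hypothetical starting frame, inferred from the printed result.
data = pd.DataFrame(
    {"col1": ["a", "b", "a", "d"], "col2": [1, 1, 2, 2]},
    index=pd.MultiIndex.from_tuples(
        [(0, 10), (12, 20), (9, 25), (24, 32)], names=("start", "end")
    ),
)

idx_to_change = {(0, 10), (9, 25)}

# Rebuild the whole index, shifting 'end' by 10 only for the selected entries.
data.index = pd.MultiIndex.from_tuples(
    [
        (start, end + 10) if (start, end) in idx_to_change else (start, end)
        for start, end in data.index
    ],
    names=data.index.names,
)

print(data)
```

Rebuilding the index is needed because a MultiIndex is immutable: individual entries cannot be assigned in place.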