Reduce Multi-Index/Multi-Level Dataframe to Single Index, Single Level

Reverting from multiindex to single index dataframe in pandas

Pass level=[0,1] to reset only those levels:

dist_df = dist_df.reset_index(level=[0,1])

In [28]:
df.reset_index(level=[0,1])

Out[28]:
            YEAR  MONTH   NI
datetime
2000-01-01  2000      1  NaN
2000-01-02  2000      1  NaN
2000-01-03  2000      1  NaN
2000-01-04  2000      1  NaN
2000-01-05  2000      1  NaN

Alternatively, pass the level names:

df.reset_index(level=['YEAR','MONTH'])
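To try both forms without the original data, here is a minimal sketch on a hypothetical frame whose three row levels are named YEAR, MONTH and datetime, mirroring the output above (the frame itself is invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: three row levels (YEAR, MONTH, datetime), one all-NaN column.
dates = pd.date_range("2000-01-01", periods=5, freq="D")
idx = pd.MultiIndex.from_arrays(
    [dates.year, dates.month, dates], names=["YEAR", "MONTH", "datetime"]
)
df = pd.DataFrame({"NI": np.nan}, index=idx)

# Move the first two levels into columns; 'datetime' remains the row index.
by_position = df.reset_index(level=[0, 1])
# The same call, selecting the levels by name instead of by position.
by_name = df.reset_index(level=["YEAR", "MONTH"])
print(by_position)
```

Selecting by name is usually the safer choice, since it keeps working if the level order changes.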

Reduce multi-index/multi-level dataframe to single index, single level

Using set_axis, map, and join

# set_axis returns a new frame; its inplace keyword was removed in pandas 2.0
df2.set_axis(df2.columns.map('_'.join), axis=1).add_suffix('_MPG')

      Car_Diesel_MPG  Car_Gas_MPG
Year
2000            14.7         20.5
2009            18.0         22.3
2017            22.2         50.9
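The original df2 is not shown. As a sketch, assuming df2 has a Year index and two-level columns named VehicleType and FuelType (names borrowed from the groupby example further down; the numbers come from the output above), the set_axis approach runs like this:

```python
import pandas as pd

# Hypothetical df2: Year index, two-level (VehicleType, FuelType) columns.
columns = pd.MultiIndex.from_tuples(
    [("Car", "Diesel"), ("Car", "Gas")], names=["VehicleType", "FuelType"]
)
df2 = pd.DataFrame(
    [[14.7, 20.5], [18.0, 22.3], [22.2, 50.9]],
    index=pd.Index([2000, 2009, 2017], name="Year"),
    columns=columns,
)

# Join each column tuple into one string ("Car_Diesel"), then add a suffix.
flat = df2.set_axis(df2.columns.map("_".join), axis=1).add_suffix("_MPG")
print(flat)
```

Note that "_".join assumes every level holds strings; numeric levels need converting with str first.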

groupby with a dict

m = {t: '_'.join(t) for t in df2.columns}
# groupby(..., axis=1) is deprecated since pandas 2.1, so group the transposed frame
df2.T.groupby(m).mean().T.add_suffix('_MPG')

      Car_Diesel_MPG  Car_Gas_MPG
Year
2000            14.7         20.5
2009            18.0         22.3
2017            22.2         50.9

Either result can then have its Year index moved back into a column with reset_index:

m = {t: '_'.join(t) for t in df2.columns}
df2.T.groupby(m).mean().T.add_suffix('_MPG').reset_index()

   Year  Car_Diesel_MPG  Car_Gas_MPG
0  2000            14.7         20.5
1  2009            18.0         22.3
2  2017            22.2         50.9

groupby instead of pivot_table

df.groupby(
['Year', df.VehicleType.str.cat(df.FuelType, sep='_').add('_MPG').values]
).MPG.sum().unstack().reset_index()

   Year  Car_Diesel_MPG  Car_Gas_MPG
0  2000            14.7         20.5
1  2009            18.0         22.3
2  2017            22.2         50.9
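A self-contained sketch of the groupby version, assuming a long-format input with Year, VehicleType, FuelType and MPG columns (the column names come from the snippet above; the rows are reconstructed from the printed output, not from the original data):

```python
import pandas as pd

# Hypothetical long-format input: one row per (Year, VehicleType, FuelType).
df = pd.DataFrame({
    "Year": [2000, 2000, 2009, 2009, 2017, 2017],
    "VehicleType": ["Car"] * 6,
    "FuelType": ["Diesel", "Gas"] * 3,
    "MPG": [14.7, 20.5, 18.0, 22.3, 22.2, 50.9],
})

# Build the final flat label ("Car_Diesel_MPG", ...) before grouping, so
# unstack yields single-level columns and nothing needs flattening later.
label = df.VehicleType.str.cat(df.FuelType, sep="_").add("_MPG").values
wide = df.groupby(["Year", label]).MPG.sum().unstack().reset_index()
print(wide)
```

The design choice here is to encode the combined label before reshaping, which sidesteps the MultiIndex entirely.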

Transforming Multiindex into single index after groupby() Pandas

Use droplevel to remove the outer column level, then move the row index back into a column:

df_final.columns = df_final.columns.droplevel(0)
df_final.reset_index(inplace=True)
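The original df_final is not shown, so this sketch assumes a hypothetical frame in which one column was aggregated with two functions, a common way to end up with two-level columns after groupby():

```python
import pandas as pd

# Hypothetical input; not the original poster's data.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# Aggregating one column with several functions gives two-level columns:
# ('val', 'sum') and ('val', 'mean').
df_final = df.groupby("key").agg({"val": ["sum", "mean"]})

# Drop the outer ('val') level, then turn the 'key' index into a column.
df_final.columns = df_final.columns.droplevel(0)
df_final.reset_index(inplace=True)
print(df_final)
```

droplevel only leaves unique column names when the remaining level has no duplicates; if several source columns were aggregated with the same functions, joining the levels (as with '_'.join above) is safer.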

Pandas multi-index unstack to single row

Try set_index + unstack to reshape to wide format:

new_df = df.set_index(['name', 'index', 'f1', 'f2', 'f3']).unstack('index')

Or via pivot (passing a list to index= requires pandas 1.1 or later):

new_df = df.pivot(index=['name', 'f1', 'f2', 'f3'], columns='index')

Sort the columns by their second level (the original 'index' values) with sort_index:

new_df = new_df.sort_index(axis=1, level=1)

Then flatten the column MultiIndex with map, and move the row levels back into columns with reset_index:

new_df.columns = new_df.columns.map(lambda s: '_'.join(map(str, s)))

new_df = new_df.reset_index()

new_df:

      name      f1      f2        f3  calc1_1  calc2_1  calc3_1  calc1_2  calc2_2  calc3_2  calc1_3  calc2_3  calc3_3
0  chicken   white  yellow  feathers     0.04     1.18    -2.01     0.18     0.73    -1.21      NaN      NaN      NaN
1      fox     red   white       fur     0.21     1.67    -0.34     0.76     2.20    -1.02     0.01     1.12    -0.22
2    grain  yellow     bag      corn     0.89     1.65    -1.03     0.34     2.45    -0.45     0.87     1.11    -0.97

Complete Code:

import pandas as pd

df = pd.DataFrame({
'name': ['fox', 'fox', 'fox', 'chicken', 'chicken', 'grain', 'grain',
'grain'],
'index': [1, 2, 3, 1, 2, 1, 2, 3],
'f1': ['red', 'red', 'red', 'white', 'white', 'yellow', 'yellow', 'yellow'],
'f2': ['white', 'white', 'white', 'yellow', 'yellow', 'bag', 'bag', 'bag'],
'f3': ['fur', 'fur', 'fur', 'feathers', 'feathers', 'corn', 'corn', 'corn'],
'calc1': [0.21, 0.76, 0.01, 0.04, 0.18, 0.89, 0.34, 0.87],
'calc2': [1.67, 2.2, 1.12, 1.18, 0.73, 1.65, 2.45, 1.11],
'calc3': [-0.34, -1.02, -0.22, -2.01, -1.21, -1.03, -0.45, -0.97]
})

new_df = (
df.set_index(['name', 'index', 'f1', 'f2', 'f3'])
.unstack('index')
.sort_index(axis=1, level=1)
)

new_df.columns = new_df.columns.map(lambda s: '_'.join(map(str, s)))

new_df = new_df.reset_index()
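As a quick check, assuming pandas 1.1 or later (needed for a list in pivot's index=), the pivot variant produces the same reshaped frame as set_index + unstack on the sample data:

```python
import pandas as pd

# Sample data from the complete code above.
df = pd.DataFrame({
    'name': ['fox', 'fox', 'fox', 'chicken', 'chicken', 'grain', 'grain', 'grain'],
    'index': [1, 2, 3, 1, 2, 1, 2, 3],
    'f1': ['red', 'red', 'red', 'white', 'white', 'yellow', 'yellow', 'yellow'],
    'f2': ['white', 'white', 'white', 'yellow', 'yellow', 'bag', 'bag', 'bag'],
    'f3': ['fur', 'fur', 'fur', 'feathers', 'feathers', 'corn', 'corn', 'corn'],
    'calc1': [0.21, 0.76, 0.01, 0.04, 0.18, 0.89, 0.34, 0.87],
    'calc2': [1.67, 2.2, 1.12, 1.18, 0.73, 1.65, 2.45, 1.11],
    'calc3': [-0.34, -1.02, -0.22, -2.01, -1.21, -1.03, -0.45, -0.97],
})

# Wide format via set_index + unstack.
via_unstack = df.set_index(['name', 'index', 'f1', 'f2', 'f3']).unstack('index')
# Wide format via pivot; index= given as a list.
via_pivot = df.pivot(index=['name', 'f1', 'f2', 'f3'], columns='index')

# Order columns by the 'index' level, then flatten them into single strings.
flat = via_pivot.sort_index(axis=1, level=1)
flat.columns = flat.columns.map(lambda s: '_'.join(map(str, s)))
print(via_pivot.equals(via_unstack))
print(list(flat.columns))
```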

How to remove levels from a multi-indexed dataframe?

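Pass the level to remove together with drop=True, so that it is discarded instead of being inserted as a column:

df.reset_index(level=2, drop=True)
Out[29]:
     A
1 1  8
  3  9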
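A small sketch on a hypothetical three-level index (the original df is not shown), comparing reset_index(drop=True) with DataFrame.droplevel, which removes row levels directly:

```python
import pandas as pd

# Hypothetical frame with a three-level row index.
idx = pd.MultiIndex.from_tuples([(1, 1, "x"), (1, 3, "y")])
df = pd.DataFrame({"A": [8, 9]}, index=idx)

# Discard the third level instead of moving it into a column.
dropped = df.reset_index(level=2, drop=True)

# DataFrame.droplevel removes the same row level (axis=0 is the default).
same = df.droplevel(2)
print(dropped)
```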

How do I reindex a MultiIndex with additional Rows for only one Index Level?

I eventually found a solution, though it is more complex than I would have liked. I wrapped it in a generic function:

import pandas as pd


def index_fill_missing(df, index_cols, fill_col, fill_value=0):
    """
    Find all unique values of the column fill_col in df and return a
    dataframe indexed by index_cols + [fill_col], in which a new row is
    added for every combination of index_cols that did not yet have a
    given value of fill_col.

    The metric columns of the added rows are set to fill_value.

    Parameters:
        df (pandas.DataFrame): the dataframe.
        index_cols (list of str): the column names for the leading index levels.
        fill_col (str): the column whose values should appear under every
            combination of index_cols.
        fill_value (any): the value for the metric columns in new rows.

    Returns:
        pandas.DataFrame: DataFrame with a MultiIndex and the additional rows.
    """
    # Unique values of fill_col, in order of first appearance.
    fill_val_list = df[fill_col].unique().tolist()
    # Unique combinations of index_cols as a list of tuples.
    # Note: with a single index column these are scalars, not tuples,
    # so index_cols should contain at least two columns.
    df_i_tup = df.set_index(index_cols).index.unique().tolist()
    # Append every fill_col value to each of these index tuples.
    col_names = list(index_cols) + [fill_col]
    df_f_tup = [tup + (fill_val,) for tup in df_i_tup for fill_val in fill_val_list]
    # Create the full index from these tuples.
    idx_f = pd.MultiIndex.from_tuples(df_f_tup, names=col_names)
    # reindex requires unique labels, so collapse duplicates with a sum first.
    df_g = df.groupby(by=col_names).sum()
    return df_g.reindex(index=idx_f, fill_value=fill_value)

Creating the sample dataframe:

import pandas as pd

dates = ['2020-01-01', '2020-02-01', '2020-02-01', '2020-01-01', '2020-02-01']
brands = ['BA', 'BA', 'BB', 'BC', 'BC']
sources = ['SA', 'SA', 'SA', 'SB', 'SB']
volumes1 = [5, 10, 5, 5, 10]
volumes2 = [5, 10, 5, 5, 10]
df = pd.DataFrame(
    list(zip(dates, brands, sources, volumes1, volumes2)),
    columns=['month', 'brand', 'source', 'volume1', 'volume2']
)
df

Resulting Output:

        month brand source  volume1  volume2
0  2020-01-01    BA     SA        5        5
1  2020-02-01    BA     SA       10       10
2  2020-02-01    BB     SA        5        5
3  2020-01-01    BC     SB        5        5
4  2020-02-01    BC     SB       10       10

And applying the function:

df2 = index_fill_missing(df, ['source', 'brand'], 'month')
df2

Resulting output:

                         volume1  volume2
source brand month
SA     BA    2020-01-01        5        5
             2020-02-01       10       10
       BB    2020-01-01        0        0
             2020-02-01        5        5
SB     BC    2020-01-01        5        5
             2020-02-01       10       10
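The same idea can be expressed more compactly with a cross join. This is a sketch assuming pandas 1.2 or later (for merge(how="cross")); the name index_fill_missing_merge is hypothetical, not part of pandas or the original answer:

```python
import pandas as pd


def index_fill_missing_merge(df, index_cols, fill_col, fill_value=0):
    """Hypothetical cross-join variant of index_fill_missing."""
    # Existing combinations of the index columns, in order of first appearance.
    combos = df[index_cols].drop_duplicates()
    # Every distinct value of fill_col, in order of first appearance.
    fill_vals = df[[fill_col]].drop_duplicates()
    # Pair each existing combination with every fill value.
    full_index = pd.MultiIndex.from_frame(combos.merge(fill_vals, how="cross"))
    # As in the original function, collapse duplicates before reindexing.
    summed = df.groupby(list(index_cols) + [fill_col]).sum()
    return summed.reindex(full_index, fill_value=fill_value)


df = pd.DataFrame({
    "month": ["2020-01-01", "2020-02-01", "2020-02-01", "2020-01-01", "2020-02-01"],
    "brand": ["BA", "BA", "BB", "BC", "BC"],
    "source": ["SA", "SA", "SA", "SB", "SB"],
    "volume1": [5, 10, 5, 5, 10],
    "volume2": [5, 10, 5, 5, 10],
})
result = index_fill_missing_merge(df, ["source", "brand"], "month")
print(result)
```

Unlike the tuple-based version, this also works when index_cols holds a single column, because from_frame always builds a MultiIndex.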

concatenate multiindex into single index in pandas series

Use map with join:

s.index = s.index.map('_'.join)

An alternative is a list comprehension:

s.index = ['{}_{}'.format(i, j) for i, j in s.index]

print (s)
one_a 1.0
one_b 2.0
two_a 3.0
two_b 4.0
dtype: float64
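'_'.join only works when every level holds strings. A sketch with a hypothetical series matching the output above, plus a second hypothetical series with an integer level, where each element is converted with str before joining:

```python
import pandas as pd

# Hypothetical series with two string levels.
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.MultiIndex.from_product([["one", "two"], ["a", "b"]]),
)
s.index = s.index.map("_".join)

# Hypothetical series whose second level is numeric: '_'.join would raise
# TypeError here, so convert each element to str first.
t = pd.Series([10, 20], index=pd.MultiIndex.from_tuples([("x", 1), ("x", 2)]))
t.index = t.index.map(lambda tup: "_".join(map(str, tup)))
print(s)
print(t)
```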

Modifying a subset of a pandas MultiIndex

One way is to rebuild the whole index with pd.MultiIndex.from_tuples, changing only the selected tuples:

idx_to_change = {(0, 10), (9, 25)}

data.index = pd.MultiIndex.from_tuples([i if i not in idx_to_change else (i[0],i[1]+10) for i in data.index], names=("start","end"))
print (data)

           col1  col2
start end
0     20      a     1
12    20      b     1
9     35      a     2
24    32      d     2
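The original data is not shown; the sketch below reconstructs it by undoing the change visible in the output, and wraps the same from_tuples idea in a small helper that keeps the existing level names. shift_end is a hypothetical name, not a pandas function:

```python
import pandas as pd

# Hypothetical data, reconstructed by subtracting 10 from the changed 'end' values.
data = pd.DataFrame(
    {"col1": ["a", "b", "a", "d"], "col2": [1, 1, 2, 2]},
    index=pd.MultiIndex.from_tuples(
        [(0, 10), (12, 20), (9, 25), (24, 32)], names=("start", "end")
    ),
)


def shift_end(index, to_change, offset):
    """Hypothetical helper: add offset to 'end' only for the listed tuples."""
    return pd.MultiIndex.from_tuples(
        [(s, e + offset) if (s, e) in to_change else (s, e) for s, e in index],
        names=index.names,
    )


data.index = shift_end(data.index, {(0, 10), (9, 25)}, 10)
print(data)
```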

