Reduce Multi-Index/Multi-Level Dataframe to Single Index, Single Level

Reverting from multiindex to single index dataframe in pandas

Pass level=[0,1] to reset just those levels:

dist_df = dist_df.reset_index(level=[0,1])

In [28]:
df.reset_index(level=[0,1])

Out[28]:
            YEAR  MONTH  NI
datetime                     
2000-01-01  2000      1   NaN
2000-01-02  2000      1   NaN
2000-01-03  2000      1   NaN
2000-01-04  2000      1   NaN
2000-01-05  2000      1   NaN

Alternatively, you can pass the level names:

df.reset_index(level=['YEAR','MONTH'])
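A self-contained sketch, assuming a hypothetical frame with a three-level index (YEAR, MONTH, datetime) and a single NI column, reconstructed from the output above. Resetting by position and by name should give the same frame, with only datetime left in the index.

```python
import pandas as pd

# Hypothetical three-level index mirroring the output above.
dates = pd.date_range('2000-01-01', periods=5, freq='D')
idx = pd.MultiIndex.from_arrays(
    [dates.year, dates.month, dates],
    names=['YEAR', 'MONTH', 'datetime'],
)
df = pd.DataFrame({'NI': float('nan')}, index=idx)

# Reset the first two levels by position and by name.
by_position = df.reset_index(level=[0, 1])
by_name = df.reset_index(level=['YEAR', 'MONTH'])

# YEAR and MONTH become columns; 'datetime' remains the index.
print(by_position)
```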

Reduce multi-index/multi-level dataframe to single index, single level

Using set_axis, map, and join

df2.set_axis(df2.columns.map('_'.join), axis=1).add_suffix('_MPG')

      Car_Diesel_MPG  Car_Gas_MPG
Year                             
2000            14.7         20.5
2009            18.0         22.3
2017            22.2         50.9

groupby with a dict

m = {t: '_'.join(t) for t in df2.columns}
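df2.groupby(m, axis=1).mean().add_suffix('_MPG')
# groupby(..., axis=1) is deprecated as of pandas 2.1; on newer versions the
# transposed form df2.T.groupby(m).mean().T groups the columns the same way.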

      Car_Diesel_MPG  Car_Gas_MPG
Year                             
2000            14.7         20.5
2009            18.0         22.3
2017            22.2         50.9

Either of these can be followed by reset_index() to turn Year back into a column:

m = {t: '_'.join(t) for t in df2.columns}
df2.groupby(m, axis=1).mean().add_suffix('_MPG').reset_index()

   Year  Car_Diesel_MPG  Car_Gas_MPG
0  2000            14.7         20.5
1  2009            18.0         22.3
2  2017            22.2         50.9
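A self-contained sketch of the set_axis route followed by reset_index, assuming a hypothetical df2 with a Year row index and two-level column tuples such as ('Car', 'Diesel'), reconstructed from the printed output:

```python
import pandas as pd

# Hypothetical df2 rebuilt from the printed output: 'Year' row index,
# two-level column labels such as ('Car', 'Diesel').
df2 = pd.DataFrame(
    [[14.7, 20.5], [18.0, 22.3], [22.2, 50.9]],
    index=pd.Index([2000, 2009, 2017], name='Year'),
    columns=pd.MultiIndex.from_tuples([('Car', 'Diesel'), ('Car', 'Gas')]),
)

# Join each column tuple into one string, add the suffix,
# then move 'Year' from the index back into a column.
flat = (
    df2.set_axis(df2.columns.map('_'.join), axis=1)
       .add_suffix('_MPG')
       .reset_index()
)
print(flat)
```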

groupby instead of pivot_table

df.groupby(
    ['Year', df.VehicleType.str.cat(df.FuelType, sep='_').add('_MPG').values]
).MPG.sum().unstack().reset_index()

   Year  Car_Diesel_MPG  Car_Gas_MPG
0  2000            14.7         20.5
1  2009            18.0         22.3
2  2017            22.2         50.9
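For comparison with the pivot_table route the heading refers to, here is a sketch on a hypothetical long-format frame (Year, VehicleType, FuelType, MPG) reconstructed from the output above, with one row per group. The groupby version builds the flat label before unstacking, while pivot_table yields MultiIndex columns that still need flattening.

```python
import pandas as pd

# Hypothetical long-format input consistent with the output above.
df = pd.DataFrame({
    'Year': [2000, 2000, 2009, 2009, 2017, 2017],
    'VehicleType': ['Car'] * 6,
    'FuelType': ['Diesel', 'Gas'] * 3,
    'MPG': [14.7, 20.5, 18.0, 22.3, 22.2, 50.9],
})

# groupby version: the label array is already flat, so unstack()
# produces single-level columns directly.
label = df.VehicleType.str.cat(df.FuelType, sep='_').add('_MPG').values
via_groupby = df.groupby(['Year', label]).MPG.sum().unstack().reset_index()

# pivot_table version: columns come out as (VehicleType, FuelType) tuples.
pt = df.pivot_table(index='Year', columns=['VehicleType', 'FuelType'],
                    values='MPG', aggfunc='sum')
pt.columns = pt.columns.map('_'.join) + '_MPG'
via_pivot = pt.reset_index()
print(via_groupby)
```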

Transforming Multiindex into single index after groupby() Pandas

We can use droplevel to remove the outer column level, then reset_index to move the group keys back into columns:

df_final.columns = df_final.columns.droplevel(0)
df_final.reset_index(inplace=True)
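A minimal sketch of where such MultiIndex columns typically come from, using hypothetical key/val data: passing a list of functions to agg produces column tuples like ('val', 'sum'), and droplevel(0) keeps only the inner labels.

```python
import pandas as pd

# Hypothetical data; agg with a list of functions creates
# MultiIndex columns ('val', 'sum') and ('val', 'mean').
df = pd.DataFrame({'key': ['a', 'a', 'b'], 'val': [1, 3, 5]})
df_final = df.groupby('key').agg({'val': ['sum', 'mean']})

# Drop the outer column level ('val'), keeping 'sum' and 'mean'.
df_final.columns = df_final.columns.droplevel(0)
# Turn the 'key' group index back into an ordinary column.
df_final.reset_index(inplace=True)
print(df_final)
```

This is only clean when the remaining labels are unique; if two aggregated columns share a function name, droplevel(0) leaves duplicate column names, and joining the levels with '_' is safer.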

Pandas multi-index unstack to single row

Try set_index + unstack to reshape to wide format:

new_df = df.set_index(['name', 'index', 'f1', 'f2', 'f3']).unstack('index')

Or via pivot (passing a list as index requires pandas 1.1 or later):

new_df = df.pivot(index=['name', 'f1', 'f2', 'f3'], columns='index')

Sort MultiIndex with sort_index:

new_df = new_df.sort_index(axis=1, level=1)

Then flatten the column MultiIndex with map and move the row index back into columns with reset_index:

new_df.columns = new_df.columns.map(lambda s: '_'.join(map(str, s)))

new_df = new_df.reset_index()

new_df:

      name      f1      f2        f3  calc1_1  calc2_1  calc3_1  calc1_2  calc2_2  calc3_2  calc1_3  calc2_3  calc3_3
0  chicken   white  yellow  feathers     0.04     1.18    -2.01     0.18     0.73    -1.21      NaN      NaN      NaN
1      fox     red   white       fur     0.21     1.67    -0.34     0.76     2.20    -1.02     0.01     1.12    -0.22
2    grain  yellow     bag      corn     0.89     1.65    -1.03     0.34     2.45    -0.45     0.87     1.11    -0.97

Complete Code:

import pandas as pd

df = pd.DataFrame({
    'name': ['fox', 'fox', 'fox', 'chicken', 'chicken', 'grain', 'grain',
             'grain'],
    'index': [1, 2, 3, 1, 2, 1, 2, 3],
    'f1': ['red', 'red', 'red', 'white', 'white', 'yellow', 'yellow', 'yellow'],
    'f2': ['white', 'white', 'white', 'yellow', 'yellow', 'bag', 'bag', 'bag'],
    'f3': ['fur', 'fur', 'fur', 'feathers', 'feathers', 'corn', 'corn', 'corn'],
    'calc1': [0.21, 0.76, 0.01, 0.04, 0.18, 0.89, 0.34, 0.87],
    'calc2': [1.67, 2.2, 1.12, 1.18, 0.73, 1.65, 2.45, 1.11],
    'calc3': [-0.34, -1.02, -0.22, -2.01, -1.21, -1.03, -0.45, -0.97]
})

new_df = (
    df.set_index(['name', 'index', 'f1', 'f2', 'f3'])
        .unstack('index')
        .sort_index(axis=1, level=1)
)

new_df.columns = new_df.columns.map(lambda s: '_'.join(map(str, s)))

new_df = new_df.reset_index()
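The same pipeline using the pivot variant instead of set_index + unstack, as a sketch on the same sample data. With no values argument, pivot uses every remaining column (calc1 to calc3), so the columns come out as (calc column, index value) tuples just as with unstack.

```python
import pandas as pd

# Same sample data as the complete code above.
df = pd.DataFrame({
    'name': ['fox', 'fox', 'fox', 'chicken', 'chicken', 'grain', 'grain',
             'grain'],
    'index': [1, 2, 3, 1, 2, 1, 2, 3],
    'f1': ['red', 'red', 'red', 'white', 'white', 'yellow', 'yellow', 'yellow'],
    'f2': ['white', 'white', 'white', 'yellow', 'yellow', 'bag', 'bag', 'bag'],
    'f3': ['fur', 'fur', 'fur', 'feathers', 'feathers', 'corn', 'corn', 'corn'],
    'calc1': [0.21, 0.76, 0.01, 0.04, 0.18, 0.89, 0.34, 0.87],
    'calc2': [1.67, 2.2, 1.12, 1.18, 0.73, 1.65, 2.45, 1.11],
    'calc3': [-0.34, -1.02, -0.22, -2.01, -1.21, -1.03, -0.45, -0.97]
})

# Wide reshape: one row per (name, f1, f2, f3), columns (calc, index).
wide = df.pivot(index=['name', 'f1', 'f2', 'f3'], columns='index')
# Order columns by index value: calc1_1, calc2_1, calc3_1, calc1_2, ...
wide = wide.sort_index(axis=1, level=1)
# Flatten ('calc1', 1) -> 'calc1_1' and restore the row keys as columns.
wide.columns = wide.columns.map(lambda s: '_'.join(map(str, s)))
wide = wide.reset_index()
print(wide)
```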

How to remove levels from a multi-indexed dataframe?

df.reset_index(level=2, drop=True)
Out[29]: 
     A
1 1  8
  3  9
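A minimal sketch with a hypothetical three-level index (only the first two levels and column A match the output above; the third level's labels are made up). It also shows DataFrame.droplevel, which gives the same result.

```python
import pandas as pd

# Hypothetical three-level index; level 2 labels are invented.
idx = pd.MultiIndex.from_tuples([(1, 1, 'x'), (1, 3, 'y')])
df = pd.DataFrame({'A': [8, 9]}, index=idx)

# drop=True discards level 2 instead of turning it into a column.
dropped = df.reset_index(level=2, drop=True)

# droplevel removes the same level directly.
same = df.droplevel(2)
print(dropped)
```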

How do I reindex a MultiIndex with additional Rows for only one Index Level?

I eventually found a solution, though it's more complex than I'd like. I wrapped it in a generic function:

import pandas as pd

def index_fill_missing(df, index_cols, fill_col, fill_value=0):
    """
    Find all unique values of the column 'fill_col' in df and return a
    dataframe indexed by index_cols + [fill_col], where a new row is added
    for every index_cols combination that is missing a fill_col value.

    The metric columns of the added rows are set to 'fill_value'.

    Parameters:
    df (pandas.DataFrame): the dataframe
    index_cols (list of str): the column names to use in the index
    fill_col (str): the column whose values should all appear under every
        index_cols combination
    fill_value (any): the value for the metric columns in new rows

    Returns:
    pandas.DataFrame: DataFrame with a MultiIndex and the additional rows.
    """
    # Get unique values for the fill_col.
    fill_val_list = df[fill_col].unique().tolist()
    # Create a dataframe with the reduced index and get a list of tuples
    # with the index values.
    df_i = df.set_index(index_cols)
    df_i_tup = df_i.index.unique().tolist()
    # Append the fill_col values to every one of these index tuples.
    df_f_tup = []
    col_names = list(index_cols)
    col_names.append(fill_col)
    for tup in df_i_tup:
        # A single index column yields scalars rather than tuples.
        if not isinstance(tup, tuple):
            tup = (tup,)
        for fill_val in fill_val_list:
            df_f_tup.append(tup + (fill_val,))
    # Create an index based on these tuples and reindex the dataframe.
    idx_f = pd.MultiIndex.from_tuples(df_f_tup, names=col_names)
    # We can only reindex if there are no duplicate index values,
    # hence the groupby with sum.
    df_g = df.groupby(by=col_names).sum()
    df_f = df_g.reindex(index=idx_f, fill_value=fill_value)
    return df_f

Creating the sample dataframe:

dates = ['2020-01-01', '2020-02-01',
  '2020-02-01',
  '2020-01-01', '2020-02-01']
brands = ['BA','BA','BB','BC','BC']
sources = ['SA', 'SA', 'SA', 'SB', 'SB']
volumes1 = [5, 10, 5, 5, 10]
volumes2 = [5, 10, 5, 5, 10]
df = pd.DataFrame(
  list(zip(dates, brands, sources, volumes1, volumes2)), 
  columns=['month', 'brand', 'source', 'volume1', 'volume2']
)
df

Resulting output:

        month brand source  volume1  volume2
0  2020-01-01    BA     SA        5        5
1  2020-02-01    BA     SA       10       10
2  2020-02-01    BB     SA        5        5
3  2020-01-01    BC     SB        5        5
4  2020-02-01    BC     SB       10       10

And applying the function:

df2 = index_fill_missing(df, ['source', 'brand'], 'month')
df2

Resulting output:

                         volume1  volume2
source brand month                       
SA     BA    2020-01-01        5        5
             2020-02-01       10       10
       BB    2020-01-01        0        0
             2020-02-01        5        5
SB     BC    2020-01-01        5        5
             2020-02-01       10       10
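A shorter alternative, not the poster's method: sum duplicates with groupby, unstack the fill column with fill_value=0, then stack it back so every existing (source, brand) pair gets every month. This is a sketch on the sample data above; like index_fill_missing it assumes all non-key columns are numeric. Rows come out in sorted key order rather than order of first appearance, and recent pandas versions may emit a FutureWarning from stack about its new implementation.

```python
import pandas as pd

# Sample data from above.
df = pd.DataFrame({
    'month': ['2020-01-01', '2020-02-01', '2020-02-01', '2020-01-01', '2020-02-01'],
    'brand': ['BA', 'BA', 'BB', 'BC', 'BC'],
    'source': ['SA', 'SA', 'SA', 'SB', 'SB'],
    'volume1': [5, 10, 5, 5, 10],
    'volume2': [5, 10, 5, 5, 10],
})

# Sum duplicates, spread 'month' into columns (missing cells -> 0),
# then stack 'month' back into the row index.
filled = (
    df.groupby(['source', 'brand', 'month']).sum()
      .unstack('month', fill_value=0)
      .stack('month')
)
print(filled)
```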

concatenate multiindex into single index in pandas series

Use map with join:

s.index = s.index.map('_'.join)

An alternative is a list comprehension:

s.index = ['{}_{}'.format(i, j) for i, j in s.index]

print(s)
one_a    1.0
one_b    2.0
two_a    3.0
two_b    4.0
dtype: float64
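A self-contained sketch, assuming a hypothetical Series s built to match the printed result. '_'.join requires every level label to be a string, so the second, made-up Series with an integer level converts labels with map(str, ...) first.

```python
import pandas as pd

# Hypothetical Series matching the printed result above.
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.MultiIndex.from_product([['one', 'two'], ['a', 'b']]),
)
joined = s.copy()
joined.index = joined.index.map('_'.join)

# With a numeric level, convert each label to str before joining.
mixed = pd.Series([10, 20], index=pd.MultiIndex.from_tuples([('x', 1), ('x', 2)]))
mixed.index = mixed.index.map(lambda t: '_'.join(map(str, t)))
print(joined)
```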

Modifying a subset of a pandas MultiIndex

One way is to reassign the whole index with pd.MultiIndex.from_tuples:

idx_to_change = {(0, 10), (9, 25)}

data.index = pd.MultiIndex.from_tuples(
    [i if i not in idx_to_change else (i[0], i[1] + 10) for i in data.index],
    names=("start", "end"),
)
print(data)

          col1  col2
start end           
0     20     a     1
12    20     b     1
9     35     a     2
24    32     d     2
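Index.map offers a similar approach: when the mapped function returns tuples on a MultiIndex, pandas builds a new MultiIndex. The data frame below is hypothetical, reconstructed by undoing the change on the printed output; set_names re-applies the level names explicitly.

```python
import pandas as pd

# Hypothetical 'data' as it would look before the change.
data = pd.DataFrame(
    {'col1': ['a', 'b', 'a', 'd'], 'col2': [1, 1, 2, 2]},
    index=pd.MultiIndex.from_tuples(
        [(0, 10), (12, 20), (9, 25), (24, 32)], names=('start', 'end')
    ),
)
idx_to_change = {(0, 10), (9, 25)}

# Shift 'end' by 10 only for the selected (start, end) tuples.
data.index = data.index.map(
    lambda t: (t[0], t[1] + 10) if t in idx_to_change else t
).set_names(['start', 'end'])
print(data)
```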