Pandas: Drop a Level from a Multi-Level Column Index

Python Pandas: drop a column from a multi-level column index?

Solved:

df.drop('c', axis=1, level=1)
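As a minimal sketch of what that call does, assume a small frame with a two-level column index (the labels and values below are made up for illustration). Passing level=1 tells drop to match 'c' against the second level only:

```python
import pandas as pd

# Hypothetical two-level column index; labels and values are illustrative only.
cols = pd.MultiIndex.from_tuples(
    [("a", "c"), ("a", "d"), ("b", "c"), ("b", "e")]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=cols)

# Drop every column whose second-level (level=1) label is 'c'.
res = df.drop("c", axis=1, level=1)
print(res.columns.tolist())
```

Both ('a', 'c') and ('b', 'c') are removed, because the match is made on the level-1 label regardless of the level-0 label.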

Drop even levels from multi-index dataset

You can pass a list of index levels to DataFrame.droplevel.

For instance, given the following DataFrame

import numpy as np
import pandas as pd

df = (
    pd.DataFrame(np.random.randint(5, size=(5,5)), 
                 columns=list('abcde'))
      .set_index(list('abcd'))
)

You can do something like

res = df.droplevel(list(range(1, len(df.index.names), 2)))
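To make the effect visible, here is a self-contained sketch of the same idea. The seeded generator and the levels_to_drop name are assumptions added for illustration; the level positions are computed the same way as above, using index.nlevels instead of len(index.names):

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an illustrative choice).
rng = np.random.default_rng(0)
df = (
    pd.DataFrame(rng.integers(5, size=(5, 5)), columns=list("abcde"))
      .set_index(list("abcd"))
)

# Positions 1 and 3 are the 2nd and 4th index levels ('b' and 'd').
levels_to_drop = list(range(1, df.index.nlevels, 2))
res = df.droplevel(levels_to_drop)

print(list(df.index.names))   # names before dropping
print(list(res.index.names))  # names after dropping
```

Only the index levels change; the number of rows and the data column 'e' are left as they were.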

Removing columns selectively from multilevel index dataframe

It seems drop doesn't support selecting on non-adjacent levels (levels 0 and 2 here). Instead, we can build a boolean mask over the columns using get_level_values:

# keep where not ((level0 is 'data1') and (level2 is 'E'))
col_mask = ~((df.columns.get_level_values(0) == 'data1')
             & (df.columns.get_level_values(2) == 'E'))
df = df.loc[:, col_mask]

We can also do this by integer location, excluding the positions that fall inside a particular index slice. However, this is less clear and less flexible overall:

idx = pd.IndexSlice['data1', :, 'E']
cols = [i for i in range(len(df.columns))
        if i not in df.columns.get_locs(idx)]
df = df.iloc[:, cols]

Either approach produces df:

meter   data1 data2    
Sleeper     F     K   X
sweeper     C     D   E
A           2     3   5
B           6     7   9
C          10    11  13
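The original input frame is not shown, so the sketch below assumes a hypothetical three-level column index with the same level names and checks that the mask and the integer-location approach agree. The column tuples and values are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical input: three column levels named like the output above.
cols = pd.MultiIndex.from_tuples(
    [("data1", "F", "C"), ("data1", "G", "E"),
     ("data2", "K", "D"), ("data2", "X", "E")],
    names=["meter", "Sleeper", "sweeper"],
)
df = pd.DataFrame(np.arange(12).reshape(3, 4), index=list("ABC"), columns=cols)

# Approach 1: boolean mask over the column labels.
col_mask = ~((df.columns.get_level_values(0) == "data1")
             & (df.columns.get_level_values(2) == "E"))
by_mask = df.loc[:, col_mask]

# Approach 2: integer positions, excluding those matched by the slice.
drop_locs = set(df.columns.get_locs(pd.IndexSlice["data1", :, "E"]))
keep = [i for i in range(df.shape[1]) if i not in drop_locs]
by_loc = df.iloc[:, keep]

print(by_mask.columns.tolist())
print(by_mask.equals(by_loc))
```

Note that ('data2', 'X', 'E') survives: only columns that satisfy both conditions are removed.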

How to remove levels from a multi-indexed dataframe?

df.reset_index(level=2, drop=True)
Out[29]: 
     A
1 1  8
  3  9
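The input frame is not shown here, so the following is a sketch under the assumption that df had a three-level row index; the third-level labels 'x' and 'y' are made up. It also shows that droplevel gives the same index in this case:

```python
import pandas as pd

# Hypothetical three-level row index; the third level is the one to remove.
index = pd.MultiIndex.from_tuples([(1, 1, "x"), (1, 3, "y")])
df = pd.DataFrame({"A": [8, 9]}, index=index)

# drop=True discards the level instead of moving it into a column.
res = df.reset_index(level=2, drop=True)

# droplevel is an alternative spelling for the same operation.
alt = df.droplevel(2)

print(res)
print(alt.index.equals(res.index))
```

Without drop=True, reset_index would keep the removed level as a regular column rather than discarding it.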

How do I add a multi-level column index to an existing df?

You can manually construct a pandas.MultiIndex using one of several constructors. From the docs:

MultiIndex.from_arrays: Convert list of arrays to MultiIndex.
MultiIndex.from_tuples: Convert list of tuples to a MultiIndex.
MultiIndex.from_frame: Make a MultiIndex from a DataFrame.

For your case, I think pd.MultiIndex.from_arrays might be the easiest way:

df.columns=pd.MultiIndex.from_arrays([['H','H'],['Cat1','Cat2'],df.columns],names=['Importance','Category',''])

output:

Importance| H           | H      |
Category | Cat1         | Cat2   |
         |Total Assets  | AUMs   | 
Firm 1   | 100          |  300   |  
Firm 2   | 200          | 3400   |  
Firm 3   | 300          | 800    | 
Firm 4   | NaN          | 800    |
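For a runnable version, the sketch below rebuilds the starting frame from the values in the output above (the construction of that frame is an assumption) and also builds the same index with from_tuples for comparison:

```python
import numpy as np
import pandas as pd

# Starting frame reconstructed from the output table above.
df = pd.DataFrame(
    {"Total Assets": [100, 200, 300, np.nan], "AUMs": [300, 3400, 800, 800]},
    index=["Firm 1", "Firm 2", "Firm 3", "Firm 4"],
)

names = ["Importance", "Category", ""]

# One array per level; each array has one entry per existing column.
from_arrays = pd.MultiIndex.from_arrays(
    [["H", "H"], ["Cat1", "Cat2"], df.columns], names=names
)

# Equivalent construction: one tuple per column.
from_tuples = pd.MultiIndex.from_tuples(
    [("H", "Cat1", "Total Assets"), ("H", "Cat2", "AUMs")], names=names
)

df.columns = from_arrays
print(from_arrays.equals(from_tuples))
print(df)
```

from_arrays is convenient when the existing column labels become one of the levels; from_tuples reads more naturally when each column's full label is written out by hand.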

Drop rows where multi-index is some number

You could group by the first index level and keep only the groups with more than one row:

df.groupby(level=0).filter(lambda g: len(g)>1)

output:

           pid  ts
sid vid           
1   A    page1  t1
    A    page2  t2
    A    page3  t3

NB. you could also use the level name: df.groupby(level='sid').filter(lambda g: len(g)>1)

used input:

df = (pd.DataFrame({'pid': {(1, 'A'): 'page3', (2, 'B'): 'page1', (3, 'C'): 'page1'},
                    'ts': {(1, 'A'): 't3', (2, 'B'): 't4', (3, 'C'): 't5'}})
        .rename_axis(['sid', 'vid'])
     )

#            pid  ts
# sid vid           
# 1   A    page3  t3
# 2   B    page1  t4
# 3   C    page1  t5
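A Python dict keeps only the last value for a repeated key, so a dict-based construction like the one above ends up with one row per (sid, vid) pair. The sketch below assumes the intended input repeated (1, 'A') three times, as the output suggests, and builds it with MultiIndex.from_tuples so that the duplicates survive:

```python
import pandas as pd

# Assumed input: (1, 'A') occurs three times, the other keys once.
index = pd.MultiIndex.from_tuples(
    [(1, "A"), (1, "A"), (1, "A"), (2, "B"), (3, "C")],
    names=["sid", "vid"],
)
df = pd.DataFrame(
    {"pid": ["page1", "page2", "page3", "page1", "page1"],
     "ts": ["t1", "t2", "t3", "t4", "t5"]},
    index=index,
)

# Keep only the rows whose 'sid' group has more than one row.
out = df.groupby(level="sid").filter(lambda g: len(g) > 1)
print(out)
```

With this input only the three rows for sid 1 remain, since sid 2 and sid 3 each form a group of size one.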

groupby with multi level column index python

I would recommend restructuring d1 a bit first...

d1 = d1.set_index([('id','-'),('group','-')]).stack([0,1]).reset_index()
d1.columns = ['id','group','level_1','level_2','category']

    id group level_1 level_2 category
0   i1     a      g1       1      dog
1   i1     a      g1       2    mouse
2   i1     a      g2       1      cat
3   i1     a      g2       2    mouse
4   i2     a      g1       1      cat
5   i2     a      g1       2    mouse
6   i2     a      g2       1      dog
7   i2     a      g2       2      dog
8   i3     a      g1       1      dog
9   i3     a      g1       2      dog
10  i3     a      g2       1      cat
11  i3     a      g2       2      dog
12  i4     b      g1       1      cat
13  i4     b      g1       2      dog
14  i4     b      g2       1      dog
15  i4     b      g2       2      cat

...and then using either pivot_table or groupby (result is the same)...

# pivot_table
d2 = pd.pivot_table(d1, index=['group', 'category'], columns=['level_1','level_2'], aggfunc='count', fill_value=0).droplevel(0, axis=1).rename_axis([None,None], axis=1)

# groupby
d2 = d1.groupby(['group','category','level_1','level_2'])['id'].count().unstack(['level_1','level_2'], fill_value=0).rename_axis([None,None], axis=1).sort_index(axis=1)

               g1    g2   
                1  2  1  2
group category            
a     cat       1  0  2  0
      dog       2  1  1  2
      mouse     0  2  0  1
b     cat       1  0  0  1
      dog       0  1  1  0

PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance. How to get rid of it?

Let's try with an example (without data for simplicity):

# Column MultiIndex.
idx = pd.MultiIndex(levels=[['Col1', 'Col2', 'Col3'], ['subcol1', 'subcol2']], 
                    codes=[[2, 1, 0], [0, 1, 1]])

df = pd.DataFrame(columns=range(len(idx)))
df.columns = idx
print(df)

    Col3    Col2    Col1
subcol1 subcol2 subcol2

Clearly, the column MultiIndex is not sorted. We can check it with is_monotonic_increasing (the older is_monotonic alias was removed in pandas 2.0):

print(df.columns.is_monotonic_increasing)

False

This matters because pandas can perform index lookups and related operations much faster on a sorted index, since it can use algorithms that rely on the sorted order. Indeed, if we try to drop a column:

df.drop('Col1', axis=1)

PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance.
  df.drop('Col1', axis=1)

Instead, if we sort the index before dropping, the warning disappears:

print(df.sort_index(axis=1))

# Index is now sorted in lexical order.
    Col1    Col2    Col3
subcol2 subcol2 subcol1

# No warning here.
df.sort_index(axis=1).drop('Col1', axis=1)

As the warning suggests, this happens when we do not specify the level from which we want to drop the column: without a level, pandas has to traverse the whole unsorted index to find the label. If we specify the level, no such traversal is needed:

# Also no warning.
df.drop('Col1', axis=1, level=0)

However, in general this problem is more relevant to row indices, since column multi-indices are usually much smaller. It is still worth keeping in mind for larger indices and dataframes. It is particularly relevant for slicing by index and for lookups: in those cases, you want your index to be sorted for better performance.
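The three variants discussed above can be compared side by side by recording warnings. This is a sketch only: the drop_warnings helper is hypothetical, and whether the unsorted, level-free drop actually warns may depend on the pandas version, so no expected counts are shown:

```python
import warnings
import pandas as pd

idx = pd.MultiIndex(levels=[["Col1", "Col2", "Col3"], ["subcol1", "subcol2"]],
                    codes=[[2, 1, 0], [0, 1, 1]])
df = pd.DataFrame(columns=range(len(idx)))
df.columns = idx


def drop_warnings(frame, **kwargs):
    """Hypothetical helper: drop 'Col1' and return any PerformanceWarnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame.drop("Col1", axis=1, **kwargs)
    return [w for w in caught
            if issubclass(w.category, pd.errors.PerformanceWarning)]


print(df.columns.is_monotonic_increasing)                     # unsorted columns
print(df.sort_index(axis=1).columns.is_monotonic_increasing)  # after sorting

print(len(drop_warnings(df)))                     # unsorted, no level
print(len(drop_warnings(df.sort_index(axis=1))))  # sorted first
print(len(drop_warnings(df, level=0)))            # level given explicitly
```

Sorting changes the column order of the result, while passing level=0 leaves the remaining columns in their original order, which may matter when choosing between the two fixes.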