Pandas: Drop a Level from a Multi-Level Column Index

Python Pandas: drop a column from a multi-level column index?

Solved:

df.drop('c', axis=1, level=1)
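As a minimal sketch of what that one-liner does, the frame below is hypothetical (the labels 'x', 'y', 'a', 'b', 'c' are made up for illustration). It assumes 'c' is a label in the second column level:

```python
import pandas as pd

# Hypothetical two-level column index; labels are invented for this sketch.
columns = pd.MultiIndex.from_tuples(
    [("x", "a"), ("x", "c"), ("y", "b"), ("y", "c")]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# Drop every column whose level-1 label is 'c', whatever its level-0 label.
result = df.drop("c", axis=1, level=1)
print(result.columns.tolist())
```

Note that this removes columns, not a level: the result still has two column levels.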

Drop even levels from multi-index dataset

You can pass a list of index levels to DataFrame.droplevel.

For instance, given the following DataFrame:

import numpy as np
import pandas as pd

# Random data, so the values shown below will differ from run to run.
df = (
    pd.DataFrame(np.random.randint(5, size=(5, 5)),
                 columns=list('abcde'))
    .set_index(list('abcd'))
)
>>> df

         e
a b c d
0 4 2 0  2
3 2 3 1  1
4 2 2 3  4
0 0 1 4  2
  4 3 4  4

You can drop every second level (positions 1, 3, ...) with something like:

res = df.droplevel(list(range(1, len(df.index.names), 2)))
>>> res

     e
a c
0 2  2
3 3  1
4 2  4
0 1  2
  3  4
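The same method also works on a column MultiIndex when you pass axis=1. A small sketch with hypothetical labels, assuming you want to remove one named column level:

```python
import pandas as pd

# Hypothetical three-level column index; names and labels are illustrative.
columns = pd.MultiIndex.from_tuples(
    [("data1", "F", "C"), ("data1", "K", "D"), ("data2", "X", "E")],
    names=["meter", "Sleeper", "sweeper"],
)
df = pd.DataFrame([[2, 3, 5], [6, 7, 9]], index=["A", "B"], columns=columns)

# axis=1 targets the columns; a level can be given by name or by position.
flat = df.droplevel("Sleeper", axis=1)
print(flat.columns.tolist())
```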

Removing columns selectively from multilevel index dataframe

It seems drop doesn't support selecting on non-adjacent levels (levels 0 and 2 here). Instead, we can build a boolean mask from the conditions using get_level_values:

# keep where not ((level0 is 'data1') and (level2 is 'E'))
col_mask = ~((df.columns.get_level_values(0) == 'data1')
             & (df.columns.get_level_values(2) == 'E'))
df = df.loc[:, col_mask]

We can also do this by integer location, excluding the positions that fall in a particular index slice. However, this is less clear and less flexible overall:

idx = pd.IndexSlice['data1', :, 'E']
cols = [i for i in range(len(df.columns))
        if i not in df.columns.get_locs(idx)]
df = df.iloc[:, cols]

Either approach produces df:

meter   data1 data2    
Sleeper F K X
sweeper C D E
A 2 3 5
B 6 7 9
C 10 11 13
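For a version of the mask approach that runs on its own, here is a sketch with an assumed input frame. The four columns and their values are hypothetical, since the original input is not shown here:

```python
import numpy as np
import pandas as pd

# Assumed input: three column levels, with one ('data1', ..., 'E') column to remove.
columns = pd.MultiIndex.from_tuples(
    [("data1", "F", "C"), ("data1", "K", "D"),
     ("data1", "X", "E"), ("data2", "X", "E")],
    names=["meter", "Sleeper", "sweeper"],
)
df = pd.DataFrame(np.arange(12).reshape(3, 4), index=list("ABC"), columns=columns)

# True for the columns to drop: level 'meter' is 'data1' AND level 'sweeper' is 'E'.
drop_mask = (
    (df.columns.get_level_values("meter") == "data1")
    & (df.columns.get_level_values("sweeper") == "E")
)
kept = df.loc[:, ~drop_mask]
print(kept.columns.tolist())
```

Using level names instead of positions keeps the mask readable if the level order ever changes.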

How to remove levels from a multi-indexed dataframe?

df.reset_index(level=2, drop=True)
Out[29]:
A
1 1 8
3 9
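A self-contained sketch of the same call follows. The index names and the third-level labels are assumptions, since the original frame is not shown. On a row index, droplevel should give the same result:

```python
import pandas as pd

# Assumed three-level row index; names 'i', 'j', 'k' and labels 'x', 'y' are invented.
index = pd.MultiIndex.from_tuples(
    [(1, 1, "x"), (1, 3, "y")], names=["i", "j", "k"]
)
df = pd.DataFrame({"A": [8, 9]}, index=index)

# drop=True discards the level instead of turning it into a column.
dropped = df.reset_index(level=2, drop=True)

# Equivalent shorthand for the row index.
same = df.droplevel(2)
print(dropped)
```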

How do I add a multi-level column index to an existing df?

You can manually construct a pandas.MultiIndex using one of several constructors. From the docs for your case:

  • MultiIndex.from_arrays

    Convert list of arrays to MultiIndex.

  • MultiIndex.from_tuples

    Convert list of tuples to a MultiIndex.

  • MultiIndex.from_frame

    Make a MultiIndex from a DataFrame.

For your case, I think pd.MultiIndex.from_arrays might be the easiest way:

df.columns = pd.MultiIndex.from_arrays(
    [['H', 'H'], ['Cat1', 'Cat2'], df.columns],
    names=['Importance', 'Category', ''],
)

output:

Importance |            H |    H |
Category   |         Cat1 | Cat2 |
           | Total Assets | AUMs |
Firm 1     |          100 |  300 |
Firm 2     |          200 | 3400 |
Firm 3     |          300 |  800 |
Firm 4     |          NaN |  800 |
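To compare two of the constructors side by side, here is a sketch that rebuilds the frame from the values in the output above (the construction of the starting frame is an assumption) and builds the same column index with from_arrays and from_tuples:

```python
import numpy as np
import pandas as pd

# Starting frame reconstructed from the printed output; how it was built is assumed.
df = pd.DataFrame(
    {"Total Assets": [100, 200, 300, np.nan], "AUMs": [300, 3400, 800, 800]},
    index=["Firm 1", "Firm 2", "Firm 3", "Firm 4"],
)

# One array per level, each as long as the number of columns.
by_arrays = pd.MultiIndex.from_arrays(
    [["H", "H"], ["Cat1", "Cat2"], df.columns],
    names=["Importance", "Category", ""],
)

# One tuple per column, listing its label at every level.
by_tuples = pd.MultiIndex.from_tuples(
    [("H", "Cat1", "Total Assets"), ("H", "Cat2", "AUMs")],
    names=["Importance", "Category", ""],
)

df.columns = by_arrays
print(df)
```

from_arrays tends to be convenient when you already have the existing column labels as one array; from_tuples reads more naturally when you think column by column.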

Drop rows where multi-index is some number

You could group by the first index level and keep only the groups with more than one row:

df.groupby(level=0).filter(lambda g: len(g)>1)

output:

           pid  ts
sid vid
1   A    page1  t1
    A    page2  t2
    A    page3  t3

NB. you could also use the level name: df.groupby(level='sid').filter(lambda g: len(g)>1)

used input:

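# Built with from_tuples because a dict cannot hold the repeated (1, 'A') key.
idx = pd.MultiIndex.from_tuples(
    [(1, 'A'), (1, 'A'), (1, 'A'), (2, 'B'), (3, 'C')],
    names=['sid', 'vid'])
df = pd.DataFrame({'pid': ['page1', 'page2', 'page3', 'page1', 'page1'],
                   'ts': ['t1', 't2', 't3', 't4', 't5']},
                  index=idx)

#            pid  ts
# sid vid
# 1   A    page1  t1
#     A    page2  t2
#     A    page3  t3
# 2   B    page1  t4
# 3   C    page1  t5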
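As an alternative sketch, the same rows can be selected without a lambda by broadcasting each group's size back to the rows. The frame is rebuilt here with a repeated (1, 'A') key so the snippet runs on its own; that construction is an assumption about the intended input:

```python
import pandas as pd

# Assumed input with sid 1 appearing three times.
index = pd.MultiIndex.from_tuples(
    [(1, "A"), (1, "A"), (1, "A"), (2, "B"), (3, "C")], names=["sid", "vid"]
)
df = pd.DataFrame(
    {"pid": ["page1", "page2", "page3", "page1", "page1"],
     "ts": ["t1", "t2", "t3", "t4", "t5"]},
    index=index,
)

# transform('size') returns one value per row: the size of that row's group.
sizes = df.groupby(level="sid")["pid"].transform("size")

# Convert to a plain boolean array so no index alignment is attempted.
kept = df[(sizes > 1).to_numpy()]
print(kept)
```

On large frames this vectorized form may be faster than filter with a Python lambda, though that depends on the data.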

groupby with multi level column index python

I would recommend restructuring d1 a bit first...

d1 = d1.set_index([('id','-'),('group','-')]).stack([0,1]).reset_index()
d1.columns = ['id','group','level_1','level_2','category']

id group level_1 level_2 category
0 i1 a g1 1 dog
1 i1 a g1 2 mouse
2 i1 a g2 1 cat
3 i1 a g2 2 mouse
4 i2 a g1 1 cat
5 i2 a g1 2 mouse
6 i2 a g2 1 dog
7 i2 a g2 2 dog
8 i3 a g1 1 dog
9 i3 a g1 2 dog
10 i3 a g2 1 cat
11 i3 a g2 2 dog
12 i4 b g1 1 cat
13 i4 b g1 2 dog
14 i4 b g2 1 dog
15 i4 b g2 2 cat

...and then using either pivot_table or groupby (result is the same)...

# pivot_table
d2 = pd.pivot_table(d1, index=['group', 'category'], columns=['level_1','level_2'], aggfunc='count', fill_value=0).droplevel(0, axis=1).rename_axis([None,None], axis=1)

# groupby
d2 = d1.groupby(['group','category','level_1','level_2'])['id'].count().unstack(['level_1','level_2'], fill_value=0).rename_axis([None,None], axis=1).sort_index(axis=1)

               g1    g2
                1  2  1  2
group category
a     cat       1  0  2  0
      dog       2  1  1  2
      mouse     0  2  0  1
b     cat       1  0  0  1
      dog       0  1  1  0

PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance. How to get rid of it?

Let's try with an example (without data for simplicity):

# Column MultiIndex.
idx = pd.MultiIndex(levels=[['Col1', 'Col2', 'Col3'], ['subcol1', 'subcol2']],
codes=[[2, 1, 0], [0, 1, 1]])

df = pd.DataFrame(columns=range(len(idx)))
df.columns = idx
print(df)
    Col3    Col2    Col1
subcol1 subcol2 subcol2

Clearly, the column MultiIndex is not sorted. We can check it with:

print(df.columns.is_monotonic_increasing)  # is_monotonic was removed in pandas 2.0
False

This matters because pandas performs index lookups and related operations much faster on a sorted index, where it can use algorithms that rely on the sorted order. Indeed, if we try to drop a column:

df.drop('Col1', axis=1)
PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance.
df.drop('Col1', axis=1)

Instead, if we sort the index before dropping, the warning disappears:

print(df.sort_index(axis=1))

# Index is now sorted in lexical order.
Col1 Col2 Col3
subcol2 subcol2 subcol1
# No warning here.
df.sort_index(axis=1).drop('Col1', axis=1)

EDIT (see comments): As the warning suggests, this happens when we do not specify the level from which we want to drop the column. Without a level, pandas has to traverse the whole non-sorted index to drop the column. Specifying the level avoids that traversal:

# Also no warning.
df.drop('Col1', axis=1, level=0)

However, in general this problem matters more for row indices, as column multi-indices are usually much smaller. It is still worth keeping in mind for larger indices and DataFrames. It is particularly relevant for slicing by index and for lookups; in those cases, you want your index to be sorted for better performance.
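Putting the two remedies together, a sketch on the same empty frame as above. It checks sortedness with is_monotonic_increasing and then drops the column both ways; whether a warning appears for the unsorted, level-less call may depend on the pandas version, so that call is left out:

```python
import pandas as pd

# Same unsorted column MultiIndex as in the example above.
idx = pd.MultiIndex(
    levels=[["Col1", "Col2", "Col3"], ["subcol1", "subcol2"]],
    codes=[[2, 1, 0], [0, 1, 1]],
)
df = pd.DataFrame(columns=idx)

# Check whether the columns are in sorted order before dropping.
print(df.columns.is_monotonic_increasing)

# Remedy 1: sort the columns first, then drop without a level.
sorted_df = df.sort_index(axis=1)
by_sort = sorted_df.drop("Col1", axis=1)

# Remedy 2: leave the order untouched and name the level to drop from.
by_level = df.drop("Col1", axis=1, level=0)

print(by_sort.columns.tolist())
print(by_level.columns.tolist())
```

The second remedy preserves the original column order, which can matter if downstream code relies on it.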


