Python Pandas: drop a column from a multi-level column index?
Solved:
df.drop('c', axis=1, level=1)
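As a quick check of the one-liner above, here is a minimal sketch on a made-up frame. The column labels `x`, `y`, `a`, `b`, `c` and the values are assumptions for illustration, not taken from the original question.

```python
import pandas as pd

# Hypothetical two-level column index; the second level contains 'c' twice.
cols = pd.MultiIndex.from_tuples(
    [("x", "a"), ("x", "c"), ("y", "b"), ("y", "c")]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=cols)

# Drop every column whose level-1 label is 'c', whatever its level-0 label.
result = df.drop("c", axis=1, level=1)
print(result.columns.tolist())  # [('x', 'a'), ('y', 'b')]
```

Note that `level=1` removes the label under every top-level group, so both `('x', 'c')` and `('y', 'c')` go away.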
Drop even levels from multi-index dataset
You can pass a list of index levels to DataFrame.droplevel.
For instance, given the following DataFrame
import numpy as np  # needed for np.random.randint below
import pandas as pd

df = (
    pd.DataFrame(np.random.randint(5, size=(5, 5)),
                 columns=list('abcde'))
    .set_index(list('abcd'))
)
>>> df
         e
a b c d
0 4 2 0  2
3 2 3 1  1
4 2 2 3  4
0 0 1 4  2
  4 3 4  4
You can do something like
res = df.droplevel(list(range(1, len(df.index.names), 2)))
>>> res
     e
a c
0 2  2
3 3  1
4 2  4
0 1  2
  3  4
Removing columns selectively from multilevel index dataframe
It seems drop doesn't support selection over split levels ([0, 2] here). We can instead build a mask from the conditions using get_level_values:
# keep where not ((level0 is 'data1') and (level2 is 'E'))
col_mask = ~((df.columns.get_level_values(0) == 'data1')
& (df.columns.get_level_values(2) == 'E'))
df = df.loc[:, col_mask]
We can also do this by integer location, excluding the positions that fall in a particular index slice. However, this is less clear and less flexible overall:
idx = pd.IndexSlice['data1', :, 'E']
cols = [i for i in range(len(df.columns))
if i not in df.columns.get_locs(idx)]
df = df.iloc[:, cols]
Either approach produces df:
meter data1 data2
Sleeper F K X
sweeper C D E
A 2 3 5
B 6 7 9
C 10 11 13
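For a self-contained version of the mask approach, here is a minimal sketch. The level names follow the output above, but the column tuples and values are invented for illustration, since the original input frame is not shown here.

```python
import pandas as pd

# Hypothetical three-level columns; only the level names come from the answer.
cols = pd.MultiIndex.from_tuples(
    [("data1", "F", "C"), ("data1", "K", "E"), ("data2", "X", "E")],
    names=["meter", "Sleeper", "sweeper"],
)
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=["A", "B"], columns=cols)

# Keep a column unless it is under 'data1' AND its 'sweeper' label is 'E'.
keep = ~(
    (df.columns.get_level_values("meter") == "data1")
    & (df.columns.get_level_values("sweeper") == "E")
)
filtered = df.loc[:, keep]
print(filtered.columns.tolist())
# [('data1', 'F', 'C'), ('data2', 'X', 'E')]
```

Using level names instead of positions in `get_level_values` is a design choice here; it keeps the mask readable if the level order ever changes.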
How to remove levels from a multi-indexed dataframe?
df.reset_index(level=2, drop=True)
Out[29]:
A
1 1 8
3 9
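Since the input frame for this answer is not shown, here is a minimal sketch with an assumed three-level row index. The level names `i`, `j`, `k` are hypothetical. It also shows that `droplevel` gives the same index when you only want to discard a level.

```python
import pandas as pd

# Hypothetical three-level row index.
idx = pd.MultiIndex.from_tuples(
    [(1, 1, "x"), (1, 3, "y")], names=["i", "j", "k"]
)
df = pd.DataFrame({"A": [8, 9]}, index=idx)

# Remove level 2 from the index and discard its values.
trimmed = df.reset_index(level=2, drop=True)

# droplevel discards the level directly.
same = df.droplevel(2)

print(trimmed.index.tolist())  # [(1, 1), (1, 3)]
```

Without `drop=True`, `reset_index` would move the level into a regular column instead of discarding it.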
How do I add a multi-level column index to an existing df?
You can manually construct a pandas.MultiIndex using one of several constructors. From the docs:
MultiIndex.from_arrays: Convert list of arrays to MultiIndex.
MultiIndex.from_tuples: Convert list of tuples to a MultiIndex.
MultiIndex.from_frame: Make a MultiIndex from a DataFrame.
For your case, I think pd.MultiIndex.from_arrays might be the easiest way:
df.columns=pd.MultiIndex.from_arrays([['H','H'],['Cat1','Cat2'],df.columns],names=['Importance','Category',''])
output:
Importance| H | H |
Category | Cat1 | Cat2 |
|Total Assets | AUMs |
Firm 1 | 100 | 300 |
Firm 2 | 200 | 3400 |
Firm 3 | 300 | 800 |
Firm 4 | NaN | 800 |
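If you would rather spell out each column in full, `MultiIndex.from_tuples` works too. This is a minimal sketch on a cut-down version of the frame above (two firms only, values copied from the output); it is an alternative to the `from_arrays` call, not a replacement for it.

```python
import pandas as pd

# Reduced version of the frame from the answer, before adding the extra levels.
df = pd.DataFrame(
    {"Total Assets": [100, 200], "AUMs": [300, 3400]},
    index=["Firm 1", "Firm 2"],
)

# One tuple per column: (Importance, Category, original column name).
df.columns = pd.MultiIndex.from_tuples(
    [("H", "Cat1", "Total Assets"), ("H", "Cat2", "AUMs")],
    names=["Importance", "Category", ""],
)

# A single cell is now addressed with the full column tuple.
value = df.loc["Firm 1", ("H", "Cat1", "Total Assets")]
print(value)  # 100
```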
Drop rows where multi-index is some number
You could groupby the first index level and filter to keep only the groups with length greater than 1:
df.groupby(level=0).filter(lambda g: len(g)>1)
output:
pid ts
sid vid
1 A page1 t1
A page2 t2
A page3 t3
NB. you could also use the level name: df.groupby(level='sid').filter(lambda g: len(g)>1)
used input:
df = (pd.DataFrame({'pid': {(1, 'A'): 'page3', (2, 'B'): 'page1', (3, 'C'): 'page1'},
'ts': {(1, 'A'): 't3', (2, 'B'): 't4', (3, 'C'): 't5'}})
.rename_axis(['sid', 'vid'])
)
# pid ts
# sid vid
# 1 A page3 t3
# 2 B page1 t4
# 3 C page1 t5
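The input printed above has one row per sid, so to reproduce the three-row output the frame needs repeated (sid, vid) pairs. Here is a minimal sketch under that assumption; the extra rows for sid 1 are reconstructed from the output and are not confirmed by the original post.

```python
import pandas as pd

# Assumed input: sid 1 appears three times, sid 2 and sid 3 once each.
idx = pd.MultiIndex.from_tuples(
    [(1, "A"), (1, "A"), (1, "A"), (2, "B"), (3, "C")],
    names=["sid", "vid"],
)
df = pd.DataFrame(
    {
        "pid": ["page1", "page2", "page3", "page1", "page1"],
        "ts": ["t1", "t2", "t3", "t4", "t5"],
    },
    index=idx,
)

# Keep only the rows whose sid group has more than one row.
kept = df.groupby(level="sid").filter(lambda g: len(g) > 1)
print(kept["pid"].tolist())  # ['page1', 'page2', 'page3']
```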
groupby with multi level column index python
I would recommend restructuring d1 a bit first...
d1 = d1.set_index([('id','-'),('group','-')]).stack([0,1]).reset_index()
d1.columns = ['id','group','level_1','level_2','category']
id group level_1 level_2 category
0 i1 a g1 1 dog
1 i1 a g1 2 mouse
2 i1 a g2 1 cat
3 i1 a g2 2 mouse
4 i2 a g1 1 cat
5 i2 a g1 2 mouse
6 i2 a g2 1 dog
7 i2 a g2 2 dog
8 i3 a g1 1 dog
9 i3 a g1 2 dog
10 i3 a g2 1 cat
11 i3 a g2 2 dog
12 i4 b g1 1 cat
13 i4 b g1 2 dog
14 i4 b g2 1 dog
15 i4 b g2 2 cat
...and then using either pivot_table or groupby (result is the same)...
# pivot_table
d2 = pd.pivot_table(d1, index=['group', 'category'], columns=['level_1','level_2'], aggfunc='count', fill_value=0).droplevel(0, axis=1).rename_axis([None,None], axis=1)
# groupby
d2 = d1.groupby(['group','category','level_1','level_2'])['id'].count().unstack(['level_1','level_2'], fill_value=0).rename_axis([None,None], axis=1).sort_index(axis=1)
g1 g2
1 2 1 2
group category
a cat 1 0 2 0
dog 2 1 1 2
mouse 0 2 0 1
b cat 1 0 0 1
dog 0 1 1 0
PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance. How to get rid of it?
Let's try with an example (without data for simplicity):
# Column MultiIndex.
idx = pd.MultiIndex(levels=[['Col1', 'Col2', 'Col3'], ['subcol1', 'subcol2']],
codes=[[2, 1, 0], [0, 1, 1]])
df = pd.DataFrame(columns=range(len(idx)))
df.columns = idx
print(df)
Col3 Col2 Col1
subcol1 subcol2 subcol2
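Clearly, the column MultiIndex is not sorted. We can check it with:
# Index.is_monotonic was removed in pandas 2.0; is_monotonic_increasing is the equivalent check.
print(df.columns.is_monotonic_increasing)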
False
This matters because pandas can perform index lookups and other operations much faster on a sorted index, since it can use algorithms that rely on the sorted order. Indeed, if we try to drop a column:
df.drop('Col1', axis=1)
PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance.
df.drop('Col1', axis=1)
Instead, if we sort the index before dropping, the warning disappears:
print(df.sort_index(axis=1))
# Index is now sorted in lexical order.
Col1 Col2 Col3
subcol2 subcol2 subcol1
# No warning here.
df.sort_index(axis=1).drop('Col1', axis=1)
EDIT (see comments): As the warning suggests, this happens when we do not specify the level from which we want to drop the column. To drop the column without a level, pandas has to traverse the whole non-sorted index. When we specify the level, no such traversal is needed:
# Also no warning.
df.drop('Col1', axis=1, level=0)
In general, though, this problem matters more for row indices, since column MultiIndexes are usually much smaller. It is still worth keeping in mind for larger indices and DataFrames. It is particularly relevant for slicing by index and for lookups, where you want the index to be sorted for better performance.
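To put the sort-then-drop advice together, here is a minimal sketch that checks sortedness before and after `sort_index`. It uses `from_tuples` and a single made-up row of data rather than the `levels`/`codes` constructor above, and `is_monotonic_increasing` as the sortedness check, so treat it as an illustration of the idea, not the exact setup from the answer.

```python
import pandas as pd

# Same column order as in the example above: Col3, Col2, Col1 (not sorted).
cols = pd.MultiIndex.from_tuples(
    [("Col3", "subcol1"), ("Col2", "subcol2"), ("Col1", "subcol2")]
)
df = pd.DataFrame([[1, 2, 3]], columns=cols)

was_sorted = df.columns.is_monotonic_increasing  # False

# Sort the columns lexically, then drop from the sorted frame.
df_sorted = df.sort_index(axis=1)
now_sorted = df_sorted.columns.is_monotonic_increasing  # True

remaining = df_sorted.drop("Col1", axis=1).columns.tolist()
print(remaining)  # [('Col2', 'subcol2'), ('Col3', 'subcol1')]
```

If you drop columns repeatedly, sorting once up front and keeping the sorted frame is usually simpler than sorting inside every call.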