Selecting Columns from Pandas Multiindex

pandas dataframe select columns in multiindex

There is a get_level_values method that you can use in conjunction with boolean indexing to get the intended result.

In [13]:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((4, 4)))
df.columns = pd.MultiIndex.from_product([[1, 2], ['A', 'B']])
print(df)
          1                   2
          A         B         A         B
0  0.543980  0.628078  0.756941  0.698824
1  0.633005  0.089604  0.198510  0.783556
2  0.662391  0.541182  0.544060  0.059381
3  0.841242  0.634603  0.815334  0.848120
In [14]:

print(df.iloc[:, df.columns.get_level_values(1) == 'A'])
          1         2
          A         A
0  0.543980  0.756941
1  0.633005  0.198510
2  0.662391  0.544060
3  0.841242  0.815334
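A reproducible variant of the example above, assuming deterministic integer data in place of np.random so the selected values can be checked. It also shows that .loc accepts the same boolean column mask as .iloc:

```python
import numpy as np
import pandas as pd

# Deterministic data instead of np.random so the result is reproducible
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    columns=pd.MultiIndex.from_product([[1, 2], ["A", "B"]]),
)

# Boolean mask over the columns: True where the second level equals "A"
mask = df.columns.get_level_values(1) == "A"

by_iloc = df.iloc[:, mask]  # positional selection with a boolean array
by_loc = df.loc[:, mask]    # .loc accepts the same boolean array
print(by_iloc)
```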

Selecting columns from pandas MultiIndex

It's not great, but maybe:

>>> data
        one                           two
          a         b         c         a         b         c
0 -0.927134 -1.204302  0.711426  0.854065 -0.608661  1.140052
1 -0.690745  0.517359 -0.631856  0.178464 -0.312543 -0.418541
2  1.086432  0.194193  0.808235 -0.418109  1.055057  1.886883
3 -0.373822 -0.012812  1.329105  1.774723 -2.229428 -0.617690
>>> data.loc[:, data.columns.get_level_values(1).isin({"a", "c"})]
        one                 two
          a         c         a         c
0 -0.927134  0.711426  0.854065  1.140052
1 -0.690745 -0.631856  0.178464 -0.418541
2  1.086432  0.808235 -0.418109  1.886883
3 -0.373822  1.329105  1.774723 -0.617690

would work?
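A self-contained sketch of the isin approach; the frame below is hypothetical (same shape as `data`, but with integer values so the output is predictable). The selected columns keep their original order:

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like `data` above, with integer values for reproducibility
data = pd.DataFrame(
    np.arange(12).reshape(2, 6),
    columns=pd.MultiIndex.from_product([["one", "two"], ["a", "b", "c"]]),
)

# Keep every column whose second-level label is "a" or "c"
wanted = data.columns.get_level_values(1).isin(["a", "c"])
subset = data.loc[:, wanted]
print(subset)
```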

pandas MultiIndex on columns select columns from level 0 (outside) as well as level 1 (inside)

Just another way to skin the cat:

print(df["bar"].filter(like="%"))

                  25%  50%  75%
dt         group
2020-01-01 a      1.0  1.0  1.0
2020-01-02 a      2.0  2.0  2.0
2020-01-03 b      3.0  3.5  4.0
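The original data isn't shown, so here is a minimal sketch on a hypothetical frame with "foo"/"bar" in level 0 and statistic names in level 1. Selecting df["bar"] drops the outer level, and filter(like="%") then keeps only the inner labels that contain "%":

```python
import pandas as pd

# Hypothetical frame: level 0 holds "foo"/"bar", level 1 holds statistic names
index = pd.MultiIndex.from_tuples(
    [("2020-01-01", "a"), ("2020-01-02", "a"), ("2020-01-03", "b")],
    names=["dt", "group"],
)
columns = pd.MultiIndex.from_product([["foo", "bar"], ["count", "25%", "50%", "75%"]])
df = pd.DataFrame(
    [[1, 1.0, 1.0, 1.0] * 2, [1, 2.0, 2.0, 2.0] * 2, [2, 3.0, 3.5, 4.0] * 2],
    index=index,
    columns=columns,
)

# df["bar"] selects the outer level and drops it; filter(like="%") then keeps
# only the remaining column labels that contain "%"
pct = df["bar"].filter(like="%")
print(pct)
```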

Pandas Multi Index on columns: how to select all columns by part of string on column name

You can build a boolean indexing mask using Index.get_level_values and str.contains:

lvl = 'field_name'
s = "escala de 0-10"

df.loc[:, df.columns.get_level_values(lvl).str.contains(s)]
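A runnable sketch, assuming a hypothetical survey-style frame whose column level is named "field_name" (the question names and texts are invented for illustration). Passing regex=False makes str.contains treat the search string as a literal substring, which is safer when it contains characters such as "-" or "(":

```python
import pandas as pd

# Hypothetical survey-style columns: level "field_name" holds the question text
columns = pd.MultiIndex.from_tuples(
    [
        ("q1", "Satisfaction escala de 0-10"),
        ("q2", "Comments"),
        ("q3", "Recommend escala de 0-10"),
    ],
    names=["question", "field_name"],
)
df = pd.DataFrame([[7, "ok", 9], [4, "meh", 5]], columns=columns)

lvl = "field_name"
s = "escala de 0-10"

# regex=False treats s as a literal substring rather than a regular expression
mask = df.columns.get_level_values(lvl).str.contains(s, regex=False)
selected = df.loc[:, mask]
print(selected)
```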

Pandas MultiIndex: Selecting a column knowing only the second index?

You can pass slice(None) as the first element of the column key in .loc, provided you first sort the columns with df.sort_index(axis=1):

In [325]: df.sort_index(axis=1).loc[:, (slice(None), 'RHS')]
Out[325]:
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0

You can also use pd.IndexSlice with df.loc:

In [332]: idx = pd.IndexSlice

In [333]: df.sort_index(axis=1).loc[:, idx[:, 'RHS']]
Out[333]:
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0

With the slicer, you don't need to explicitly pass slice(None) because IndexSlice does that for you.


If you don't sort your columns, you get:

UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

If several columns have 'RHS' in the second level, all of them are returned.
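A self-contained sketch of both forms on a hypothetical frame with two 'RHS' columns under different first-level labels, showing that slice(None) and pd.IndexSlice give the same result once the columns are sorted:

```python
import pandas as pd

# Hypothetical frame: two first-level groups have an "RHS" column
df = pd.DataFrame(
    [[1, 8.0, 2, 3.0], [4, 6.0, 5, 5.0]],
    columns=pd.MultiIndex.from_tuples(
        [("weight", "LHS"), ("age", "RHS"), ("height", "LHS"), ("height", "RHS")]
    ),
)

# Sort the column MultiIndex so label-based slicing is allowed
df_sorted = df.sort_index(axis=1)

# slice(None) means "every label" on the first level
rhs = df_sorted.loc[:, (slice(None), "RHS")]

# pd.IndexSlice lets you write ":" instead of slice(None)
idx = pd.IndexSlice
rhs_idx = df_sorted.loc[:, idx[:, "RHS"]]
print(rhs)
```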

Pandas multiindex: select by condition

Use tuples in DataFrame.loc:

a = df.loc[df[("cat2", "vals2")] == 7, ('cat1', 'vals1')]
print(a)

Finally, if you need a scalar from the one-element Series:

out = a.iat[0]

If there may be no match, supply a default:

out = next(iter(a), 'no match')
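A self-contained sketch of the tuple-based selection and both scalar-extraction options. The frame is hypothetical (the question's data isn't shown) and laid out with "cat1"/"cat2" over "vals1"/"vals2":

```python
import pandas as pd

# Hypothetical frame with two-level columns and a two-level row index
df = pd.DataFrame(
    [[1, 2, 3, 8], [4, 5, 6, 7]],
    index=pd.MultiIndex.from_tuples([("a", "c"), ("a", "d")]),
    columns=pd.MultiIndex.from_product([["cat1", "cat2"], ["vals1", "vals2"]]),
)

# Rows where ("cat2", "vals2") equals 7, returning the ("cat1", "vals1") column
a = df.loc[df[("cat2", "vals2")] == 7, ("cat1", "vals1")]
out = a.iat[0]  # scalar from the one-element Series

# next(iter(...), default) falls back to the default when nothing matches
b = df.loc[df[("cat2", "vals2")] == 99, ("cat1", "vals1")]
fallback = next(iter(b), "no match")
print(out, fallback)
```

Note that a.iat[0] raises an IndexError on an empty Series, which is why the next(iter(...)) form is the safer choice when a match is not guaranteed.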

You can also compare a slice of rows and columns. The output is a DataFrame filled with booleans, so to get a boolean Series for row selection, test whether any value per row is True with DataFrame.any, or whether all values per row are True with DataFrame.all:

m = df.loc[:, ("cat2", slice(None))]==7

a = df.loc[m.any(axis=1), ("cat1", "vals2")]
print(a)
a  d    5
Name: (cat1, vals2), dtype: int32


m = df.loc[:, ("cat2", slice(None))]==7
df2 = df.loc[m.any(axis=1), ("cat1", slice(None))]
print(df2)
      cat1
     vals1  vals2
a d      4      5
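A self-contained sketch contrasting any and all on the boolean comparison frame, using a hypothetical frame with the same "cat1"/"cat2" layout (not the question's actual data):

```python
import pandas as pd

# Hypothetical frame (not the original question's data)
df = pd.DataFrame(
    [[1, 2, 3, 8], [4, 5, 6, 7]],
    index=pd.MultiIndex.from_tuples([("a", "c"), ("a", "d")]),
    columns=pd.MultiIndex.from_product([["cat1", "cat2"], ["vals1", "vals2"]]),
)

# Boolean DataFrame: one column per ("cat2", *) column
m = df.loc[:, ("cat2", slice(None))] == 7

# any(axis=1): keep a row if at least one cat2 value equals 7
df_any = df.loc[m.any(axis=1), ("cat1", slice(None))]

# all(axis=1): keep a row only if every cat2 value equals 7 (no row here)
df_all = df.loc[m.all(axis=1), ("cat1", slice(None))]
print(df_any)
```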

pandas multiindex - how to select second level when using columns?

Also using John's data sample:

Using xs() is another way to slice a MultiIndex:

df
               0
stock1 price   1
       volume  2
stock2 price   3
       volume  4
stock3 price   5
       volume  6

df.xs('price', level=1, drop_level=False)
              0
stock1 price  1
stock2 price  3
stock3 price  5

Alternatively, if the MultiIndex is on the columns instead, pass axis=1:

df
   stock1        stock2        stock3
    price volume  price volume  price volume
0       1      2      3      4      5      6

df.xs('price', axis=1, level=1, drop_level=False)
  stock1 stock2 stock3
   price  price  price
0      1      3      5
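A self-contained sketch reproducing both frames above: the row version is built directly, and transposing it puts the same MultiIndex on the columns. With drop_level=False the "price" label stays in the result:

```python
import pandas as pd

# Row MultiIndex (stock, field) with a single column 0, as in the first frame above
rows = pd.DataFrame(
    {0: [1, 2, 3, 4, 5, 6]},
    index=pd.MultiIndex.from_product([["stock1", "stock2", "stock3"], ["price", "volume"]]),
)
# Transposing moves the MultiIndex onto the columns, as in the second frame
cols = rows.T

# level=1 selects on the inner level; drop_level=False keeps "price" in the result
price_rows = rows.xs("price", level=1, drop_level=False)
price_cols = cols.xs("price", axis=1, level=1, drop_level=False)
print(price_rows)
print(price_cols)
```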

selecting from multi-index pandas

One way is to use the get_level_values Index method:

In [11]: df
Out[11]:
     0
A B
1 4  1
2 5  2
3 6  3

In [12]: df.iloc[df.index.get_level_values('A') == 1]
Out[12]:
     0
A B
1 4  1

Since pandas 0.13 you can also use xs with the drop_level argument:

df.xs(1, level='A', drop_level=False) # axis=1 if columns

Note: if this were a column MultiIndex rather than a row index, you could use the same technique:

In [21]: df1 = df.T

In [22]: df1.iloc[:, df1.columns.get_level_values('A') == 1]
Out[22]:
A 1
B 4
0 1
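A self-contained sketch that rebuilds the small frame above (levels named "A" and "B", one column 0) and checks that the boolean mask, xs with drop_level=False, and the transposed column version all pick out the same entry:

```python
import pandas as pd

# Row MultiIndex with levels named "A" and "B", one column named 0
df = pd.DataFrame(
    {0: [1, 2, 3]},
    index=pd.MultiIndex.from_arrays([[1, 2, 3], [4, 5, 6]], names=["A", "B"]),
)

# Boolean mask on a named level of the row index
by_mask = df.iloc[df.index.get_level_values("A") == 1]

# xs with drop_level=False keeps level "A" in the result
by_xs = df.xs(1, level="A", drop_level=False)

# Same idea for a column MultiIndex after transposing
df1 = df.T
by_col_mask = df1.iloc[:, df1.columns.get_level_values("A") == 1]
print(by_mask)
```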

