pandas dataframe select columns in multiindex
There is a get_level_values method that you can use in conjunction with boolean indexing to get the intended result.
In [13]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((4,4)))
df.columns = pd.MultiIndex.from_product([[1,2],['A','B']])
print(df)
          1                   2
          A         B         A         B
0  0.543980  0.628078  0.756941  0.698824
1  0.633005  0.089604  0.198510  0.783556
2  0.662391  0.541182  0.544060  0.059381
3  0.841242  0.634603  0.815334  0.848120
In [14]:
print(df.iloc[:, df.columns.get_level_values(1)=='A'])
          1         2
          A         A
0  0.543980  0.756941
1  0.633005  0.198510
2  0.662391  0.544060
3  0.841242  0.815334
Selecting columns from pandas MultiIndex
It's not great, but maybe:
>>> data
        one                           two
          a         b         c         a         b         c
0 -0.927134 -1.204302  0.711426  0.854065 -0.608661  1.140052
1 -0.690745  0.517359 -0.631856  0.178464 -0.312543 -0.418541
2  1.086432  0.194193  0.808235 -0.418109  1.055057  1.886883
3 -0.373822 -0.012812  1.329105  1.774723 -2.229428 -0.617690
>>> data.loc[:,data.columns.get_level_values(1).isin({"a", "c"})]
        one                 two
          a         c         a         c
0 -0.927134  0.711426  0.854065  1.140052
1 -0.690745 -0.631856  0.178464 -0.418541
2  1.086432  0.808235 -0.418109  1.886883
3 -0.373822  1.329105  1.774723 -0.617690
would work?
pandas MultiIndex on columns: select columns from level 0 (outside) as well as level 1 (inside)
Just another way to skin the cat:
print (df["bar"].filter(like="%"))
                  25%  50%  75%
dt         group
2020-01-01 a      1.0  1.0  1.0
2020-01-02 a      2.0  2.0  2.0
2020-01-03 b      3.0  3.5  4.0
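The answer does not show how df was built, so here is a minimal sketch with a made-up frame (the "foo"/"bar" labels and the values are assumptions for illustration only). It shows why the trick works: df["bar"] strips the outer level, leaving a flat column index that filter(like=...) can match by substring.

```python
import pandas as pd

# Hypothetical data: the outer level names a variable, the inner level a statistic.
columns = pd.MultiIndex.from_tuples(
    [("foo", "mean"), ("bar", "25%"), ("bar", "50%"), ("bar", "75%"), ("bar", "max")]
)
df = pd.DataFrame(
    [[0.5, 1.0, 1.0, 1.0, 9.0], [0.7, 2.0, 2.0, 2.0, 9.0]],
    columns=columns,
)

# Selecting the outer label returns a frame whose columns are the inner labels only.
bar = df["bar"]

# filter(like="%") keeps the columns whose label contains the substring "%".
quantiles = bar.filter(like="%")
print(quantiles)
```

Note that the result no longer carries the "bar" level; if the outer level has to be kept, a boolean mask built from get_level_values is probably the better fit.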
Pandas Multi Index on columns: how to select all columns by part of string on column name
You can build a boolean indexing mask using Index.get_level_values and str.contains:
lvl = 'field_name'
s = "escala de 0-10"
df.loc[:, df.columns.get_level_values(lvl).str.contains(s)]
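As a self-contained sketch of the same idea, the frame below is invented for illustration (the "question" level name and all column labels except the search string are assumptions). Passing regex=False is an addition here, not part of the original answer; it treats the search string literally, which can matter when the text contains regex metacharacters.

```python
import pandas as pd

# Hypothetical survey-style columns; only the second level is named "field_name".
columns = pd.MultiIndex.from_tuples(
    [
        ("q1", "escala de 0-10 (a)"),
        ("q1", "sim/nao"),
        ("q2", "escala de 0-10 (b)"),
    ],
    names=["question", "field_name"],
)
df = pd.DataFrame([[7, 1, 9], [3, 0, 5]], columns=columns)

lvl = "field_name"
s = "escala de 0-10"

# One boolean per column: True where the label on that level contains s.
mask = df.columns.get_level_values(lvl).str.contains(s, regex=False)

# Use the mask on the column axis of .loc.
selected = df.loc[:, mask]
print(selected)
```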
Pandas MultiIndex: Selecting a column knowing only the second index?
You pass slice(None) as the first element of the column key given to .loc, provided you sort your columns first using df.sort_index:
In [325]: df.sort_index(axis=1).loc[:, (slice(None), 'RHS')]
Out[325]:
age
RHS
0 8.0
1 8.0
2 6.0
3 5.0
4 5.0
5 3.0
You can also use pd.IndexSlice with df.loc:
In [332]: idx = pd.IndexSlice
In [333]: df.sort_index(axis=1).loc[:, idx[:, 'RHS']]
Out[333]:
age
RHS
0 8.0
1 8.0
2 6.0
3 5.0
4 5.0
5 3.0
With the slicer, you don't need to explicitly pass slice(None) because IndexSlice does that for you.
If you don't sort your columns, you get:
UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'
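The question's frame is not reproduced here, so the following is only a sketch on made-up data (the "height" column and all values are assumptions). It sorts the columns first and then runs both spellings of the selection to show that they pick the same columns.

```python
import pandas as pd

# Hypothetical frame whose columns are deliberately not in sorted order.
columns = pd.MultiIndex.from_tuples(
    [("height", "RHS"), ("age", "LHS"), ("age", "RHS")]
)
df = pd.DataFrame([[170, 7.0, 8.0], [165, 6.0, 5.0]], columns=columns)

# Lexsort the column MultiIndex before slicing on it.
sorted_df = df.sort_index(axis=1)

# Explicit form: slice(None) means "every label" on the first level.
by_tuple = sorted_df.loc[:, (slice(None), "RHS")]

# IndexSlice form: the bare ":" is turned into slice(None) for you.
idx = pd.IndexSlice
by_slicer = sorted_df.loc[:, idx[:, "RHS"]]

print(by_tuple)
```

Both results contain every column whose second-level label is 'RHS', here ('age', 'RHS') and ('height', 'RHS').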
If you have multiple RHS columns in the second level, all those columns are returned.
Pandas multiindex: select by condition
Use tuples in DataFrame.loc:
a = df.loc[df[("cat2", "vals2")] == 7, ('cat1', 'vals1')]
print (a)
Finally, if you need a scalar from a one-element Series:
out = a.iat[0]
If there may be no match:
out = next(iter(a), 'no match')
You can also compare sliced rows and columns. The output is a DataFrame of booleans, so to get a boolean Series, test whether any value is True per row with DataFrame.any, or whether all values are True per row with DataFrame.all:
m = df.loc[:, ("cat2", slice(None))]==7
a = df.loc[m.any(axis=1), ("cat1", "vals2")]
print (a)
a d 5
Name: (cat1, vals2), dtype: int32
m = df.loc[:, ("cat2", slice(None))]==7
df2 = df.loc[m.any(axis=1), ("cat1", slice(None))]
print (df2)
     cat1
    vals1 vals2
a d     4     5
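The input frame is not shown in the answer, so the sketch below rebuilds a small one that is consistent with the printed output for row (a, d); the other rows and the index labels are assumptions. It runs both the single-column condition and the mask over the whole "cat2" block.

```python
import pandas as pd

# Hypothetical data; only row ("a", "d") is chosen to match the output shown above.
columns = pd.MultiIndex.from_product([["cat1", "cat2"], ["vals1", "vals2"]])
index = pd.MultiIndex.from_tuples([("a", "c"), ("a", "d"), ("b", "e")])
df = pd.DataFrame(
    [[1, 2, 3, 4], [4, 5, 6, 7], [8, 9, 10, 11]],
    index=index,
    columns=columns,
)

# Condition on one column (addressed by tuple), result taken from another column.
a = df.loc[df[("cat2", "vals2")] == 7, ("cat1", "vals1")]
first = next(iter(a), "no match")  # falls back to 'no match' if a is empty

# Condition on every "cat2" column: a boolean DataFrame reduced to one flag per row.
m = df.loc[:, ("cat2", slice(None))] == 7
df2 = df.loc[m.any(axis=1), ("cat1", slice(None))]

print(first)
print(df2)
```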
pandas multiindex - how to select second level when using columns?
Also using John's data sample:
Using xs() is another way to slice a MultiIndex:
df
               0
stock1 price   1
       volume  2
stock2 price   3
       volume  4
stock3 price   5
       volume  6
df.xs('price', level=1, drop_level=False)
              0
stock1 price  1
stock2 price  3
stock3 price  5
Alternatively, if you have a MultiIndex on the columns instead:
df
  stock1        stock2        stock3
   price volume  price volume  price volume
0      1      2      3      4      5      6
df.xs('price', axis=1, level=1, drop_level=False)
  stock1 stock2 stock3
   price  price  price
0      1      3      5
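To make the column case reproducible, here is a short sketch that rebuilds the frame shown above and contrasts drop_level=False with the default behaviour of xs. The variable names kept and dropped are only illustrative.

```python
import pandas as pd

# Rebuild the one-row frame with (stock, field) column pairs.
columns = pd.MultiIndex.from_product(
    [["stock1", "stock2", "stock3"], ["price", "volume"]]
)
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=columns)

# Keep both levels in the result.
kept = df.xs("price", axis=1, level=1, drop_level=False)

# Default: the level that was selected on is dropped from the result.
dropped = df.xs("price", axis=1, level=1)

print(kept)
print(dropped)
```

With the default, the columns collapse to a flat index of stock names, which is often more convenient when only one field is of interest.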
selecting from multi-index pandas
One way is to use the get_level_values Index method:
In [11]: df
Out[11]:
     0
A B
1 4  1
2 5  2
3 6  3
In [12]: df.iloc[df.index.get_level_values('A') == 1]
Out[12]:
     0
A B
1 4  1
Since pandas 0.13 you can use xs with the drop_level argument:
df.xs(1, level='A', drop_level=False) # axis=1 if columns
Note: if this were a column MultiIndex rather than an index, you could use the same technique:
In [21]: df1 = df.T
In [22]: df1.iloc[:, df1.columns.get_level_values('A') == 1]
Out[22]:
A 1
B 4
0 1
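The two row-selection approaches from this answer can be compared side by side on a frame rebuilt from the output above. This is a sketch only; the variable names by_mask and by_xs are illustrative, and plain boolean indexing with df[...] is used in place of iloc.

```python
import pandas as pd

# Rebuild the frame: a two-level row index named A and B, and one column labelled 0.
index = pd.MultiIndex.from_tuples([(1, 4), (2, 5), (3, 6)], names=["A", "B"])
df = pd.DataFrame({0: [1, 2, 3]}, index=index)

# Boolean mask built from the values of level "A".
by_mask = df[df.index.get_level_values("A") == 1]

# Cross-section on level "A", keeping that level in the result.
by_xs = df.xs(1, level="A", drop_level=False)

print(by_mask)
print(by_xs)
```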