Pandas Select Rows and Columns Based on Boolean Condition

Let's break down your problem. You want to:

  1. Filter rows based on some boolean condition.
  2. Select a subset of columns from the result.

For the first point, the condition you'd need is -

df["col_z"] < m

For the second requirement, you'd want to specify the list of columns that you need -

["col_x", "col_y"]

How would you combine these two to produce the expected output with pandas? The most straightforward way is to use loc -

df.loc[df["col_z"] < m, ["col_x", "col_y"]]

The first argument selects rows, and the second argument selects columns.
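
As a minimal runnable sketch of the call above: the data and the threshold m below are made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical data; only the column names and `m` mirror the answer above.
df = pd.DataFrame({
    "col_x": [1, 2, 3, 4],
    "col_y": [10, 20, 30, 40],
    "col_z": [5, 15, 25, 35],
})
m = 20  # assumed threshold

# First argument: boolean row mask. Second argument: list of column labels.
result = df.loc[df["col_z"] < m, ["col_x", "col_y"]]
print(result)
```

Doing both selections in a single loc call also avoids chained indexing, which matters if you later want to assign to the selection.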


More About loc

Think of this in terms of the relational algebra operations - selection and projection. If you're from the SQL world, this would be a relatable equivalent. The above operation, in SQL syntax, would look like this -

SELECT col_x, col_y     -- projection on columns
FROM df
WHERE col_z < m         -- selection on rows

pandas loc allows you to specify index labels for selecting rows. For example, if you have a dataframe -

   col_x  col_y
a      1      4
b      2      5
c      3      6

To select index labels a and c, and column col_x, you'd use -

df.loc[['a', 'c'], ['col_x']]

   col_x
a      1
c      3

Alternatively, you can select by a boolean condition (using a Series/array of bool values, as your original question asks), for example the rows where the value in col_x is odd -

df.loc[(df.col_x % 2).ne(0), ['col_y']]

   col_y
a      4
c      6

In detail, df.col_x % 2 computes the remainder of each value after division by 2. ne(0) then compares each remainder to 0 and returns True where they differ, which is the case for every odd number. Here's what that expression results in -

(df.col_x % 2).ne(0)

a     True
b    False
c     True
Name: col_x, dtype: bool
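
To try the label-based and mask-based selections yourself, here is a small sketch that rebuilds the example dataframe shown above. The variable names by_label, odd_mask and by_mask are just illustrative.

```python
import pandas as pd

# Rebuild the small example frame with index labels a, b, c.
df = pd.DataFrame({"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
                  index=["a", "b", "c"])

# Selection by index labels and column labels.
by_label = df.loc[["a", "c"], ["col_x"]]

# Selection by a boolean mask: True where col_x is odd.
odd_mask = (df.col_x % 2).ne(0)
by_mask = df.loc[odd_mask, ["col_y"]]

print(by_label)
print(by_mask)
```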

Subsetting Pandas dataframe based on Boolean condition - why doesn't order matter?

The df["A"] == "value" part of your code returns a pandas Series of Boolean values, one per row, indicating where column "A" equals "value".
Applying such a Series as a mask (a filter, basically) to your DataFrame returns a DataFrame containing only the rows where the mask is True.
So, in your first snippet ( df[df["A"]=="value"]["B"] ), you apply the mask to the DataFrame, keeping only the rows where column "A" equals "value", and then extract column "B" from the result.
In your second snippet, you first select column "B" and then keep only the rows where column "A" == "value" in the original DataFrame. Both give the same result because the mask is aligned on the row index, which the DataFrame and each of its columns share.
Hope this helps!
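
A quick way to convince yourself is to run both orderings on a toy frame. This is only a sketch: the data is made up, and the loc variant at the end is an addition that does both selections in one step.

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({"A": ["value", "other", "value"], "B": [1, 2, 3]})
mask = df["A"] == "value"

rows_then_column = df[mask]["B"]   # filter rows, then pick column B
column_then_rows = df["B"][mask]   # pick column B, then filter rows
with_loc = df.loc[mask, "B"]       # both selections in a single call

print(rows_then_column.equals(column_then_rows))
print(rows_then_column.equals(with_loc))
```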

Pandas Select DataFrame columns using boolean

What is returned is a Series with the column names as the index and the boolean values as the row values.

I think what you actually want is the following, which should work:

comb[criteria.index[criteria]]

Basically, this takes the index of criteria (the column names) and masks it with criteria's own boolean values, which returns an Index of the column names where the value is True. We can then use that to select the columns of interest from the original df.
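
Here is a small sketch of that column selection. The contents of comb and the rule used to build criteria (column sum greater than 5) are assumptions for illustration, since the original question's data is not shown here.

```python
import pandas as pd

# Hypothetical frame; `comb` and `criteria` reuse the names from the answer.
comb = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Assumed criterion: a boolean Series indexed by column name.
criteria = comb.sum() > 5

# Mask the column names with the boolean values, then select those columns.
selected = comb[criteria.index[criteria]]

# Equivalent selection using loc with the boolean Series on the column axis.
alt = comb.loc[:, criteria]

print(selected)
```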

Filtering pandas dataframe rows based on boolean columns

Use DataFrame.any:

df1 = df[df.any(axis=1)]

Or, thinking outside the box, sum the booleans in each row and keep the rows whose sum is greater than 0:

df1 = df[df.sum(axis=1).gt(0)]
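
As a sketch, both variants can be checked against each other on a made-up frame of boolean columns (the column names flag_a and flag_b are hypothetical):

```python
import pandas as pd

# Hypothetical all-boolean frame; the middle row has no True value.
df = pd.DataFrame({
    "flag_a": [True, False, False],
    "flag_b": [False, False, True],
})

# Keep rows where at least one column is True.
via_any = df[df.any(axis=1)]

# Same idea: True counts as 1, so a positive row sum means at least one True.
via_sum = df[df.sum(axis=1).gt(0)]

print(via_any)
```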

Pandas DataFrame Group-by select column based on boolean condition

You can filter first and then group and count:

df[df['col3']==20].groupby('col1')['col2'].count()
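
A minimal sketch with made-up values, using the column names from the line above:

```python
import pandas as pd

# Hypothetical data; only the column names come from the answer.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": [1, 2, 3, 4],
    "col3": [20, 20, 20, 10],
})

# Keep rows where col3 == 20, then count col2 values per col1 group.
counts = df[df["col3"] == 20].groupby("col1")["col2"].count()
print(counts)
```

Note that a group with no rows matching the condition will not appear in the result at all, because its rows are removed before grouping.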

select rows that are equal in a column based on another Boolean column

Let us use groupby with transform('any') on Bool per ID and Animal, and combine it with the rows where Bool itself is False:

df['Same'] = ~df['Bool'] & df.groupby(['ID', 'Animal'])['Bool'].transform('any')


   ID  Animal   Bool   Same
0   1     cat   True  False
1   1     bat  False  False
2   1     cat  False   True
3   1     bat  False  False
4   2  monkey   True  False
5   2  monkey  False   True
6   2    bird  False  False
7   2    bird  False  False
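
The following sketch rebuilds the input columns from the table above and recomputes Same, so you can see how the two parts of the expression interact:

```python
import pandas as pd

# Input columns taken from the table above.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Animal": ["cat", "bat", "cat", "bat",
               "monkey", "monkey", "bird", "bird"],
    "Bool": [True, False, False, False, True, False, False, False],
})

# True for every row whose (ID, Animal) group contains at least one True.
group_has_true = df.groupby(["ID", "Animal"])["Bool"].transform("any")

# Flag rows that are False themselves but share a group with a True row.
df["Same"] = ~df["Bool"] & group_has_true
print(df)
```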

Select rows of data frame based on true false boolean list

The problem is that the condition (mask) and the DataFrame being filtered have different index values:

# mask has the index of dfmappe
mask = (dfmappe[['CryptIDs']].isin(df[['CryptID']])).all(axis=1)
# filtering df - the two DataFrames have different indices, so this raises an error
dffound = df[mask]

Possible solutions - since only one column is tested, the [[]] and all(axis=1) are removed:

mask = dfmappe['CryptIDs'].isin(df['CryptID'])
# filter dfmappe
dffound = dfmappe[mask]

Or:

# build the mask from df, testing against the dfmappe column
mask = df['CryptID'].isin(dfmappe['CryptIDs'])
# filter df
dffound = df[mask]
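
Here is a sketch of both directions on made-up data. It assumes, as in the first snippet, that df holds a CryptID column and dfmappe holds a CryptIDs column; the values and the dfmappe index are hypothetical and chosen so the two frames do not share index labels.

```python
import pandas as pd

# Hypothetical frames with deliberately different index labels.
df = pd.DataFrame({"CryptID": [10, 20, 30, 40]})
dfmappe = pd.DataFrame({"CryptIDs": [20, 40, 99]}, index=[7, 8, 9])

# Mask built from dfmappe, so it must be used to filter dfmappe.
mask_mappe = dfmappe["CryptIDs"].isin(df["CryptID"])
found_in_mappe = dfmappe[mask_mappe]

# Mask built from df, so it must be used to filter df.
mask_df = df["CryptID"].isin(dfmappe["CryptIDs"])
found_in_df = df[mask_df]

print(found_in_mappe)
print(found_in_df)
```

The rule of thumb is that a boolean mask should be applied to the object whose index it carries.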

pandas - select rows where the boolean filtering of a subset of columns are true

Use pandas.DataFrame.all:

df[mask_df.all(axis=1)]
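
As a sketch of how mask_df might be built and used: the data, the column subset and the condition (values greater than 0) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; `mask_df` reuses the name from the answer.
df = pd.DataFrame({
    "name": ["x", "y", "z"],
    "a": [1, 5, 7],
    "b": [2, 0, 9],
})

# Boolean DataFrame computed on a subset of columns (assumed condition: > 0).
mask_df = df[["a", "b"]] > 0

# Keep only rows where every column in the subset satisfies the condition.
result = df[mask_df.all(axis=1)]
print(result)
```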

