Pandas select rows and columns based on boolean condition
Let's break down your problem. You want to:
- filter rows based on some boolean condition, and
- select a subset of columns from the result.
For the first point, the condition you'd need is -
df["col_z"] < m
For the second requirement, you'd want to specify the list of columns that you need -
["col_x", "col_y"]
How would you combine these two to produce the expected output with pandas? The most straightforward way is using loc -
df.loc[df["col_z"] < m, ["col_x", "col_y"]]
The first argument selects rows, and the second argument selects columns.
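Here is a minimal, self-contained sketch of that call. The data and the threshold m are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical data; col_x, col_y, col_z and m mirror the names used above.
df = pd.DataFrame({
    "col_x": [1, 2, 3, 4],
    "col_y": [10, 20, 30, 40],
    "col_z": [5, 15, 25, 35],
})
m = 20

# First argument: boolean mask selecting rows where col_z < m.
# Second argument: list of column labels to keep.
result = df.loc[df["col_z"] < m, ["col_x", "col_y"]]
print(result)
```

Only the rows with col_z equal to 5 and 15 pass the condition, and col_z itself is dropped from the result because it is not in the column list.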
More About loc
Think of this in terms of the relational algebra operations - selection and projection. If you're from the SQL world, this would be a relatable equivalent. The above operation, in SQL syntax, would look like this -
SELECT col_x, col_y -- projection on columns
FROM df
WHERE col_z < m -- selection on rows
pandas loc allows you to specify index labels for selecting rows. For example, if you have a dataframe -
   col_x  col_y
a      1      4
b      2      5
c      3      6
To select index labels a and c, and column col_x, you'd use -
df.loc[['a', 'c'], ['col_x']]
col_x
a 1
c 3
Alternatively, to select by a boolean condition (using a Series/array of bool values, as your original question asks), for example the rows where the value in col_x is odd -
df.loc[(df.col_x % 2).ne(0), ['col_y']]
col_y
a 4
c 6
For details, df.col_x % 2 computes the remainder of each value when divided by 2. The ne(0) then compares each remainder to 0 and returns True where it is nonzero (this is how the odd numbers get selected). Here's what that expression results in -
(df.col_x % 2).ne(0)
a True
b False
c True
Name: col_x, dtype: bool
Subsetting Pandas dataframe based on Boolean condition - why doesn't order matter?
The df["A"]=="value" part of your code returns a pandas Series of Boolean values, one per row, according to the condition ("A" == "value").
Applying that Series as a mask (a filter, basically) to your DataFrame returns a DataFrame containing only the rows where the mask is True.
So, in your first code ( df[df["A"]=="value"]["B"] ), you apply the mask to the DataFrame, keeping only the rows where column "A" equals "value", and then extract the "B" column from the result.
In your second code, you first select the column "B", and then keep only the rows where column "A" == "value" in the initial DataFrame. Column "B" shares the DataFrame's index, so the same mask picks out the same rows either way, which is why the order doesn't matter.
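A quick way to convince yourself is to run both orders on a small frame. The data below is invented for illustration; only the column names "A" and "B" come from the question.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": ["value", "other", "value"], "B": [1, 2, 3]})
mask = df["A"] == "value"

rows_then_column = df[mask]["B"]   # filter rows, then pick column B
column_then_rows = df["B"][mask]   # pick column B, then filter rows

# Both are Series with the same index labels and the same values.
print(rows_then_column.equals(column_then_rows))

# The same selection in a single step with loc.
single_step = df.loc[mask, "B"]
print(single_step.equals(rows_then_column))
```

If you later need to assign to the selection, the single-step loc form is the one to reach for, since chained indexing can end up modifying a temporary copy.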
Hope this helps!
Pandas Select DataFrame columns using boolean
What is returned is a Series with the column names as the index and the boolean values as the row values.
I think what you actually want is the following, which should work:
comb[criteria.index[criteria]]
Basically, this uses the boolean values in criteria to mask criteria's own index, which returns the column names where criteria is True. We can then use those names to select the columns of interest from the original df.
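As a rough sketch of how this plays out, assume comb is a small numeric frame and criteria is a boolean Series indexed by comb's column names. The data and the "column sum is positive" rule are made up here, since the original criterion isn't shown.

```python
import pandas as pd

# Hypothetical frame; the names comb and criteria follow the answer above.
comb = pd.DataFrame({"a": [1, 2], "b": [0, 0], "c": [3, 4]})

# Assumed criterion for illustration: keep columns whose sum is positive.
# The result is a boolean Series indexed by column name.
criteria = comb.sum() > 0

# Mask the index of criteria with its own values to get the column names.
selected = comb[criteria.index[criteria]]
print(selected)

# Equivalent selection: pass the boolean Series as the column indexer of loc.
same = comb.loc[:, criteria]
print(selected.equals(same))
```

Column b is dropped in both cases, because its entry in criteria is False.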
Filtering pandas dataframe rows based on boolean columns
Use DataFrame.any:
df1 = df[df.any(axis=1)]
Or, thinking outside the box, sum the booleans per row and keep the rows where the sum is greater than 0:
df1 = df[df.sum(axis=1).gt(0)]
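A small sketch with made-up boolean columns (flag_a and flag_b are hypothetical names) shows that both approaches keep the rows with at least one True:

```python
import pandas as pd

# Hypothetical all-boolean frame for illustration.
df = pd.DataFrame({
    "flag_a": [True, False, False],
    "flag_b": [False, False, True],
})

# Keep rows where at least one column is True.
any_true = df[df.any(axis=1)]

# Same idea via arithmetic: True counts as 1, so a positive row sum
# means at least one True in that row.
sum_positive = df[df.sum(axis=1).gt(0)]

print(any_true)
print(any_true.equals(sum_positive))
```

The middle row, where every flag is False, is the only one removed.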
Pandas DataFrame Group-by select column based on boolean condition
You can filter first and then group and count:
df[df['col3']==20].groupby('col1')['col2'].count()
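For a concrete picture, here is the same expression on invented data; only the column names col1, col2 and col3 are taken from the answer.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": [1, 2, 3, 4],
    "col3": [20, 10, 20, 20],
})

# Filter rows first, then count the remaining col2 values per col1 group.
counts = df[df["col3"] == 20].groupby("col1")["col2"].count()
print(counts)
```

One thing to keep in mind with this approach: a group that has no rows matching the condition disappears from the result instead of showing up with a count of 0.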
select rows that are equal in a column based on another Boolean column
Let us transform Bool with any per ID and Animal:
df['Same'] = ~df['Bool'] & df.groupby(['ID', 'Animal'])['Bool'].transform('any')
   ID  Animal   Bool   Same
0   1     cat   True  False
1   1     bat  False  False
2   1     cat  False   True
3   1     bat  False  False
4   2  monkey   True  False
5   2  monkey  False   True
6   2    bird  False  False
7   2    bird  False  False
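To try this yourself, the input can be rebuilt from the ID, Animal and Bool columns of the output above. This is just a sketch that reconstructs that frame and applies the same line:

```python
import pandas as pd

# Input reconstructed from the ID, Animal and Bool columns shown above.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Animal": ["cat", "bat", "cat", "bat", "monkey", "monkey", "bird", "bird"],
    "Bool": [True, False, False, False, True, False, False, False],
})

# transform('any') broadcasts "does this (ID, Animal) group contain a True?"
# back onto every row of the group.
group_has_true = df.groupby(["ID", "Animal"])["Bool"].transform("any")

# A row is flagged when it is False itself but its group contains a True.
df["Same"] = ~df["Bool"] & group_has_true
print(df)
```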
Select rows of data frame based on true false boolean list
The problem is that the condition and the filtered DataFrame have different index values:
#the condition has the index from dfmappe
mask = (dfmappe[['CryptIDs']].isin(df[['CryptID']])).all(axis=1)
#filtered df - the two DataFrames have different indices, so this raises an error
dffound = df[mask]
Possible solutions - because only one column is tested, the [[]] and all(axis=1) are removed:
mask = dfmappe['CryptIDs'].isin(df['CryptID'])
#filtered dfmappe
dffound = dfmappe[mask]
Or:
#test the df column against the dfmappe column
mask = df['CryptID'].isin(dfmappe['CryptIDs'])
#filtered df
dffound = df[mask]
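Here is a small sketch of both directions. The ID values are invented, and the two frames deliberately have different lengths so the index mismatch is visible.

```python
import pandas as pd

# Hypothetical data; only the frame and column names come from the question.
df = pd.DataFrame({"CryptID": ["x1", "x2", "x3"]})
dfmappe = pd.DataFrame({"CryptIDs": ["x2", "x9", "x3", "x1", "x7"]})

# Mask built from dfmappe, so it carries dfmappe's index (0..4)
# and should be used to filter dfmappe.
mask_mappe = dfmappe["CryptIDs"].isin(df["CryptID"])
found_in_mappe = dfmappe[mask_mappe]
print(found_in_mappe)

# Mask built from df, so it carries df's index (0..2)
# and should be used to filter df.
mask_df = df["CryptID"].isin(dfmappe["CryptIDs"])
found_in_df = df[mask_df]
print(found_in_df)
```

The rule of thumb is to filter the same DataFrame the mask was built from, so the mask's index always matches the rows being selected.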
pandas - select rows where the boolean filtering of a subset of columns are true
Use pandas.DataFrame.all:
df[mask_df.all(axis = 1)]
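As a sketch, assume mask_df is a boolean DataFrame built by comparing a subset of df's columns against some threshold. The data, the column subset and the threshold below are all made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 5, 7], "b": [2, 6, 1], "c": [9, 9, 9]})

# Boolean DataFrame over a subset of the columns (assumed condition: > 3).
mask_df = df[["a", "b"]] > 3

# Keep only the rows where every column of the mask is True.
result = df[mask_df.all(axis=1)]
print(result)
```

Only the middle row survives, since it is the only one where both a and b exceed 3. Column c is still present in the result because the mask only decides which rows to keep.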