How to Select Rows According to Column Value Conditions

How do I select rows from a DataFrame based on column values?

To select rows whose column value equals a scalar, some_value, use ==:

df.loc[df['column_name'] == some_value]

To select rows whose column value is in an iterable, some_values, use isin:

df.loc[df['column_name'].isin(some_values)]

Combine multiple conditions with &:

df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

Note the parentheses. By Python's operator-precedence rules, & binds more tightly than <= and >=, so the parentheses in the last example are necessary. Without them,

df['column_name'] >= A & df['column_name'] <= B

is parsed as

df['column_name'] >= (A & df['column_name']) <= B

which raises ValueError: The truth value of a Series is ambiguous.
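A minimal runnable sketch of the correctly parenthesized range filter (the column name and bounds are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'column_name': [1, 5, 10, 15, 20]})
A, B = 5, 15  # hypothetical bounds

# Parentheses are required: & binds more tightly than the comparisons.
result = df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
print(result['column_name'].tolist())  # [5, 10, 15]
```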


To select rows whose column value does not equal some_value, use !=:

df.loc[df['column_name'] != some_value]

isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~:

df.loc[~df['column_name'].isin(some_values)]
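A quick self-contained sketch of both negations (data invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'column_name': ['a', 'b', 'c', 'a', 'd']})

# Rows whose value is not equal to a scalar.
print(df.loc[df['column_name'] != 'a']['column_name'].tolist())             # ['b', 'c', 'd']

# Rows whose value is not in an iterable.
print(df.loc[~df['column_name'].isin(['a', 'b'])]['column_name'].tolist())  # ['c', 'd']
```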

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

If you have multiple values you want to include, put them in a
list (or more generally, any iterable) and use isin:

print(df.loc[df['B'].isin(['one','three'])])

yields

     A      B  C   D
0  foo    one  0   0
1  bar    one  1   2
3  bar  three  3   6
6  foo    one  6  12
7  foo  three  7  14

Note, however, that if you wish to do this many times, it is more efficient to
make an index first, and then use df.loc:

df = df.set_index(['B'])
print(df.loc['one'])

yields

       A  C   D
B
one  foo  0   0
one  bar  1   2
one  foo  6  12

or, to include multiple values from the index use df.index.isin:

df.loc[df.index.isin(['one','two'])]

yields

       A  C   D
B
one  foo  0   0
one  bar  1   2
two  foo  2   4
two  foo  4   8
two  bar  5  10
one  foo  6  12
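Note that set_index replaces the original RangeIndex. If you need the original layout afterwards, reset_index turns B back into an ordinary column; a sketch using the same eight-row frame:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

indexed = df.set_index('B')       # B becomes the index
subset = indexed.loc[['one']]     # fast label-based selection
restored = subset.reset_index()   # B back to a regular column
print(restored.columns.tolist())  # ['B', 'A', 'C', 'D']
```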

Select rows from a dataframe based on a condition and then assign a priority number to them in a new column

Is this what you are looking for?

library(dplyr)

df = data.frame("no_of_cases" = c(12,22,34), "grid_number" = c(454,345,67))

df %>% arrange(desc(no_of_cases)) %>% mutate("priority" = rank(-no_of_cases))
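That answer is R/dplyr; since the rest of this page is pandas, a rough pandas equivalent (column names taken from the snippet above) could look like:

```python
import pandas as pd

df = pd.DataFrame({'no_of_cases': [12, 22, 34], 'grid_number': [454, 345, 67]})

# Sort descending, then rank so the largest case count gets priority 1.
df = df.sort_values('no_of_cases', ascending=False)
df['priority'] = df['no_of_cases'].rank(ascending=False).astype(int)
print(df)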

Pandas: how to select rows based on two conditions in the same column

What it sounds like you may be trying to do is check whether ALL values in a group are only 'D' or 'E'. At the same time, your expected output also excludes rows that meet that condition but whose group has only one member. You can groupby the "pair" columns you have mentioned and use a list comprehension to check whether every value is 'D' or 'E' with all([True ... ). An additional piece of logic, and len(x) > 1, excludes single-row groups, since your output leaves them out. This creates a boolean Series s of True or False where the condition is met, which you can use to filter the dataframe directly and get the expected output.

s = df.merge(df.groupby(['col1', 'col2'])['C']
               .apply(lambda x: all([True if y in ['D', 'E'] and len(x) > 1
                                     else False for y in x]))
               .reset_index(),
             how='left', on=['col1', 'col2']).iloc[:, -1]
df[s]
Out[1]:
   col1   col2  C  val
0   aaa  rte_1  D   58
4   aaa  rte_5  E   95
6   aaa  rte_1  D   57
10  aaa  rte_5  E    3
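The same condition can be expressed more directly with groupby().filter; a sketch on invented data shaped like the answer's output (the question's full dataframe is not shown):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['aaa'] * 6,
                   'col2': ['rte_1', 'rte_1', 'rte_2', 'rte_5', 'rte_5', 'rte_9'],
                   'C':    ['D', 'D', 'A', 'E', 'E', 'D'],
                   'val':  [58, 57, 12, 95, 3, 7]})

# Keep groups where every C value is 'D' or 'E' and the group has more than one row.
out = df.groupby(['col1', 'col2']).filter(
    lambda g: g['C'].isin(['D', 'E']).all() and len(g) > 1)
print(out)
```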

Pandas- Select rows from DataFrame based on condition

I think you need boolean indexing:

df1 = df[(df['category'] == 'A') & (df['value'].between(10,20))]
print (df1)
  category  value
2        A     15
4        A     18

And then:

df2 = df[(df['category'] != 'A') & (df['value'].between(10,20))]
print (df2)
  category  value
1        B     10

Or:

df3 = df[df['category'] != 'A']
print (df3)
  category  value
1        B     10
3        B     28

EDIT: Join both conditions with | for or; don't forget to add () around the first compound condition.

df1 = df[((df['category'] == 'A') & (df['value'].between(10,20))) |
         (df['category'] != 'A')]
print (df1)
  category  value
1        B     10
2        A     15
3        B     28
4        A     18

How to select dataframe rows according to multi-(other column)-condition on columnar groups?

Here's one solution - groupby textId, then keep only those groups where the unique values of score is a superset (>=) of [1.0, 2.0, 3.0].

In [58]: df.groupby('textId').filter(lambda x: set(x['score']) >= set([1.,2.,3.]))
Out[58]:
  textId  score              textInfo
3  name2    1.0  different_text_stuff
4  name2    1.3  different_text_stuff
5  name2    2.0  still_different_text
6  name2    1.0              yoko ono
7  name2    3.0     I lika da Gweneth

Pandas selecting rows with multiple conditions

You can use between. By default, it's both sides inclusive.

out = df[df['C'].between(0,1)]

If you want only one side inclusive, you can select that as well. For example, the following is only right-side inclusive:

out = df[df['C'].between(0,1, inclusive='right')]

Output:

          A         B         C
0  1.764052  0.400157  0.978738

how to select rows from a data frame using a condition from two columns in R

You can use

df[df$padj.co < 0.05 & df$padj.c2 >= 0.05, ]

I don't think I understand your second question without more background.
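For readers working in pandas rather than R, the equivalent boolean indexing (same hypothetical column names) would be:

```python
import pandas as pd

df = pd.DataFrame({'padj.co': [0.01, 0.20, 0.03],
                   'padj.c2': [0.10, 0.01, 0.02]})

# Keep rows where padj.co < 0.05 and padj.c2 >= 0.05.
out = df[(df['padj.co'] < 0.05) & (df['padj.c2'] >= 0.05)]
print(out)
```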

how to select rows based on two condition pandas

You can use:

import numpy as np

df = df_1.merge(df_2, how='left', on='a')
print(df[df.b.isin(['Yes', np.nan])][['a']])

OUTPUT

   a
0  1
1  2
2  3
4  5
5  6
6  7
8  9
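A self-contained sketch of that merge-then-filter pattern on invented data (the question's df_1 and df_2 are not shown); note that isin matches np.nan, so rows with no match in df_2 are kept as well:

```python
import pandas as pd
import numpy as np

df_1 = pd.DataFrame({'a': [1, 2, 3, 4]})
df_2 = pd.DataFrame({'a': [2, 4], 'b': ['Yes', 'No']})

df = df_1.merge(df_2, how='left', on='a')  # unmatched rows get b = NaN
# Keep rows where b is 'Yes' or missing (isin treats np.nan as a match).
print(df[df.b.isin(['Yes', np.nan])][['a']])
```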

