How to Select Rows According to Column Value Conditions

How do I select rows from a DataFrame based on column values?

To select rows whose column value equals a scalar, some_value, use ==:

df.loc[df['column_name'] == some_value]

To select rows whose column value is in an iterable, some_values, use isin:

df.loc[df['column_name'].isin(some_values)]

Combine multiple conditions with &:

df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

Note the parentheses. By Python's operator-precedence rules, & binds more tightly than <= and >=, so the parentheses in the last example are necessary. Without them,

df['column_name'] >= A & df['column_name'] <= B

is parsed as

df['column_name'] >= (A & df['column_name']) <= B

which raises ValueError: The truth value of a Series is ambiguous.
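A minimal runnable sketch of the correctly parenthesized range filter (the column name and bounds are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'column_name': [1, 5, 10, 15, 20]})
A, B = 5, 15  # hypothetical bounds

# Parentheses are required: & binds more tightly than the comparisons.
result = df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
print(result['column_name'].tolist())  # [5, 10, 15]
```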


To select rows whose column value does not equal some_value, use !=:

df.loc[df['column_name'] != some_value]

isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~:

df.loc[~df['column_name'].isin(some_values)]
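A quick self-contained sketch of both negations (data invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'column_name': ['a', 'b', 'c', 'a', 'd']})

# Rows whose value is not equal to a scalar.
print(df.loc[df['column_name'] != 'a']['column_name'].tolist())             # ['b', 'c', 'd']

# Rows whose value is not in an iterable.
print(df.loc[~df['column_name'].isin(['a', 'b'])]['column_name'].tolist())  # ['c', 'd']
```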

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

If you have multiple values you want to include, put them in a
list (or more generally, any iterable) and use isin:

print(df.loc[df['B'].isin(['one','three'])])

yields

     A      B  C   D
0  foo    one  0   0
1  bar    one  1   2
3  bar  three  3   6
6  foo    one  6  12
7  foo  three  7  14

Note, however, that if you wish to do this many times, it is more efficient to
make an index first, and then use df.loc:

df = df.set_index(['B'])
print(df.loc['one'])

yields

       A  C   D
B
one  foo  0   0
one  bar  1   2
one  foo  6  12

or, to include multiple values from the index use df.index.isin:

df.loc[df.index.isin(['one','two'])]

yields

       A  C   D
B
one  foo  0   0
one  bar  1   2
two  foo  2   4
two  foo  4   8
two  bar  5  10
one  foo  6  12
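Note that set_index replaces the original RangeIndex. If you need the original layout afterwards, reset_index turns B back into an ordinary column; a sketch using the same eight-row frame:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

indexed = df.set_index('B')       # B becomes the index
subset = indexed.loc[['one']]     # fast label-based selection
restored = subset.reset_index()   # B back to a regular column
print(restored.columns.tolist())  # ['B', 'A', 'C', 'D']
```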

Select rows from a dataframe based on a condition and then assign a priority number to them in a new column

Is this what you are looking for?

library(dplyr)

df = data.frame("no_of_cases" = c(12,22,34), "grid_number" = c(454,345,67))

df %>% arrange(desc(no_of_cases)) %>% mutate("priority" = rank(-no_of_cases))
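That answer is R/dplyr; since the rest of this page is pandas, a rough pandas equivalent (column names taken from the snippet above) could look like:

```python
import pandas as pd

df = pd.DataFrame({'no_of_cases': [12, 22, 34], 'grid_number': [454, 345, 67]})

# Sort descending, then rank so the largest case count gets priority 1.
df = df.sort_values('no_of_cases', ascending=False)
df['priority'] = df['no_of_cases'].rank(ascending=False).astype(int)
print(df)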

Pandas: how to select rows based on two conditions in the same column

What it sounds like you may be trying to do is check whether ALL values in a group are only 'D' or 'E'. At the same time, your expected output also excludes rows that meet that condition but whose group has only one member. You can groupby the "pair" columns you have mentioned and use a list comprehension to check whether every value is 'D' or 'E' with all([True ... ). An additional piece of logic, and len(x) > 1, excludes single-row groups, since your output leaves them out. This creates a boolean Series s of True or False where the condition is met, which you can use to filter the dataframe directly and get the expected output.

s = df.merge(df.groupby(['col1', 'col2'])['C']
               .apply(lambda x: all([True if y in ['D', 'E'] and len(x) > 1
                                     else False for y in x]))
               .reset_index(),
             how='left', on=['col1', 'col2']).iloc[:, -1]
df[s]
Out[1]:
   col1   col2  C  val
0   aaa  rte_1  D   58
4   aaa  rte_5  E   95
6   aaa  rte_1  D   57
10  aaa  rte_5  E    3
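The same condition can be expressed more directly with groupby().filter; a sketch on invented data shaped like the answer's output (the question's full dataframe is not shown):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['aaa'] * 6,
                   'col2': ['rte_1', 'rte_1', 'rte_2', 'rte_5', 'rte_5', 'rte_9'],
                   'C':    ['D', 'D', 'A', 'E', 'E', 'D'],
                   'val':  [58, 57, 12, 95, 3, 7]})

# Keep groups where every C value is 'D' or 'E' and the group has more than one row.
out = df.groupby(['col1', 'col2']).filter(
    lambda g: g['C'].isin(['D', 'E']).all() and len(g) > 1)
print(out)
```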

Pandas- Select rows from DataFrame based on condition

I think you need boolean indexing:

df1 = df[(df['category'] == 'A') & (df['value'].between(10,20))]
print (df1)
  category  value
2        A     15
4        A     18

And then:

df2 = df[(df['category'] != 'A') & (df['value'].between(10,20))]
print (df2)
  category  value
1        B     10

Or:

df3 = df[df['category'] != 'A']
print (df3)
  category  value
1        B     10
3        B     28

EDIT: Join both conditions with | for or; don't forget to add () around the first compound condition.

df1 = df[((df['category'] == 'A') & (df['value'].between(10,20))) |
         (df['category'] != 'A')]
print (df1)
  category  value
1        B     10
2        A     15
3        B     28
4        A     18

How to select dataframe rows according to multi-(other column)-condition on columnar groups?

Here's one solution - groupby textId, then keep only those groups where the unique values of score is a superset (>=) of [1.0, 2.0, 3.0].

In [58]: df.groupby('textId').filter(lambda x: set(x['score']) >= set([1.,2.,3.]))
Out[58]:
  textId  score              textInfo
3  name2    1.0  different_text_stuff
4  name2    1.3  different_text_stuff
5  name2    2.0  still_different_text
6  name2    1.0              yoko ono
7  name2    3.0     I lika da Gweneth

Pandas selecting rows with multiple conditions

You can use between. By default, it's both sides inclusive.

out = df[df['C'].between(0,1)]

If you want only one side inclusive, you can select that as well. For example, the following is only right-side inclusive:

out = df[df['C'].between(0,1, inclusive='right')]

Output:

          A         B         C
0  1.764052  0.400157  0.978738

how to select rows from a data frame using a condition from two columns in R

You can use

df[df$padj.co < 0.05 & df$padj.c2 >= 0.05, ]

I don't think I understand your second question without more background.
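For readers working in pandas rather than R, the equivalent boolean indexing (same hypothetical column names) would be:

```python
import pandas as pd

df = pd.DataFrame({'padj.co': [0.01, 0.20, 0.03],
                   'padj.c2': [0.10, 0.01, 0.02]})

# Keep rows where padj.co < 0.05 and padj.c2 >= 0.05.
out = df[(df['padj.co'] < 0.05) & (df['padj.c2'] >= 0.05)]
print(out)
```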

how to select rows based on two condition pandas

You can use:

import numpy as np

df = df_1.merge(df_2, how='left', on='a')
print(df[df.b.isin(['Yes', np.nan])][['a']])

OUTPUT

   a
0  1
1  2
2  3
4  5
5  6
6  7
8  9
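A self-contained sketch of that merge-then-filter pattern on invented data (the question's df_1 and df_2 are not shown); note that isin matches np.nan, so rows with no match in df_2 are kept as well:

```python
import pandas as pd
import numpy as np

df_1 = pd.DataFrame({'a': [1, 2, 3, 4]})
df_2 = pd.DataFrame({'a': [2, 4], 'b': ['Yes', 'No']})

df = df_1.merge(df_2, how='left', on='a')  # unmatched rows get b = NaN
# Keep rows where b is 'Yes' or missing (isin treats np.nan as a match).
print(df[df.b.isin(['Yes', np.nan])][['a']])
```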

