Selecting with complex criteria from pandas.DataFrame
Sure! Setup:
>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
...                    'B': [randint(1, 9)*10 for x in range(10)],
...                    'C': [randint(1, 9)*100 for x in range(10)]})
>>> df
A B C
0 9 40 300
1 9 70 700
2 5 70 900
3 8 80 900
4 7 50 200
5 9 30 900
6 2 80 700
7 2 80 400
8 5 80 300
9 7 70 800
We can apply column operations and get boolean Series objects:
>>> df["B"] > 50
0 False
1 True
2 True
3 True
4 False
5 False
6 True
7 True
8 True
9 True
Name: B
>>> (df["B"] > 50) & (df["C"] == 900)
0 False
1 False
2 True
3 True
4 False
5 False
6 False
7 False
8 False
9 False
[Update: switched to the new-style .loc.]
And then we can use these to index into the object. For read access, you can chain indices:
>>> df["A"][(df["B"] > 50) & (df["C"] == 900)]
2 5
3 8
Name: A, dtype: int64
but doing this for write access can get you into trouble because of the difference between a view and a copy. Use .loc instead:
>>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"]
2 5
3 8
Name: A, dtype: int64
>>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"].values
array([5, 8], dtype=int64)
>>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"] *= 1000
>>> df
A B C
0 9 40 300
1 9 70 700
2 5000 70 900
3 8000 80 900
4 7 50 200
5 9 30 900
6 2 80 700
7 2 80 400
8 5 80 300
9 7 70 800
Note that I accidentally typed == 900 and not != 900 (or, equivalently, ~(df["C"] == 900)), but I'm too lazy to fix it. Exercise for the reader. :^)
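Either negation works; a minimal sketch on a small made-up frame of the same shape:

```python
import pandas as pd

df = pd.DataFrame({"B": [40, 70, 70], "C": [300, 900, 700]})

# The two negation forms produce identical boolean masks:
mask_ne = df["C"] != 900
mask_inv = ~(df["C"] == 900)
print(mask_ne.equals(mask_inv))  # True
```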
How to select rows from data.frame with 2 conditions
Use & not &&. The latter only evaluates the first element of each vector.
Update: to answer the second part, use the reshape package. Something like this will do it:
tablex <- recast(aggdata, Group.1 ~ variable * Group.2, id.var=1:2)
# Now add useful column and row names
colnames(tablex) <- gsub("x_","",colnames(tablex))
rownames(tablex) <- tablex[,1]
# Finally remove the redundant first column
tablex <- tablex[,-1]
Someone with more experience using reshape may have a simpler solution.
Note: Don't use table as a variable name as it conflicts with the table() function.
How to select rows based on two conditions in pandas
You can use:
import numpy as np

df = df_1.merge(df_2, how='left', on='a')
print(df[df.b.isin(['Yes', np.nan])][['a']])
Output:
a
0 1
1 2
2 3
4 5
5 6
6 7
8 9
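The question's input frames aren't shown; here is a hypothetical reconstruction that reproduces the output above. Note that `isin` also matches the NaN values the left join introduces for unmatched keys:

```python
import numpy as np
import pandas as pd

# Hypothetical inputs; the original question's frames are not shown.
df_1 = pd.DataFrame({'a': range(1, 10)})
df_2 = pd.DataFrame({'a': [1, 3, 4, 8], 'b': ['Yes', 'Yes', 'No', 'No']})

df = df_1.merge(df_2, how='left', on='a')
# Keys absent from df_2 get b = NaN, and isin([..., np.nan]) matches them too.
print(df[df.b.isin(['Yes', np.nan])][['a']])
```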
Selecting Rows of Data Based on Multiple Conditions
The problem is that your conditions contain lists:
class(conditions[1,7])
# [1] "list"
Solution: unlist.
my_data[my_data$var_1 %in% unlist(conditions[1,7]) &
my_data$var_2 %in% unlist(conditions[1,8]) &
my_data$var_3 %in% unlist(conditions[1,9]) &
my_data$var_4 %in% unlist(conditions[1,10]) &
my_data$var_5 %in% unlist(conditions[1,11]), ]
# var_1 var_2 var_3 var_4 var_5
# 3 9 6 5 3 9
# 6 9 2 6 7 9
# 21 2 10 5 9 5
Consider this small example:
1:5
# [1] 1 2 3 4 5
list(2:4)
# [[1]]
# [1] 2 3 4
unlist(list(2:4))
# [1] 2 3 4
The conditions are one level deeper:
1:5 %in% list(2:4)
# [1] FALSE FALSE FALSE FALSE FALSE
1:5 %in% unlist(list(2:4))
# [1] FALSE TRUE TRUE TRUE FALSE
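For readers coming from pandas, `Series.isin` is the rough analogue of R's `%in%`, and it likewise expects a flat list rather than a nested one (a sketch, not part of the original answer):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# R's `1:5 %in% unlist(list(2:4))` corresponds to isin with a flat list:
print(s.isin([2, 3, 4]).tolist())  # [False, True, True, True, False]
```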
Pandas DataFrame : How to select rows on multiple conditions?
We can use the DataFrame.query() method like this:
In [109]: dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}
In [110]: qry = ' and '.join(['{} <= {}'.format(k,v) for k,v in dct.items()])
In [111]: qry
Out[111]: 'name <= 4.0 and sex <= 0.0 and city <= 2 and age <= 3.0'
In [112]: df.query(qry)
...
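The answer elides the DataFrame setup; a minimal runnable sketch with made-up data in which exactly one row satisfies all four conditions:

```python
import pandas as pd

# Hypothetical data; the original answer omits the DataFrame.
df = pd.DataFrame({'name': [3.0, 5.0], 'sex': [0.0, 1.0],
                   'city': [1, 3], 'age': [2.0, 4.0]})

dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}
qry = ' and '.join('{} <= {}'.format(k, v) for k, v in dct.items())
print(qry)        # name <= 4.0 and sex <= 0.0 and city <= 2 and age <= 3.0
print(df.query(qry))  # only the first row passes every condition
```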
Pandas slicing/selecting with multiple conditions with or statement
The important thing to note is that & is not identical to and; they are different things, so the "or" equivalent of & is |. Normally both & and | are bitwise operators rather than Python's logical operators, but pandas overloads them for element-wise Series operations.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame([[1,4,3],[2,3,5],[4,5,6],[3,2,5]], columns=['a', 'b',
...: 'c'])
In [4]: df
Out[4]:
a b c
0 1 4 3
1 2 3 5
2 4 5 6
3 3 2 5
In [5]: df.loc[(df.a != 1) & (df.b < 5)]
Out[5]:
a b c
1 2 3 5
3 3 2 5
In [6]: df.loc[(df.a != 1) | (df.b < 5)]
Out[6]:
a b c
0 1 4 3
1 2 3 5
2 4 5 6
3 3 2 5
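To see why the distinction matters, Python's plain `and` fails outright on a Series, because it forces the whole Series down to a single truth value (a sketch illustrating the point):

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Python's `and` implicitly calls bool() on the Series, which is ambiguous:
try:
    result = (s > 1) and (s < 3)
except ValueError as err:
    print("ValueError:", err)

# The overloaded bitwise & works element-wise instead:
mask = (s > 1) & (s < 3)
print(mask.tolist())  # [False, True, False]
```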
How to select rows in a DataFrame between two values, in Python Pandas?
You should use () to group your boolean conditions, removing operator-precedence ambiguity:
df = df[(df['closing_price'] >= 99) & (df['closing_price'] <= 101)]
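As an aside (not in the original answer), Series.between expresses the same inclusive range test more compactly:

```python
import pandas as pd

df = pd.DataFrame({'closing_price': [98, 99, 100, 101, 102]})

# Equivalent to (df['closing_price'] >= 99) & (df['closing_price'] <= 101);
# between() is inclusive on both ends by default.
result = df[df['closing_price'].between(99, 101)]
print(result['closing_price'].tolist())  # [99, 100, 101]
```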
Selecting rows based on multiple conditions using OR instead of AND in R
subset is designed exactly for this:
subset(test, tot.type >= 2 | tot >= 2)