How to Select Rows from Data.Frame with 2 Conditions

Selecting with complex criteria from pandas.DataFrame

Sure! Setup:

>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
...                    'B': [randint(1, 9)*10 for x in range(10)],
...                    'C': [randint(1, 9)*100 for x in range(10)]})
>>> df
   A   B    C
0  9  40  300
1  9  70  700
2  5  70  900
3  8  80  900
4  7  50  200
5  9  30  900
6  2  80  700
7  2  80  400
8  5  80  300
9  7  70  800
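
Because the setup draws its values with randint, your numbers will differ on every run. If you want to follow along with the exact values printed above, one option is to build the frame from literals instead. This is only a convenience sketch; the values are copied from the output above.

```python
import pandas as pd

# Values copied from the printed frame above, so later results are reproducible.
df = pd.DataFrame({
    "A": [9, 9, 5, 8, 7, 9, 2, 2, 5, 7],
    "B": [40, 70, 70, 80, 50, 30, 80, 80, 80, 70],
    "C": [300, 700, 900, 900, 200, 900, 700, 400, 300, 800],
})

print(df)
```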

We can apply column operations and get boolean Series objects:

>>> df["B"] > 50
0 False
1 True
2 True
3 True
4 False
5 False
6 True
7 True
8 True
9 True
Name: B
>>> (df["B"] > 50) & (df["C"] == 900)
0 False
1 False
2 True
3 True
4 False
5 False
6 False
7 False
8 False
9 False

[Update, to switch to new-style .loc]:

And then we can use these to index into the object. For read access, you can chain indices:

>>> df["A"][(df["B"] > 50) & (df["C"] == 900)]
2 5
3 8
Name: A, dtype: int64

but for write access this can get you into trouble: chained indexing may operate on a copy rather than a view, so the assignment can be silently lost. You can use .loc instead:

>>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"]
2 5
3 8
Name: A, dtype: int64
>>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"].values
array([5, 8], dtype=int64)
>>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"] *= 1000
>>> df
      A   B    C
0     9  40  300
1     9  70  700
2  5000  70  900
3  8000  80  900
4     7  50  200
5     9  30  900
6     2  80  700
7     2  80  400
8     5  80  300
9     7  70  800

Note that I accidentally typed == 900 and not != 900, or ~(df["C"] == 900), but I'm too lazy to fix it. Exercise for the reader. :^)
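
For the exercise above, here is a minimal sketch of the negated condition. The small frame is a hypothetical subset with values taken from the example, chosen for brevity. It also writes through a single .loc call, which avoids the view-versus-copy problem of chained indexing.

```python
import pandas as pd

# Hypothetical five-row frame using values from the example above.
df = pd.DataFrame({
    "A": [9, 5, 8, 7, 9],
    "B": [70, 70, 80, 50, 30],
    "C": [700, 900, 900, 200, 900],
})

# Two equivalent ways to say "C is not 900".
not_900_a = df["C"] != 900
not_900_b = ~(df["C"] == 900)
assert not_900_a.equals(not_900_b)

# B > 50 and C != 900: only the first row qualifies here.
mask = (df["B"] > 50) & not_900_a

# One .loc call selects rows and column together, so the write hits df itself.
df.loc[mask, "A"] *= 1000
print(df)
```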

How to select rows from data.frame with 2 conditions

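Use & not &&. & compares element by element, whereas && expects single values: older versions of R silently used only the first element of each vector, and R 4.3.0 and later raise an error for longer vectors.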

Update: to answer the second part, use the reshape package. Something like this will do it:

tablex <- recast(aggdata, Group.1 ~ variable * Group.2, id.var=1:2)
# Now add useful column and row names
colnames(tablex) <- gsub("x_","",colnames(tablex))
rownames(tablex) <- tablex[,1]
# Finally remove the redundant first column
tablex <- tablex[,-1]

Someone with more experience using reshape may have a simpler solution.

Note: Don't use table as a variable name as it conflicts with the table() function.

How to select rows based on two conditions in pandas

You can use:

import numpy as np
# Left merge keeps every row of df_1; unmatched rows get NaN in column b
df = df_1.merge(df_2, how='left', on='a')
print(df[df.b.isin(['Yes', np.nan])][['a']])

OUTPUT

   a
0  1
1  2
2  3
4  5
5  6
6  7
8  9
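
The answer does not show df_1 and df_2, so here is a self-contained sketch with hypothetical frames, chosen so that the result matches the output above. It spells out the missing-value check with isna() instead of putting np.nan in the isin list, which is an alternative spelling and not what the original answer used.

```python
import pandas as pd

# Hypothetical inputs: the original question's frames are not shown.
df_1 = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
df_2 = pd.DataFrame({
    "a": [1, 2, 4, 5, 8, 9],
    "b": ["Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# Left merge keeps every row of df_1; keys missing from df_2 get NaN in "b".
df = df_1.merge(df_2, how="left", on="a")

# Keep rows where b is "Yes" or missing.
keep = (df["b"] == "Yes") | df["b"].isna()
print(df.loc[keep, ["a"]])
```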

Selecting Rows of Data Based on Multiple Conditions

The problem is that your conditions contain lists.

class(conditions[1,7])
# [1] "list"

Solution: unlist.

my_data[my_data$var_1 %in% unlist(conditions[1,7]) &
          my_data$var_2 %in% unlist(conditions[1,8]) &
          my_data$var_3 %in% unlist(conditions[1,9]) &
          my_data$var_4 %in% unlist(conditions[1,10]) &
          my_data$var_5 %in% unlist(conditions[1,11]), ]
#    var_1 var_2 var_3 var_4 var_5
# 3      9     6     5     3     9
# 6      9     2     6     7     9
# 21     2    10     5     9     5

Consider this small example:

1:5 
# [1] 1 2 3 4 5
list(2:4)
# [[1]]
# [1] 2 3 4
unlist(list(2:4))
# [1] 2 3 4

In conditions the values sit one level deeper, inside a list, so %in% does not see the individual elements until you unlist:

1:5 %in% list(2:4)
# [1] FALSE FALSE FALSE FALSE FALSE
1:5 %in% unlist(list(2:4))
# [1] FALSE TRUE TRUE TRUE FALSE

Pandas DataFrame : How to select rows on multiple conditions?

We can use the DataFrame.query() method like this:

In [109]: dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}

In [110]: qry = ' and '.join(['{} <= {}'.format(k,v) for k,v in dct.items()])

In [111]: qry
Out[111]: 'name <= 4.0 and sex <= 0.0 and city <= 2 and age <= 3.0'

In [112]: df.query(qry)
...
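
Since the frame itself is elided above, here is a runnable sketch of the same idea on a hypothetical numeric frame (the rows are made up; only the column names and thresholds come from the answer). It assumes the column names are valid Python identifiers; names that are not would need backtick quoting inside the query string.

```python
import pandas as pd

# Hypothetical numeric frame; the original df is not shown.
df = pd.DataFrame({
    "name": [1.0, 5.0, 4.0],
    "sex": [0.0, 0.0, 1.0],
    "city": [2, 1, 2],
    "age": [3.0, 2.0, 1.0],
})

dct = {"name": 4.0, "sex": 0.0, "city": 2, "age": 3.0}

# Build one "col <= value" clause per entry and join them with "and".
qry = " and ".join(f"{k} <= {v}" for k, v in dct.items())

result = df.query(qry)
print(qry)
print(result)
```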

Pandas slicing/selecting with multiple conditions with or statement

The important thing to note is that & is not the same as and; they are different operators, so the "or" counterpart of & is |, not or.

In plain Python, & and | are the bitwise operators, whereas and and or are the logical operators.

In pandas, & and | are overloaded to work element-wise on Series.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame([[1,4,3],[2,3,5],[4,5,6],[3,2,5]],
   ...:                   columns=['a', 'b', 'c'])

In [4]: df
Out[4]:
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5

In [5]: df.loc[(df.a != 1) & (df.b < 5)]
Out[5]:
   a  b  c
1  2  3  5
3  3  2  5

In [6]: df.loc[(df.a != 1) | (df.b < 5)]
Out[6]:
   a  b  c
0  1  4  3
1  2  3  5
2  4  5  6
3  3  2  5
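
To see why and cannot stand in for &, here is a short sketch on the same frame. Using and forces pandas to reduce a whole Series to a single truth value, which it refuses to do.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 3], [2, 3, 5], [4, 5, 6], [3, 2, 5]],
                  columns=["a", "b", "c"])

# `and` asks for one truth value for the whole Series, which raises.
try:
    df.loc[(df.a != 1) and (df.b < 5)]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# `&` and `|` work element by element instead.
both = df.loc[(df.a != 1) & (df.b < 5)]
either = df.loc[(df.a != 1) | (df.b < 5)]
print(both)
print(either)
```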

How to select rows in a DataFrame between two values, in Python Pandas?

Wrap each comparison in parentheses. In Python, & binds more tightly than >= and <=, so without the parentheses the expression is not parsed the way you intend.

df = df[(df['closing_price'] >= 99) & (df['closing_price'] <= 101)]
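
As a sketch of the same range filter, the example below uses made-up prices (only the column name comes from the answer) and compares the explicit form with Series.between, which is inclusive on both ends by default.

```python
import pandas as pd

# Hypothetical prices; only the column name comes from the answer above.
df = pd.DataFrame({"closing_price": [98.5, 99.0, 100.2, 101.0, 101.7]})

# Explicit comparisons, each wrapped in parentheses.
explicit = df[(df["closing_price"] >= 99) & (df["closing_price"] <= 101)]

# Series.between includes both endpoints by default.
via_between = df[df["closing_price"].between(99, 101)]

print(explicit)
```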

Selecting rows based on multiple conditions using OR instead of AND in R

subset is designed exactly for this:

subset(test, tot.type >= 2 | tot >= 2)

