Selecting with complex criteria from pandas.DataFrame
Sure! Setup:
>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
...                    'B': [randint(1, 9)*10 for x in range(10)],
...                    'C': [randint(1, 9)*100 for x in range(10)]})
>>> df
A B C
0 9 40 300
1 9 70 700
2 5 70 900
3 8 80 900
4 7 50 200
5 9 30 900
6 2 80 700
7 2 80 400
8 5 80 300
9 7 70 800
We can apply column operations and get boolean Series objects:
>>> df["B"] > 50
0 False
1 True
2 True
3 True
4 False
5 False
6 True
7 True
8 True
9 True
Name: B
>>> (df["B"] > 50) & (df["C"] == 900)
0 False
1 False
2 True
3 True
4 False
5 False
6 False
7 False
8 False
9 False
[Update: switched to the new-style .loc.]
And then we can use these to index into the object. For read access, you can chain indices:
>>> df["A"][(df["B"] > 50) & (df["C"] == 900)]
2 5
3 8
Name: A, dtype: int64
but doing this for write access can get you into trouble because of the difference between a view and a copy. Use .loc instead:
>>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"]
2 5
3 8
Name: A, dtype: int64
>>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"].values
array([5, 8], dtype=int64)
>>> df.loc[(df["B"] > 50) & (df["C"] == 900), "A"] *= 1000
>>> df
A B C
0 9 40 300
1 9 70 700
2 5000 70 900
3 8000 80 900
4 7 50 200
5 9 30 900
6 2 80 700
7 2 80 400
8 5 80 300
9 7 70 800
Note that I accidentally typed == 900 and not != 900 (or, equivalently, ~(df["C"] == 900)), but I'm too lazy to fix it. Exercise for the reader. :^)
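Either negation works; a minimal sketch on a small made-up frame of the same shape:

```python
import pandas as pd

df = pd.DataFrame({"B": [40, 70, 70], "C": [300, 900, 700]})

# The two negation forms produce identical boolean masks:
mask_ne = df["C"] != 900
mask_inv = ~(df["C"] == 900)
print(mask_ne.equals(mask_inv))  # True
```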
How to select rows from data.frame with 2 conditions
Use & not &&. The latter only evaluates the first element of each vector.
Update: to answer the second part, use the reshape package. Something like this will do it:
tablex <- recast(aggdata, Group.1 ~ variable * Group.2, id.var=1:2)
# Now add useful column and row names
colnames(tablex) <- gsub("x_","",colnames(tablex))
rownames(tablex) <- tablex[,1]
# Finally remove the redundant first column
tablex <- tablex[,-1]
Someone with more experience using reshape may have a simpler solution.
Note: Don't use table as a variable name as it conflicts with the table() function.
How to select rows based on two conditions in pandas
You can use:
import numpy as np

df = df_1.merge(df_2, how='left', on='a')
print(df[df.b.isin(['Yes', np.nan])][['a']])
Output:
a
0 1
1 2
2 3
4 5
5 6
6 7
8 9
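The question's input frames aren't shown; here is a hypothetical reconstruction that reproduces the output above. Note that `isin` also matches the NaN values the left join introduces for unmatched keys:

```python
import numpy as np
import pandas as pd

# Hypothetical inputs; the original question's frames are not shown.
df_1 = pd.DataFrame({'a': range(1, 10)})
df_2 = pd.DataFrame({'a': [1, 3, 4, 8], 'b': ['Yes', 'Yes', 'No', 'No']})

df = df_1.merge(df_2, how='left', on='a')
# Keys absent from df_2 get b = NaN, and isin([..., np.nan]) matches them too.
print(df[df.b.isin(['Yes', np.nan])][['a']])
```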
Selecting Rows of Data Based on Multiple Conditions
The problem is that your conditions contain lists:
class(conditions[1,7])
# [1] "list"
Solution: unlist.
my_data[my_data$var_1 %in% unlist(conditions[1,7]) &
my_data$var_2 %in% unlist(conditions[1,8]) &
my_data$var_3 %in% unlist(conditions[1,9]) &
my_data$var_4 %in% unlist(conditions[1,10]) &
my_data$var_5 %in% unlist(conditions[1,11]), ]
# var_1 var_2 var_3 var_4 var_5
# 3 9 6 5 3 9
# 6 9 2 6 7 9
# 21 2 10 5 9 5
Consider this small example:
1:5
# [1] 1 2 3 4 5
list(2:4)
# [[1]]
# [1] 2 3 4
unlist(list(2:4))
# [1] 2 3 4
The conditions are one level deeper:
1:5 %in% list(2:4)
# [1] FALSE FALSE FALSE FALSE FALSE
1:5 %in% unlist(list(2:4))
# [1] FALSE TRUE TRUE TRUE FALSE
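For readers coming from pandas, `Series.isin` is the rough analogue of R's `%in%`, and it likewise expects a flat list rather than a nested one (a sketch, not part of the original answer):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# R's `1:5 %in% unlist(list(2:4))` corresponds to isin with a flat list:
print(s.isin([2, 3, 4]).tolist())  # [False, True, True, True, False]
```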
Pandas DataFrame : How to select rows on multiple conditions?
We can use the DataFrame.query() method like this:
In [109]: dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}
In [110]: qry = ' and '.join(['{} <= {}'.format(k,v) for k,v in dct.items()])
In [111]: qry
Out[111]: 'name <= 4.0 and sex <= 0.0 and city <= 2 and age <= 3.0'
In [112]: df.query(qry)
...
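The answer elides the DataFrame setup; a minimal runnable sketch with made-up data in which exactly one row satisfies all four conditions:

```python
import pandas as pd

# Hypothetical data; the original answer omits the DataFrame.
df = pd.DataFrame({'name': [3.0, 5.0], 'sex': [0.0, 1.0],
                   'city': [1, 3], 'age': [2.0, 4.0]})

dct = {'name': 4.0, 'sex': 0.0, 'city': 2, 'age': 3.0}
qry = ' and '.join('{} <= {}'.format(k, v) for k, v in dct.items())
print(qry)        # name <= 4.0 and sex <= 0.0 and city <= 2 and age <= 3.0
print(df.query(qry))  # only the first row passes every condition
```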
Pandas slicing/selecting with multiple conditions with or statement
The important thing to note is that & is not identical to and; they are different things, so the "or" equivalent of & is |. Normally both & and | are bitwise operators rather than Python's logical operators, but pandas overloads them for element-wise Series operations.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame([[1,4,3],[2,3,5],[4,5,6],[3,2,5]], columns=['a', 'b',
...: 'c'])
In [4]: df
Out[4]:
a b c
0 1 4 3
1 2 3 5
2 4 5 6
3 3 2 5
In [5]: df.loc[(df.a != 1) & (df.b < 5)]
Out[5]:
a b c
1 2 3 5
3 3 2 5
In [6]: df.loc[(df.a != 1) | (df.b < 5)]
Out[6]:
a b c
0 1 4 3
1 2 3 5
2 4 5 6
3 3 2 5
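To see why the distinction matters, Python's plain `and` fails outright on a Series, because it forces the whole Series down to a single truth value (a sketch illustrating the point):

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Python's `and` implicitly calls bool() on the Series, which is ambiguous:
try:
    result = (s > 1) and (s < 3)
except ValueError as err:
    print("ValueError:", err)

# The overloaded bitwise & works element-wise instead:
mask = (s > 1) & (s < 3)
print(mask.tolist())  # [False, True, False]
```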
How to select rows in a DataFrame between two values, in Python Pandas?
You should use () to group your boolean conditions, removing operator-precedence ambiguity:
df = df[(df['closing_price'] >= 99) & (df['closing_price'] <= 101)]
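As an aside (not in the original answer), Series.between expresses the same inclusive range test more compactly:

```python
import pandas as pd

df = pd.DataFrame({'closing_price': [98, 99, 100, 101, 102]})

# Equivalent to (df['closing_price'] >= 99) & (df['closing_price'] <= 101);
# between() is inclusive on both ends by default.
result = df[df['closing_price'].between(99, 101)]
print(result['closing_price'].tolist())  # [99, 100, 101]
```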
Selecting rows based on multiple conditions using OR instead of AND in R
subset is designed exactly for this:
subset(test, tot.type >= 2 | tot >= 2)