Using multiple criteria in subset function and logical operators
The correct operator here is %in%. Here is an example with dummy data:
set.seed(1)
dat <- data.frame(bf11 = sample(4, 10, replace = TRUE),
foo = runif(10))
giving:
> head(dat)
bf11 foo
1 2 0.2059746
2 2 0.1765568
3 3 0.6870228
4 4 0.3841037
5 1 0.7698414
6 4 0.4976992
The subset of dat where bf11 equals any of the set 1, 2, 3 is taken as follows using %in%:
> subset(dat, subset = bf11 %in% c(1,2,3))
bf11 foo
1 2 0.2059746
2 2 0.1765568
3 3 0.6870228
5 1 0.7698414
8 3 0.9919061
9 3 0.3800352
10 1 0.7774452
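The same selection can also be written with bracket indexing, which avoids the non-standard evaluation that subset() relies on. A minimal sketch, using a small hand-built data frame (the name `toy` and its values are made up for illustration) rather than the random data above:

```r
# Hypothetical data; not the `dat` from the example above
toy <- data.frame(bf11 = c(2, 4, 3, 1, 4),
                  foo  = c(0.1, 0.2, 0.3, 0.4, 0.5))

# %in% returns one TRUE/FALSE per element of bf11
keep <- toy$bf11 %in% c(1, 2, 3)

via_subset  <- subset(toy, subset = bf11 %in% c(1, 2, 3))
via_bracket <- toy[keep, ]
```

Both objects should hold the rows whose bf11 is 1, 2 or 3; the bracket form is the one I would reach for inside a function.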
As to why your original didn't work, break it down to see the problem. Look at what 1||2||3 evaluates to:
> 1 || 2 || 3
[1] TRUE
and you'd get the same using | instead. As a result, the subset() call would only return rows where bf11 was TRUE (or something that evaluated to TRUE).
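To make this concrete, here is a small sketch of what a comparison against 1 || 2 || 3 ends up doing. The vector `x` is made up for illustration, and I am assuming the original condition was effectively a comparison of the column against that collapsed value:

```r
x <- c(1, 2, 3, 4)           # hypothetical column values

collapsed <- 1 || 2 || 3     # a single TRUE, not the set {1, 2, 3}

# TRUE is coerced to 1 in the comparison, so only elements equal to 1 match
x == collapsed

# %in% tests membership in the whole set instead
x %in% c(1, 2, 3)
```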
What you could have written would have been something like:
subset(dat, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)
Which gives the same result as my earlier subset() call. The point is that you need a series of single comparisons, not a comparison of a series of options. But as you can see, %in% is far more useful and less verbose in such circumstances. Notice also that I have to use | as I want to compare each element of bf11 against 1, 2, and 3, in turn. Compare:
> with(dat, bf11 == 1 || bf11 == 2)
[1] TRUE
> with(dat, bf11 == 1 | bf11 == 2)
[1] TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
How to combine multiple conditions to subset a data frame using OR?
my.data.frame <- subset(data , V1 > 2 | V2 < 4)
An alternative solution that mimics the behavior of this function and would be more appropriate for inclusion within a function body:
new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]
Some people criticize the use of which as not needed, but it does prevent NA values from throwing back unwanted results. The equivalent (i.e. not returning NA rows where NAs in V1 or V2 make the condition NA) to the two options demonstrated above without the which would be:
new.data <- data[ !is.na(data$V1 > 2 | data$V2 < 4) & ( data$V1 > 2 | data$V2 < 4) , ]
Note: I want to thank the anonymous contributor that attempted to fix the error in the code immediately above, a fix that got rejected by the moderators. There was actually an additional error that I noticed when I was correcting the first one. The clause that checks for NA values has to evaluate to FALSE for the rows that should be dropped, since '&' only turns an NA into FALSE when the other operand is FALSE:
> NA & 1
[1] NA
> 0 & NA
[1] FALSE
The result depends on the value of the other operand, not on the order of the arguments to '&'.
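The NA behaviour can be checked on a tiny example. This is only a sketch; the data frame `d` and its values are invented for illustration. It shows how & treats NA, and how an NA in the row index behaves with which() versus plain logical indexing:

```r
# Three-valued logic: FALSE wins over NA, TRUE does not
na_logic <- c(NA & FALSE, FALSE & NA, NA & TRUE, TRUE & NA)

d <- data.frame(V1 = c(3, NA, 1, NA),
                V2 = c(5, 1, 9, 9))          # hypothetical data

cond <- d$V1 > 2 | d$V2 < 4                  # TRUE, TRUE, FALSE, NA

plain      <- d[cond, ]                      # the NA yields an all-NA row
with_which <- d[which(cond), ]               # which() drops the NA
guarded    <- d[!is.na(cond) & cond, ]       # same rows without which()
```

Note that the second row is kept even though V1 is NA there, because V2 < 4 alone is enough to make the OR condition TRUE.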
Subset dataframe by multiple logical conditions of rows to remove
The ! should be around the outside of the statement:
data[!(data$v1 %in% c("b", "d", "e")), ]
v1 v2 v3 v4
1 a v d c
2 a v d d
5 c k d c
6 c r p g
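For illustration, a self-contained sketch of the same negation. The data frame below is made up (only its v1 column resembles the one in the question), and `%notin%` is a helper name I am introducing here, not a base R operator:

```r
d <- data.frame(v1 = c("a", "a", "b", "d", "c", "c", "e"),
                v2 = 1:7,
                stringsAsFactors = FALSE)     # hypothetical data

drop_values <- c("b", "d", "e")

# Parenthesise the %in% test and negate the whole thing
kept <- d[!(d$v1 %in% drop_values), ]

# Optional helper built with base R's Negate()
`%notin%` <- Negate(`%in%`)
kept2 <- d[d$v1 %notin% drop_values, ]
```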
Convert character argument with multiple conditional statements into logical to subset in a function independent of dataframe name
> subset(df, sex ==1 & race4 == 2)
sex race4
5 1 2
8 1 2
13 1 2
Approach I using logical object:
# Create a logical object
> l<-df$sex ==1 & df$race4 == 2
> str(l) # check structure of 'l'
logi [1:20] FALSE FALSE FALSE FALSE TRUE FALSE ...
> subset(df,l)
sex race4
5 1 2
8 1 2
13 1 2
Approach II using character object:
> a<-"sex ==1 & race4 == 2"
> subset(df, eval(parse(text=a)))
sex race4
5 1 2
8 1 2
13 1 2
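If the goal is a function that takes the condition as a string and works for any data frame, one possible sketch is below. `filter_by_text` is a hypothetical helper name, the sample data are invented, and evaluating strings with eval(parse()) should only be done with trusted input:

```r
filter_by_text <- function(d, cond_text) {
  # Evaluate the condition with the columns of `d` in scope
  keep <- eval(parse(text = cond_text), envir = d, enclos = parent.frame())
  # Treat NA as "do not keep", as subset() does
  d[!is.na(keep) & keep, , drop = FALSE]
}

people <- data.frame(sex   = c(1, 2, 1, 1, 2),
                     race4 = c(2, 2, 1, 2, 3))   # hypothetical data

res <- filter_by_text(people, "sex == 1 & race4 == 2")
```

Because the data frame is passed as an argument, the condition string never has to mention its name.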
Is it possible to combine parameters to a subset function that is generated programmatically in R?
You should probably avoid two things: using subset in a non-interactive setting (see the warning in the help pages) and eval(parse()). Here we go.
You can change the expression into a string and append whatever you want to it. The trick is to convert the string back to an expression. This is where the aforementioned parse comes in.
sub1 <- expression(Species == "setosa")
subset(iris, eval(sub1))
sub2 <- paste(sub1, '&', 'Petal.Width > 0.2')
subset(iris, eval(parse(text = sub2))) # your case
> subset(iris, eval(parse(text = sub2)))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
16 5.7 4.4 1.5 0.4 setosa
17 5.4 3.9 1.3 0.4 setosa
18 5.1 3.5 1.4 0.3 setosa
19 5.7 3.8 1.7 0.3 setosa
20 5.1 3.8 1.5 0.3 setosa
22 5.1 3.7 1.5 0.4 setosa
24 5.1 3.3 1.7 0.5 setosa
27 5.0 3.4 1.6 0.4 setosa
32 5.4 3.4 1.5 0.4 setosa
41 5.0 3.5 1.3 0.3 setosa
42 4.5 2.3 1.3 0.3 setosa
44 5.0 3.5 1.6 0.6 setosa
45 5.1 3.8 1.9 0.4 setosa
46 4.8 3.0 1.4 0.3 setosa
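If the pieces of the condition can be kept as language objects rather than strings, eval(parse()) is not needed at all. A minimal sketch using quote() and bquote() from base R, with bracket indexing instead of subset(); the variable names are my own:

```r
cond1 <- quote(Species == "setosa")
cond2 <- quote(Petal.Width > 0.2)

# Splice both calls into a single `cond1 & cond2` call
both <- bquote(.(cond1) & .(cond2))

# Evaluate with the columns of iris in scope, then index by row
keep <- eval(both, envir = iris)
res  <- iris[keep, ]
```

This should select the same rows as the subset() call above, and further conditions can be spliced in the same way.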