Using Multiple Criteria in Subset Function and Logical Operators


The correct operator is %in% here. Here is an example with dummy data:

set.seed(1)
dat <- data.frame(bf11 = sample(4, 10, replace = TRUE),
                  foo = runif(10))

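giving (output from R versions before 3.6.0; R 3.6.0 changed the default sampling algorithm, so call RNGkind(sample.kind = "Rounding") before set.seed(1) to reproduce these exact numbers):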

> head(dat)
  bf11       foo
1    2 0.2059746
2    2 0.1765568
3    3 0.6870228
4    4 0.3841037
5    1 0.7698414
6    4 0.4976992

To take the subset of dat where bf11 equals any of 1, 2 or 3, use %in%:

> subset(dat, subset = bf11 %in% c(1,2,3))
   bf11       foo
1     2 0.2059746
2     2 0.1765568
3     3 0.6870228
5     1 0.7698414
8     3 0.9919061
9     3 0.3800352
10    1 0.7774452

As to why your original didn't work, break it down to see the problem. Look at what 1||2||3 evaluates to:

> 1 || 2 || 3
[1] TRUE

and you'd get the same using | instead. Comparing bf11 against that single TRUE is the same as comparing it against 1, because TRUE is coerced to 1 in a numeric comparison, so the subset() call would only return the rows where bf11 equals 1.
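A small sketch of the same breakdown, assuming the original call compared bf11 against 1 || 2 || 3 as a whole (the question's exact code is not shown here). The bf11 values are hard-coded from the output above so the result does not depend on the R version's sampling algorithm.

```r
# bf11 values as printed above (hard-coded, not drawn with sample())
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)

# Nonzero numbers are treated as TRUE, so the options collapse to one value
combined <- 1 || 2 || 3
print(combined)

# Assumed form of the original attempt: effectively bf11 == TRUE,
# and TRUE is coerced to 1, so only the 1s match
wrong <- bf11 == (1 || 2 || 3)
print(which(wrong))

# %in% tests each element against every option
right <- bf11 %in% c(1, 2, 3)
print(which(right))
```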

What you could have written instead is something like:

subset(dat, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)

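This gives the same result as my earlier subset() call. The point is that you need a series of single comparisons, not a comparison against a series of options. But as you can see, %in% is far more useful and less verbose in such circumstances. Notice also that I have to use | rather than ||, as I want to compare each element of bf11 against 1, 2 and 3 in turn. || is meant for single TRUE/FALSE values: older versions of R used only the first element of each side (which is why the first call below returns a single TRUE), while R 4.3.0 and later signal an error instead. Compare: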

> with(dat, bf11 == 1 || bf11 == 2)
[1] TRUE
> with(dat, bf11 == 1 | bf11 == 2)
 [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
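A minimal sketch contrasting the vectorised | with %in% and with the scalar ||, again hard-coding the bf11 values shown above. The any() line is an illustration of where || does belong (combining single TRUE/FALSE values), not something from the original answer.

```r
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)

# Vectorised |: one TRUE/FALSE per element, suitable for row subsetting
elementwise <- bf11 == 1 | bf11 == 2
print(elementwise)

# %in% builds the same logical vector with less typing
via_in <- bf11 %in% c(1, 2)
print(identical(elementwise, via_in))

# || is for length-one conditions, e.g. after reducing with any()
scalar <- any(bf11 == 1) || any(bf11 == 2)
print(scalar)
```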

How to combine multiple conditions to subset a data-frame using OR?

my.data.frame <- subset(data, V1 > 2 | V2 < 4)

An alternative that mimics the behavior of subset() and is more appropriate for use inside a function body:

new.data <- data[which(data$V1 > 2 | data$V2 < 4), ]

Some people criticize the use of which() as unnecessary, but it does prevent NA values from producing unwanted results: indexing with a logical NA returns a row of NAs, whereas which() drops NA positions (as subset() does). Without which(), the equivalent of the two options above (i.e. one that does not return rows of NAs where the condition evaluates to NA) would be:

keep <- data$V1 > 2 | data$V2 < 4
new.data <- data[!is.na(keep) & keep, ]

Note: I want to thank the anonymous contributor who attempted to fix an error in the code immediately above, a fix that was rejected by the moderators. The NA check works because of how & combines NA with other values:

> NA & 1
[1] NA
> 0 & NA
[1] FALSE

Since FALSE & NA is FALSE, every position where keep is NA gets FALSE from !is.na(keep) and is dropped, whereas TRUE & NA would stay NA. The order of the operands does not change the result of &: 1 & NA is also NA, and NA & 0 is also FALSE.
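A sketch on a small hypothetical data frame (d is made up for illustration, with missing values in V1) that compares plain logical indexing, which(), subset() and masking with !is.na(). It also prints & applied to NA in both operand orders so you can check the behavior directly.

```r
# Hypothetical example data with missing values in V1
d <- data.frame(V1 = c(3, NA, 1, NA), V2 = c(5, 1, 5, 5))

keep <- d$V1 > 2 | d$V2 < 4
print(keep)  # row 4 is NA > 2 | FALSE, i.e. NA

# & with NA, both operand orders
print(c(NA & FALSE, FALSE & NA, NA & TRUE, TRUE & NA))

# Plain logical indexing turns the NA into an extra row of NAs
plain <- d[keep, ]
# which(), subset() and the !is.na() mask all leave that row out
via_which  <- d[which(keep), ]
via_subset <- subset(d, V1 > 2 | V2 < 4)
via_mask   <- d[!is.na(keep) & keep, ]

print(nrow(plain))
print(nrow(via_which))
```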

Subset dataframe by multiple logical conditions of rows to remove

The ! should be around the outside of the statement:

data[!(data$v1 %in% c("b", "d", "e")), ]

  v1 v2 v3 v4
1  a  v  d  c
2  a  v  d  d
5  c  k  d  c
6  c  r  p  g
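A self-contained sketch of the same idea, assuming a small hypothetical data frame since the question's data is not shown in full. It also shows that !d$v1 %in% drop_vals, without the extra parentheses, parses the same way, because %in% binds more tightly than !.

```r
# Hypothetical data frame standing in for the question's data
d <- data.frame(v1 = c("a", "a", "b", "d", "c", "c", "e"),
                v2 = 1:7,
                stringsAsFactors = FALSE)
drop_vals <- c("b", "d", "e")

# Negate the whole %in% test to keep rows whose v1 is NOT in drop_vals
kept <- d[!(d$v1 %in% drop_vals), ]
print(kept)

# %in% has higher precedence than !, so this is the same expression
kept_no_parens <- d[!d$v1 %in% drop_vals, ]

# subset() version, fine for interactive use
kept_subset <- subset(d, !(v1 %in% drop_vals))
```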

Convert character argument with multiple conditional statements into logical to subset in a function independent of dataframe name

> subset(df, sex == 1 & race4 == 2)
   sex race4
5    1     2
8    1     2
13   1     2

Approach I, using a logical vector:

# Create a logical object
> l <- df$sex == 1 & df$race4 == 2

> str(l) # check structure of 'l'
 logi [1:20] FALSE FALSE FALSE FALSE TRUE FALSE ...

> subset(df,l)
   sex race4
5    1     2
8    1     2
13   1     2

Approach II, using a character string:

> a <- "sex == 1 & race4 == 2"
> subset(df, eval(parse(text = a)))
   sex race4
5    1     2
8    1     2
13   1     2
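A minimal sketch of wrapping Approach II in a function that takes the data frame as an argument, so nothing depends on the data frame's name. filter_rows and the people data frame are hypothetical. The function parses the condition string, evaluates it inside the data frame with eval(), and indexes with [ rather than calling subset(), dropping rows where the condition is NA as subset() would.

```r
# Hypothetical helper: apply a condition string to any data frame
filter_rows <- function(data, condition) {
  expr <- parse(text = condition)[[1]]   # string -> unevaluated call
  # Look up column names in data first, then in the caller's environment
  keep <- eval(expr, envir = data, enclos = parent.frame())
  data[!is.na(keep) & keep, , drop = FALSE]
}

# Hypothetical data; the df used above is not shown in full
people <- data.frame(sex = c(1, 2, 1, 1), race4 = c(2, 2, 3, 2))
res <- filter_rows(people, "sex == 1 & race4 == 2")
print(res)
```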

Is it possible to combine parameters to a subset function that is generated programmatically in R?

You should probably avoid two things: using subset() in a non-interactive setting (see the warning on its help page, ?subset) and eval(parse()). With that caveat, here we go.

You can convert the expression into a string and append whatever you want to it. The trick is then to convert the string back into an expression, which is where the aforementioned parse() comes in.

sub1 <- expression(Species == "setosa")

subset(iris, eval(sub1))

sub2 <- paste(sub1, '&', 'Petal.Width > 0.2')

subset(iris, eval(parse(text = sub2)))  # your case

> subset(iris, eval(parse(text = sub2)))
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
16          5.7         4.4          1.5         0.4  setosa
17          5.4         3.9          1.3         0.4  setosa
18          5.1         3.5          1.4         0.3  setosa
19          5.7         3.8          1.7         0.3  setosa
20          5.1         3.8          1.5         0.3  setosa
22          5.1         3.7          1.5         0.4  setosa
24          5.1         3.3          1.7         0.5  setosa
27          5.0         3.4          1.6         0.4  setosa
32          5.4         3.4          1.5         0.4  setosa
41          5.0         3.5          1.3         0.3  setosa
42          4.5         2.3          1.3         0.3  setosa
44          5.0         3.5          1.6         0.6  setosa
45          5.1         3.8          1.9         0.4  setosa
46          4.8         3.0          1.4         0.3  setosa
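Since eval(parse()) is best avoided, here is a sketch of one alternative (a swapped-in technique, not this answer's own method): keep each condition as an unevaluated call made with quote(), splice the calls together with bquote(), then evaluate the combined call inside iris and index with [ instead of subset(). It should select the same 16 rows as the output above.

```r
# Conditions kept as unevaluated calls rather than strings
cond1 <- quote(Species == "setosa")
cond2 <- quote(Petal.Width > 0.2)

# .() splices each call into the template, giving one combined call
combined <- bquote(.(cond1) & .(cond2))
print(combined)

# Evaluate with iris's columns in scope and index with [
keep <- eval(combined, envir = iris)
res <- iris[keep, ]
print(nrow(res))
```

Because the conditions never pass through a string, there is no quoting or escaping to get wrong when they are combined.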
