The Rules of Subsetting

The rules of subsetting

In addition to your nice solution using merge (thanks for that, I always forget merge), this can be achieved in base R with interaction() (see ?interaction) as follows. There may be other variations, but this is the one I am familiar with:

> df1[interaction(df1) %in% interaction(df2), ]
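Here is a small self-contained run of the same idea, on made-up df1 and df2 (the question's data is not shown here). interaction() builds one key per row by pasting the column values together with sep (default "."), and %in% then compares those keys as strings. Because the keys are pasted strings, values that contain the separator can collide, so a sep that cannot occur in your data is the safer choice. The last lines illustrate this with a hypothetical pair of one-row data frames.

```r
# Hypothetical example data (not from the original question)
df1 <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
df2 <- data.frame(x = c(2, 3, 4), y = c("b", "x", "d"))

# One key per row, e.g. "1.a", "2.b", "3.c"
key1 <- interaction(df1)
key2 <- interaction(df2)

# Keep the rows of df1 whose key also appears in df2
common <- df1[key1 %in% key2, ]
common
#   x y
# 2 2 b

# Pitfall: "a.b" + "c" and "a" + "b.c" both paste to "a.b.c"
a <- data.frame(p = "a.b", q = "c")
b <- data.frame(p = "a", q = "b.c")
collide <- interaction(a) %in% interaction(b)   # TRUE: a false match

# A separator that cannot appear in the data avoids the collision
safe <- interaction(a, sep = "\u001f") %in% interaction(b, sep = "\u001f")   # FALSE
```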

Now to answer your question. First, I think there is a typo in the following line (corrected here):

df1[df1$z %in% df2$c | df2$b == 9, ]   # second part corrected to df2$b == 9

You would get a warning (not an error), because the first part evaluates to

[1] TRUE TRUE TRUE TRUE TRUE

and the second evaluates to:

[1] FALSE FALSE FALSE FALSE

You are applying | to vectors of unequal length (5 and 4), so R recycles the shorter one (its values no longer line up with the rows of df1) and warns:

longer object length is not a multiple of shorter object length
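A minimal reproduction with plain logical vectors standing in for the two halves of the condition (assuming, as the outputs above suggest, that df1 has 5 rows and df2 has 4). It captures the condition R signals, which is a warning rather than an error, and shows that the recycled result still has length 5.

```r
# Stand-ins for the two halves of the condition
part1 <- c(TRUE, TRUE, TRUE, TRUE, TRUE)   # like df1$z %in% df2$c: one value per row of df1
part2 <- c(FALSE, FALSE, FALSE, FALSE)     # like df2$b == 9: one value per row of df2

# `|` recycles the shorter vector and signals a warning
msg <- tryCatch(part1 | part2, warning = function(w) conditionMessage(w))
msg
# [1] "longer object length is not a multiple of shorter object length"

# With the warning suppressed, the (misaligned) result has length 5
res <- suppressWarnings(part1 | part2)
length(res)
# [1] 5
```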

Edit: If you have multiple columns, you can restrict the interaction to the columns you want to match on. For example, to get the rows of df1 whose first two columns match those of some row in df2, you could simply do:

> df1[interaction(df1[, 1:2]) %in% interaction(df2[, 1:2]), ]
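A sketch with hypothetical data frames in which one row agrees on the first two columns but not on the third. Matching on columns 1:2 keeps that row, while the full interaction() keeps nothing. Selecting the columns by name instead of by position gives the same result here and keeps working if the column order changes.

```r
# Hypothetical data: row 2 of df1 matches df2 on id and grp, but not on val
df1 <- data.frame(id = c(1, 2, 3), grp = c("a", "b", "c"), val = c(10, 20, 30))
df2 <- data.frame(id = c(2, 3),    grp = c("b", "z"),      val = c(99, 30))

# Match on the first two columns only; val is ignored
by_pos <- df1[interaction(df1[, 1:2]) %in% interaction(df2[, 1:2]), ]

# Same selection by column name
by_name <- df1[interaction(df1[c("id", "grp")]) %in% interaction(df2[c("id", "grp")]), ]

# Matching on all columns finds no common row
full <- df1[interaction(df1) %in% interaction(df2), ]
```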

subset arules in R by length of lhs

length() gives you the number of rules, not the number of items in each rule's lhs. You need to use size() instead, which returns the number of items in each itemset:

subset(rules, subset = size(lhs) == 5)
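A sketch on a tiny made-up transaction set, assuming the arules package is installed. length() counts rules, while size(lhs(rules)) gives the number of items in each rule's lhs, which is what the subset condition filters on. With these five baskets, the only rules with a two-item lhs come from the itemset {a, b, c}.

```r
library(arules)

# Tiny hypothetical transaction set: 5 baskets over items a, b, c
trans <- as(list(c("a", "b", "c"), c("a", "b"), c("a", "c"),
                 c("b", "c"), c("a", "b", "c")), "transactions")
rules <- apriori(trans, parameter = list(supp = 0.2, conf = 0.5),
                 control = list(verbose = FALSE))

length(rules)            # number of rules, not items per rule
table(size(lhs(rules)))  # how many rules have 0, 1, 2 items in their lhs

# Keep only the rules whose lhs has exactly two items
rules2 <- subset(rules, subset = size(lhs) == 2)
inspect(rules2)
```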

Dynamically subset a data.frame by a list of rules

It's not very general: the list elements are combined with AND, and the values within each element are combined with OR. But that's what your question asks for.

df <- data.frame(col1 = c('a','s','x'),
                 col2 = c('a','z','s'),
                 col3 = c('a','c','b'),
                 stringsAsFactors = FALSE)

df[with(df, col1 == 's' 
         & col2 == 'z' 
         & (col3 == 'a' | col3 == 'b' | col3 == 'c')), ]

#   col1 col2 col3
# 2    s    z    c

rules <- list(col1 = c('s'), col2 = c('z'), col3 = c('a', 'b', 'c'))

df[Reduce(`&`, Map(`%in%`, df, rules)), ]

#   col1 col2 col3
# 2    s    z    c
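To see what the one-liner does, here are its intermediate steps on the same df and rules. Map() pairs columns and rules by position, so this form assumes the rules list has one element per column, in column order; version 2 below relaxes that.

```r
df <- data.frame(col1 = c('a','s','x'),
                 col2 = c('a','z','s'),
                 col3 = c('a','c','b'),
                 stringsAsFactors = FALSE)
rules <- list(col1 = c('s'), col2 = c('z'), col3 = c('a', 'b', 'c'))

# Map applies %in% to (column i, rule i), giving one logical
# vector per column with one value per row (OR within a rule)
masks <- Map(`%in%`, df, rules)
str(masks)

# Reduce folds `&` over those vectors: a row is kept only if it
# satisfies every column's rule (AND across rules)
keep <- Reduce(`&`, masks)
keep
# [1] FALSE  TRUE FALSE

df[keep, ]
```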

As a function, magic():

magic <- function(data, rules) {
  data[Reduce(`&`, Map(`%in%`, data, rules)), ]
}

magic(df, rules)
#   col1 col2 col3
# 2    s    z    c

Edit -- version 2

This one should also work for 1) columns without rules and/or 2) rules that are not in the same order as the columns.

magic <- function(data, rules) {
  rules <- rules[names(data)]
  idx <- Map(`%in%`, data, rules)
  idx[is.na(names(rules))] <- list(rep(TRUE, nrow(data)))
  data[Reduce(`&`, idx), ]
}
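Two base R behaviours that version 2 relies on, shown here on toy values: indexing a named list by a name it does not contain returns a NULL element whose name is NA, and x %in% NULL is FALSE for every element. That is why magic() overwrites the NA-named positions with all-TRUE vectors instead of leaving them to %in%.

```r
rules <- list(col2 = "z", col1 = "s")
cols  <- c("col1", "col2", "colx")   # hypothetical column names

# Reorders the rules to match the columns; "colx" has no rule,
# so its slot is NULL and its name is NA
aligned <- rules[cols]
names(aligned)

# A NULL rule would reject every row, since %in% NULL is all FALSE
no_rule <- c("a", "s", "x") %in% aligned[[3]]
no_rule
# [1] FALSE FALSE FALSE
```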

df <- data.frame(col1 = c('a','s','x'),
                 col2 = c('a','z','s'),
                 colx = rnorm(3),  # random values, so colx in the output below will differ
                 col3 = c('a','c','b'),
                 stringsAsFactors = FALSE)

rules <- list(col2 = c('z'), col1 = c('s'), col3 = c('a', 'b', 'c'))
magic(df, rules)
#   col1 col2      colx col3
# 2    s    z -1.374339    c

More tests:

magic(mtcars, list(gear = 4, carb = 1:2))

#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
# Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
# Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
# Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
# Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
# Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
# Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
# Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

R arules - subset of transactions that match a rule

The subset syntax in arules is very similar to that in any other context. You may want to try the following:

subset(transactions, items %in% lhs(r) & !items %in% rhs(r) )
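A hedged sketch, assuming the arules package and a tiny made-up transaction set. It converts one rule's lhs and rhs to item labels and then uses %ain%, which keeps transactions containing all of the given labels (with character labels, %in% keeps those containing any of them). Which final filter you want depends on what "match" means for you: transactions where the rule holds, or, as in the line above, transactions that have the lhs but not the rhs.

```r
library(arules)

# Tiny hypothetical transaction set
trans <- as(list(c("a", "b", "c"), c("a", "b"), c("a", "c"),
                 c("b", "c"), c("a", "b", "c")), "transactions")
rules <- apriori(trans, parameter = list(supp = 0.2, conf = 0.5),
                 control = list(verbose = FALSE))

# Pick one of the rules with a two-item lhs
r <- subset(rules, subset = size(lhs) == 2)[1]
lhs_items <- unlist(as(lhs(r), "list"))
rhs_items <- unlist(as(rhs(r), "list"))

# Transactions containing every lhs item (where the rule applies)
covered <- trans[trans %ain% lhs_items]

# ...of those, the ones that also contain the rhs (rule holds)
holds <- covered[covered %ain% rhs_items]

# ...and the ones that lack the rhs (rule fails)
fails <- covered[!(covered %in% rhs_items)]
```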

I hope this helps!
