Subsetting in Data.Table

using %in% to subset a data.table

The expression

 DT[x==a | x==b]

returns all rows in DT where the values in x and a are equal or x and b are equal. This is the desired result.

On the other hand

 DT[x%in%c(a,b)]

returns all rows where x matches any value in c(a, b), not just the corresponding value. Thus your second row appears because x == 3 and 3 appears (somewhere) in a.

Subsetting rows inside function in data.table

You could also use get to make your function work:

dtSubset <- function(df, col, str) {
  df[get(col) == str]
}

Now dtSubset(dt, "variable", "data") will get you the intended result:

   id variable
1:  2     data
2:  3     data

Subset with condition in data table

You can do :

library(data.table)
tmp[, .SD[!(id1 == max(id1) & time > 2)], user_id]

#   user_id id1 time
#1:       1   1    1
#2:       1   1    2
#3:       1   1    3
#4:       1   1    4
#5:       1   3    1
#6:       1   3    2
#7:       2   2    1
#8:       2   2    2

subset rows in data table with many columns

I could not get your example to load in my console session but this is a much more "minima;" example that demonstrates a method. Not sure if it has the usual data.table efficiency though:

DT <- setDT( data.frame(x=1:2, y=0,z=0))
DT[, apply(.SD, 1, function(x){any(x>=2)}) ] gets you a logical vector for each row
# [1] FALSE  TRUE
DT[ DT[, apply(.SD, 1, function(x){any(x>=2)}) ]] # uses that vector to select rows
   x y z
1: 2 0 0

This should succeed as well:

DT[ as.logical(rowSums(DT >= 2))]
   x y z
1: 2 0 0

For the second part consider this:

cols <- sapply(DT, function(x){ any(x>0)})
DT2[ ,.SD, .SDcols=names(cols[cols])]

subset data.table based on key being NOT an element of a list

What about :

library(data.table)
DT   <- data.table(ID = c(1, 2, 4, 5, 10), A = c(13, 1, 13, 11, 12))
test <- data.table(ID = c(1, 5, 9, 10, 11, 12))
setkey(test,ID)
DT[!test, on="ID"]

Subset groups in a data.table using conditions on two columns

With data.table:

DT[,.SD[any(x == 3 & y == 6)], by=group]

    group    id     y     x
   <char> <int> <num> <num>
1:      B     5     6     3
2:      B     6     8     3
3:      B     7    14     3
4:      B     8    19     2
5:      C     9    10     2
6:      C    10     9     3
7:      C    11     6     3
8:      C    12     8     3

Another possibly faster option:

DT[, if (any(x == 3 & y == 6)) .SD, by=group]

How to filter/subset a data table based on condition on other columns in R?

Just group by Date and ID, count observations and filter when there are greater than one:

Data[, n:=.N, by = .(Date, ID)][n>1]
#          Date ID Value n
# 1: 2020-01-04  1   189 3
# 2: 2020-01-04  1   654 3
# 3: 2020-01-04  1   333 3

R data.table struggling with conditional subsetting when column name is predefined elsewhere

I can imagine this was very frustrating for you. I applaud the number of things you tried before posting. Here's one approach:

DT[get(column_name) == 1,]
   x y
1: 1 0
2: 1 1

If you need to use column_name in J, you can use get(..column_name):

DT[,get(..column_name)]
[1] 1 1 0 0

The .. instructs evaluation to occur in the parent environment.

Another approach for using a string in either I or J is with eval(as.name(column_name)):

DT[eval(as.name(column_name)) == 1]
   x y
1: 1 0
2: 1 1

DT[,eval(as.name(column_name))]
[1] 1 1 0 0