Subsetting in Data.Table

using %in% to subset a data.table

The expression

 DT[x==a | x==b]

returns all rows in DT where the values in x and a are equal or x and b are equal. This is the desired result.

On the other hand

 DT[x%in%c(a,b)]

returns all rows where x matches any value in c(a, b), not just the corresponding value. Thus your second row appears because x == 3 and 3 appears (somewhere) in a.

Subsetting rows inside function in data.table

You could also use get to make your function work:

dtSubset <- function(df, col, str) {
df[get(col) == str]
}

Now dtSubset(dt, "variable", "data") will get you the intended result:

   id variable
1: 2 data
2: 3 data

Subset with condition in data table

You can do :

library(data.table)
tmp[, .SD[!(id1 == max(id1) & time > 2)], user_id]

# user_id id1 time
#1: 1 1 1
#2: 1 1 2
#3: 1 1 3
#4: 1 1 4
#5: 1 3 1
#6: 1 3 2
#7: 2 2 1
#8: 2 2 2

subset rows in data table with many columns

I could not get your example to load in my console session but this is a much more "minima;" example that demonstrates a method. Not sure if it has the usual data.table efficiency though:

DT <- setDT( data.frame(x=1:2, y=0,z=0))
DT[, apply(.SD, 1, function(x){any(x>=2)}) ] gets you a logical vector for each row
# [1] FALSE TRUE
DT[ DT[, apply(.SD, 1, function(x){any(x>=2)}) ]] # uses that vector to select rows
x y z
1: 2 0 0

This should succeed as well:

DT[ as.logical(rowSums(DT >= 2))]
x y z
1: 2 0 0

For the second part consider this:

cols <- sapply(DT, function(x){ any(x>0)})
DT2[ ,.SD, .SDcols=names(cols[cols])]

subset data.table based on key being NOT an element of a list

What about :

library(data.table)
DT <- data.table(ID = c(1, 2, 4, 5, 10), A = c(13, 1, 13, 11, 12))
test <- data.table(ID = c(1, 5, 9, 10, 11, 12))
setkey(test,ID)
DT[!test, on="ID"]

Subset groups in a data.table using conditions on two columns

With data.table:

DT[,.SD[any(x == 3 & y == 6)], by=group]

group id y x
<char> <int> <num> <num>
1: B 5 6 3
2: B 6 8 3
3: B 7 14 3
4: B 8 19 2
5: C 9 10 2
6: C 10 9 3
7: C 11 6 3
8: C 12 8 3

Another possibly faster option:

DT[, if (any(x == 3 & y == 6)) .SD, by=group]

How to filter/subset a data table based on condition on other columns in R?

Just group by Date and ID, count observations and filter when there are greater than one:

Data[, n:=.N, by = .(Date, ID)][n>1]
# Date ID Value n
# 1: 2020-01-04 1 189 3
# 2: 2020-01-04 1 654 3
# 3: 2020-01-04 1 333 3

R data.table struggling with conditional subsetting when column name is predefined elsewhere

I can imagine this was very frustrating for you. I applaud the number of things you tried before posting. Here's one approach:

DT[get(column_name) == 1,]
x y
1: 1 0
2: 1 1

If you need to use column_name in J, you can use get(..column_name):

DT[,get(..column_name)]
[1] 1 1 0 0

The .. instructs evaluation to occur in the parent environment.

Another approach for using a string in either I or J is with eval(as.name(column_name)):

DT[eval(as.name(column_name)) == 1]
x y
1: 1 0
2: 1 1

DT[,eval(as.name(column_name))]
[1] 1 1 0 0


Related Topics



Leave a reply



Submit