using %in% to subset a data.table
The expression
DT[x==a | x==b]
returns all rows in DT where the values in x and a are equal or x and b are equal. This is the desired result.
On the other hand,
DT[x%in%c(a,b)]
returns all rows where x matches any value in c(a, b), not just the corresponding value. Thus your second row appears because x == 3 and 3 appears (somewhere) in a.
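To make the difference concrete, here is a minimal sketch with a made-up table (the values of x, a, and b are assumptions, not the data from the question). The row-wise comparison keeps only the row where x equals a or b in that same row, while %in% also keeps a row whose x shows up anywhere in a or b.

```r
library(data.table)

# Hypothetical data: x is compared against a and b
DT <- data.table(x = c(1, 3, 5), a = c(1, 9, 3), b = c(2, 8, 7))

# Row-wise comparison: only row 1 has x equal to a (or b) in the same row
rowwise <- DT[x == a | x == b]

# Set membership: c(a, b) is 1, 9, 3, 2, 8, 7, so x = 3 matches too,
# because 3 appears in a (in a different row)
membership <- DT[x %in% c(a, b)]

print(rowwise)
print(membership)
```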
Subsetting rows inside function in data.table
You could also use get to make your function work:
dtSubset <- function(df, col, str) {
  df[get(col) == str]
}
Now dtSubset(dt, "variable", "data") will get you the intended result:
id variable
1: 2 data
2: 3 data
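For a self-contained run, the sketch below builds a small dt whose contents are assumed (only the id and variable column names come from the output above) and calls the function. It also shows one possible alternative that indexes the column by name with [[ instead of get; this is a variation, not the method from the answer.

```r
library(data.table)

# Hypothetical input table
dt <- data.table(id = 1:4, variable = c("meta", "data", "data", "meta"))

dtSubset <- function(df, col, str) {
  # get(col) looks up the column whose name is stored in col
  df[get(col) == str]
}

res <- dtSubset(dt, "variable", "data")

# Alternative sketch: extract the column by name, then filter
dtSubset2 <- function(df, col, str) {
  df[df[[col]] == str]
}
res2 <- dtSubset2(dt, "variable", "data")

print(res)
```

Note that get(col) can pick up the wrong object if the table itself has a column called col or str, so the [[ form may be the safer choice in that case.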
Subset with condition in data table
You can do:
library(data.table)
tmp[, .SD[!(id1 == max(id1) & time > 2)], user_id]
# user_id id1 time
#1: 1 1 1
#2: 1 1 2
#3: 1 1 3
#4: 1 1 4
#5: 1 3 1
#6: 1 3 2
#7: 2 2 1
#8: 2 2 2
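The input table is not shown, so the sketch below uses a tmp that is an assumption chosen to give the same eight rows. It runs the .SD filter and, as a possible variation, the common .I idiom, which collects row numbers per group and subsets once.

```r
library(data.table)

# Hypothetical input, reconstructed so the result matches the output above
tmp <- data.table(
  user_id = c(rep(1, 8), rep(2, 4)),
  id1     = c(rep(1, 4), rep(3, 4), rep(2, 4)),
  time    = rep(1:4, 3)
)

# Per user, drop rows that belong to the largest id1 and have time > 2
res <- tmp[, .SD[!(id1 == max(id1) & time > 2)], user_id]

# Variation: gather the row numbers to keep, then subset in one step
idx <- tmp[, .I[!(id1 == max(id1) & time > 2)], by = user_id]$V1
res_idx <- tmp[idx]

print(res)
```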
subset rows in data table with many columns
I could not get your example to load in my console session, but this is a much more "minimal" example that demonstrates a method. I am not sure it has the usual data.table efficiency, though:
DT <- setDT( data.frame(x=1:2, y=0,z=0))
DT[, apply(.SD, 1, function(x){any(x>=2)}) ] # gets you a logical vector with one value per row
# [1] FALSE TRUE
DT[ DT[, apply(.SD, 1, function(x){any(x>=2)}) ]] # uses that vector to select rows
x y z
1: 2 0 0
This should succeed as well:
DT[ as.logical(rowSums(DT >= 2))]
x y z
1: 2 0 0
For the second part consider this:
cols <- sapply(DT, function(x){ any(x>0)})
DT[ ,.SD, .SDcols=names(cols[cols])]
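Since apply works row by row, a column-wise variant may scale better on wide tables. The sketch below is one such possibility, not the answer's own code: it builds one logical vector per column with lapply and combines them with Reduce, then repeats the column selection on the same toy table.

```r
library(data.table)

DT <- data.table(x = 1:2, y = 0, z = 0)

# One logical vector per column, OR-ed together into a row filter
keep <- DT[, Reduce(`|`, lapply(.SD, function(col) col >= 2))]
DT_rows <- DT[keep]

# Second part: keep only the columns that contain a value above 0
cols <- sapply(DT, function(col) any(col > 0))
DT_pos <- DT[, .SD, .SDcols = names(cols[cols])]

print(DT_rows)
print(DT_pos)
```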
subset data.table based on key being NOT an element of a list
What about:
library(data.table)
DT <- data.table(ID = c(1, 2, 4, 5, 10), A = c(13, 1, 13, 11, 12))
test <- data.table(ID = c(1, 5, 9, 10, 11, 12))
setkey(test,ID)
DT[!test, on="ID"]
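With the two tables from the answer, the anti-join should keep only the IDs that are absent from test. The sketch below runs it and compares it with a plain %in% filter, which is shown only as a possible alternative.

```r
library(data.table)

DT <- data.table(ID = c(1, 2, 4, 5, 10), A = c(13, 1, 13, 11, 12))
test <- data.table(ID = c(1, 5, 9, 10, 11, 12))

# Anti-join: rows of DT whose ID has no match in test
anti <- DT[!test, on = "ID"]

# Alternative sketch: negate a membership test on the ID column
anti2 <- DT[!(ID %in% test$ID)]

print(anti)
```

Because on= names the join column explicitly, the join does not depend on a key being set.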
Subset groups in a data.table using conditions on two columns
With data.table:
DT[,.SD[any(x == 3 & y == 6)], by=group]
group id y x
<char> <int> <num> <num>
1: B 5 6 3
2: B 6 8 3
3: B 7 14 3
4: B 8 19 2
5: C 9 10 2
6: C 10 9 3
7: C 11 6 3
8: C 12 8 3
Another possibly faster option:
DT[, if (any(x == 3 & y == 6)) .SD, by=group]
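A runnable sketch of both forms follows. Groups B and C are copied from the output above; the rows for group A are an assumption, made up so that no row in A has x == 3 and y == 6.

```r
library(data.table)

DT <- data.table(
  group = rep(c("A", "B", "C"), each = 4),
  id    = 1:12,
  # Group A values (first four of y and x) are hypothetical
  y     = c(5, 7, 9, 11, 6, 8, 14, 19, 10, 9, 6, 8),
  x     = c(1, 2, 3, 3, 3, 3, 3, 2, 2, 3, 3, 3)
)

# Keep whole groups that contain at least one row with x == 3 and y == 6
res_sd <- DT[, .SD[any(x == 3 & y == 6)], by = group]

# Same result: return .SD for matching groups, NULL (nothing) otherwise
res_if <- DT[, if (any(x == 3 & y == 6)) .SD, by = group]

print(res_if)
```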
How to filter/subset a data table based on condition on other columns in R?
Just group by Date and ID, count the observations, and keep the groups with more than one:
Data[, n:=.N, by = .(Date, ID)][n>1]
# Date ID Value n
# 1: 2020-01-04 1 189 3
# 2: 2020-01-04 1 654 3
# 3: 2020-01-04 1 333 3
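Note that := adds the n column to Data by reference. The sketch below uses made-up data (only the three 2020-01-04 rows come from the output above) and shows the counting approach on a copy, plus a possible variation that filters on .N without adding a column.

```r
library(data.table)

# Hypothetical data; the first and last rows are invented
Data <- data.table(
  Date  = as.Date(c("2020-01-03", "2020-01-04", "2020-01-04",
                    "2020-01-04", "2020-01-05")),
  ID    = c(1, 1, 1, 1, 2),
  Value = c(100, 189, 654, 333, 50)
)

# Count per (Date, ID) on a copy so that Data itself is left unchanged
counted <- copy(Data)[, n := .N, by = .(Date, ID)][n > 1]

# Variation: keep groups with more than one row, no helper column
grouped <- Data[, if (.N > 1L) .SD, by = .(Date, ID)]

print(counted)
```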
R data.table struggling with conditional subsetting when column name is predefined elsewhere
I can imagine this was very frustrating for you. I applaud the number of things you tried before posting. Here's one approach:
DT[get(column_name) == 1,]
x y
1: 1 0
2: 1 1
If you need to use column_name in J, you can use get(..column_name):
DT[,get(..column_name)]
[1] 1 1 0 0
The .. prefix instructs evaluation to occur in the parent environment.
Another approach for using a string in either I or J is with eval(as.name(column_name)):
DT[eval(as.name(column_name)) == 1]
x y
1: 1 0
2: 1 1
DT[,eval(as.name(column_name))]
[1] 1 1 0 0
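For reference, here is a self-contained sketch using a DT that is assumed from the printed output (x = 1, 1, 0, 0 and y = 0, 1, 0, 1). Besides get, it shows two other ways to reach a column whose name is stored in a string; these are offered as options, not as the answer's own code.

```r
library(data.table)

# Hypothetical table reconstructed from the outputs above
DT <- data.table(x = c(1, 1, 0, 0), y = c(0, 1, 0, 1))
column_name <- "x"

# Filter rows on the column named by the string
rows <- DT[get(column_name) == 1]

# Extract the column as a plain vector
vec <- DT[[column_name]]

# Select the column as a one-column data.table with the .. prefix
sub <- DT[, ..column_name]

print(rows)
print(vec)
```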