Evaluating both column name and the target value within `j` expression within `data.table`
Here is one possible alternative.
target <- "vs"
value <- 1
dt <- as.data.table(head(mtcars))
In terms of code it's not necessarily simpler, but we can set up an unevaluated call cl
defined outside the scope of dt
which is to be evaluated inside the data table's environment.
cl <- substitute(
x == y,
list(x = as.name(target), y = value)
)
substitute()
might be necessary for longer expressions. But in this case, call()
would shorten the code and create the same cl
result. And so cl
could also be
cl <- call("==", as.name(target), value)
Now we can evaluate cl
inside dt
. On your example this seems to work fine.
dt[, NEWCOL := sum(eval(cl)), by = am][]
# mpg cyl disp hp drat wt qsec vs am gear carb NEWCOL
# 1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1
# 2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 1
# 3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
# 4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 2
# 5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 2
# 6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 2
After thinking about this for a minute, I'm not sure value
needed to be substituted, and hence the following also works. But as David notes, the first approach is more time efficient.
dt[, eval(as.name(target)) == value]
# [1] FALSE FALSE TRUE TRUE FALSE TRUE
data.table subsetting by variable containing a column name
We can use get
dt[get(last(titles))!=0]
Use of ..col in a j expression in data.table
We can specify in the .SDcols
and then do the elementwise multiplication using Reduce
A[, Reduce(`*`, .SD), .SDcols = c(valA, valB)]
#[1] 258 276 49 178 10
Or instead of having two objects, create a single object
v1 <- c(valA, valB)
A[, ..v1][, do.call(`*`, .SD)]
#[1] 258 276 49 178 10
Or we can evaluate it separately and multiply
(A[, ..valA] * A[, ..valB])[[1]]
#[1] 258 276 49 178 10
Is it possible to pass column name within data.table on argument in the form of a variable in R
You can use strings in the on
argument. This allows us to construct the join using paste
:
A[B, on = paste(col_var, c(">= x", "< y")), r := r + z]
data.table grouped operations with variable names of columns without slow DT[, mean(get(colName)), by = grp]
It would be better to pass the dataset name d
to the FOO
function instead of passing the character string "d"
. Also, you can use lapply
combined with .SD
so that you can benefit from internal optimization instead of using mean(get(colName))
.
FOO2 = function(dataName=d, colName = "x") { # d instead of "d" passed to the first argument!
dataName[, lapply(.SD, mean), by=grp, .SDcols=colName]
}
Benchmark: FOO
vs FOO2
set.seed(147852)
n <- 1e7
d <- data.table(x = 1:n, grp = sample(1:1e5, n, replace = T))
microbenchmark::microbenchmark(
FOO(),
FOO2(),
times=5L
)
Unit: milliseconds
expr min lq mean median uq max neval
FOO() 4632.4014 4672.7781 4787.4958 4707.9023 4846.7081 5077.6893 5
FOO2() 255.0828 267.1322 297.0389 275.4467 281.9873 405.5456 5
Dynamically build call for lookup multiple columns
In recent development version it has been made much easier
ID[JN, (select) := .list_of_fields,
env=list(.list_of_fields=as.list(paste0('i.', select)))]
OLD solution before 1.14.1
Instead of mget
or eval-parse
there is still possibility to build the lookup call. While the mget
is the most user friendly, this one is both flexible and actually corresponds to building the j
expression.
Solution wrapped into batch.lookup
helper function taking character vector of column names to lookup.
library(data.table)
set.seed(1)
ID <- data.table(id = 1:3, meta = rep(1,3), key = "id")
JN <- data.table(idd = sample(ID$id, 3, FALSE), value = sample(letters, 3, FALSE), meta = rep(1,3), key = "idd")
select <- c("value","meta") # my fields to lookup
batch.lookup = function(x) {
as.call(list(
as.name(":="),
x,
as.call(c(
list(as.name("list")),
sapply(x, function(x) as.name(paste0("i.",x)), simplify=FALSE)
))
))
}
batch.lookup(select)
#`:=`(c("value", "meta"), list(value = i.value, meta = i.meta))
ID[JN, eval(batch.lookup(select))][]
# id meta value
#1: 1 1 x
#2: 2 1 v
#3: 3 1 f
To be fair this answer actually address call construction issue described by me as OP.
Selecting rows in data.table by expression on a character column name
If are passing an expression, use parse
DT[eval(parse(text = 'a==1'))]
If the column name is the only one specified as a string
col1 <- "a"
DT[eval(as.symbol(col1)) == 1]
Or to a less extent get
(not recommended though as it can lead to envir issues if not properly used)
DT[get(col1) == 1]
# a
#1: 1
#2: 1
Another option is to make use of .SDcols
DT[DT[, .SD[[1]] == 1,.SDcols = col1]]
# a
#1: 1
#2: 1
How to select a column of a data.table via a variable
As indicated by David,
dt_txt_con <- textConnection(as.character(DT[, .SD[[var]]]))
is working like a charm.
Related Topics
How to Use R with Google Colaboratory
Remove All Punctuation Except Apostrophes in R
How to Use a String Variable to Select a Data Frame Column Using $ Notation
Function to Calculate R2 (R-Squared) in R
Collapsing Data Frame by Selecting One Row Per Group
Pasting Elements of Two Vectors Alphabetically
Is There a Better Alternative Than String Manipulation to Programmatically Build Formulas
How to Loop/Repeat a Linear Regression in R
Find the Intersection of Overlapping Ranges in Two Tables Using Data.Table Function Foverlaps
Determine the Data Types of a Data Frame's Columns
Data.Table with Two String Columns of Set Elements, Extract Unique Rows with Each Row Unsorted
How to Read Only Lines That Fulfil a Condition from a CSV into R
Dplyr - Using Column Names as Function Arguments
Remove Ids That Occur X Times R