Evaluating Both Column Name and the Target Value Within 'J' Expression Within 'Data.Table'

Evaluating both column name and the target value within `j` expression within `data.table`

Here is one possible alternative.

target <- "vs"
value <- 1
dt <- as.data.table(head(mtcars))

In terms of code it's not necessarily simpler, but we can set up an unevaluated call cl defined outside the scope of dt which is to be evaluated inside the data table's environment.

cl <- substitute(
x == y,
list(x = as.name(target), y = value)
)

substitute() might be necessary for longer expressions. But in this case, call() would shorten the code and create the same cl result. And so cl could also be

cl <- call("==", as.name(target), value)

Now we can evaluate cl inside dt. On your example this seems to work fine.

dt[, NEWCOL := sum(eval(cl)), by = am][]
# mpg cyl disp hp drat wt qsec vs am gear carb NEWCOL
# 1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1
# 2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 1
# 3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
# 4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 2
# 5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 2
# 6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 2

After thinking about this for a minute, I'm not sure value needed to be substituted, and hence the following also works. But as David notes, the first approach is more time efficient.

dt[, eval(as.name(target)) == value]
# [1] FALSE FALSE TRUE TRUE FALSE TRUE

data.table subsetting by variable containing a column name

We can use get

 dt[get(last(titles))!=0]

Use of ..col in a j expression in data.table

We can specify in the .SDcols and then do the elementwise multiplication using Reduce

A[, Reduce(`*`, .SD), .SDcols = c(valA, valB)]
#[1] 258 276 49 178 10

Or instead of having two objects, create a single object

v1 <- c(valA, valB)
A[, ..v1][, do.call(`*`, .SD)]
#[1] 258 276 49 178 10

Or we can evaluate it separately and multiply

(A[, ..valA] * A[, ..valB])[[1]]
#[1] 258 276 49 178 10

Is it possible to pass column name within data.table on argument in the form of a variable in R

You can use strings in the on argument. This allows us to construct the join using paste:

A[B, on = paste(col_var, c(">= x", "< y")), r := r + z]

data.table grouped operations with variable names of columns without slow DT[, mean(get(colName)), by = grp]

It would be better to pass the dataset name d to the FOO function instead of passing the character string "d". Also, you can use lapply combined with .SD so that you can benefit from internal optimization instead of using mean(get(colName)).

FOO2 = function(dataName=d, colName = "x") { # d instead of "d" passed to the first argument!
dataName[, lapply(.SD, mean), by=grp, .SDcols=colName]
}

Benchmark: FOO vs FOO2

set.seed(147852)
n <- 1e7
d <- data.table(x = 1:n, grp = sample(1:1e5, n, replace = T))

microbenchmark::microbenchmark(
FOO(),
FOO2(),
times=5L
)

Unit: milliseconds
expr min lq mean median uq max neval
FOO() 4632.4014 4672.7781 4787.4958 4707.9023 4846.7081 5077.6893 5
FOO2() 255.0828 267.1322 297.0389 275.4467 281.9873 405.5456 5

Dynamically build call for lookup multiple columns

In recent development version it has been made much easier

ID[JN, (select) := .list_of_fields,
env=list(.list_of_fields=as.list(paste0('i.', select)))]

OLD solution before 1.14.1

Instead of mget or eval-parse there is still possibility to build the lookup call. While the mget is the most user friendly, this one is both flexible and actually corresponds to building the j expression.

Solution wrapped into batch.lookup helper function taking character vector of column names to lookup.

    library(data.table)
set.seed(1)
ID <- data.table(id = 1:3, meta = rep(1,3), key = "id")
JN <- data.table(idd = sample(ID$id, 3, FALSE), value = sample(letters, 3, FALSE), meta = rep(1,3), key = "idd")
select <- c("value","meta") # my fields to lookup
batch.lookup = function(x) {
as.call(list(
as.name(":="),
x,
as.call(c(
list(as.name("list")),
sapply(x, function(x) as.name(paste0("i.",x)), simplify=FALSE)
))
))
}
batch.lookup(select)
#`:=`(c("value", "meta"), list(value = i.value, meta = i.meta))
ID[JN, eval(batch.lookup(select))][]
# id meta value
#1: 1 1 x
#2: 2 1 v
#3: 3 1 f

To be fair this answer actually address call construction issue described by me as OP.

Selecting rows in data.table by expression on a character column name

If are passing an expression, use parse

DT[eval(parse(text = 'a==1'))] 

If the column name is the only one specified as a string

col1 <- "a"
DT[eval(as.symbol(col1)) == 1]

Or to a less extent get (not recommended though as it can lead to envir issues if not properly used)

DT[get(col1) == 1]
# a
#1: 1
#2: 1

Another option is to make use of .SDcols

DT[DT[, .SD[[1]] == 1,.SDcols = col1]]
# a
#1: 1
#2: 1

How to select a column of a data.table via a variable

As indicated by David,

dt_txt_con <- textConnection(as.character(DT[, .SD[[var]]]))

is working like a charm.



Related Topics



Leave a reply



Submit