Referring to Data.Table Columns by Names Saved in Variables

Referring to data.table columns by names saved in variables

If you are going to be doing complicated operations inside your j expressions, you should probably use eval and quote. One problem with that in current version of data.table is that the environment of eval is not always correctly processed - eval and quote in data.table (Note: There has been an update to that answer based on an update to the package.) - and the current fix for that is to add .SD to eval. As far as I can tell from a few tests that I've run this doesn't affect speed (the way e.g. having .SD[1] in j would).

Interestingly this issue only plagues the j and you'll be fine using eval normally in i (where .SD is not available anyway).

The other problem is assignment, and there you have to have strings. I know one way to extract the string name from a quoted expression - it's not pretty, but it works. Here's an example combining everything together:

x = data.table(dist = c(1:10), val = c(1:10))
distcol = quote(dist)
valcol = quote(val)

x[eval(valcol) < 5,
capture.output(str(distcol, give.head = F)) := eval(distcol)*sum(eval(distcol, .SD))]

Note how I was ok not adding .SD in one eval(distcol), but won't be if I take it out of the other eval.

Another option is to use get:

diststr = "dist"
valstr = "val"

x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]

Join datatables using column names stored in variables

Why not do it all in one go without a loop?

A possible solution:

melt(dt1, id = 1)[dt2, on = .(variable = name, value > valueMin, value <= valueMax), lbl := i.label
][, dcast(.SD, id ~ variable, value.var = c("value","lbl"))]

which gives:

   id value_value1 value_value2 value_value3 lbl_value1 lbl_value2 lbl_value3
1: 1 11 21 36 101 NA NA
2: 2 12 22 37 101 201 301
3: 3 13 23 38 101 201 301
4: 4 14 24 39 102 201 302
5: 5 15 25 40 103 202 302

Reference `data.table` column by name

It sounds like you're looking for get:

DT = data.table(a=1, b=2, "a+b"=8)
col = "a+b"
DT[, get(col) * 2]
# [1] 16
DT[, c := get(col) * 2]
DT
# a b a+b c
# 1: 1 2 8 16

Pass column name in data.table using variable

Use the quote() and eval() functions to pass a variable to j. You don't need double-quotes on the column names when you do it this way, because the quote()-ed string will be evaluated inside the DT[]

temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"

With a single column name, the result is a vector. If you want a data.table result, or several columns, use list form

temp <- quote(list(x, v))
DT[ , eval(temp)]
# x v
# 1: b 1.52566586
# 2: b 0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a 0.03159933

R data.table scoping, reliably refer to unknown column name using variable

Proper solution for these kind of problems has been recently implemented in data.table. There is new env argument which does not have a local-data.table scoping. Users no longer need to use get.

library(data.table)
dt=data.table(id='id1')
id='id'
dt[.id %in% 'id1', env=list(.id=id)]
# id
# <char>
#1: id1

Because it is not on CRAN as of now, you need to install data.table from our CRAN-like repo. Note that we publish windows binaries as well, so Rtools is not necessary.
Most simple way to install from our repo is:

data.table::update.dev.pkg()

R: data.table, aggregate data frame by column names stored in variables

If you're willing to consider dplyr instead of data.table, here's a solution using tidy evaluation.

library(dplyr)
idvar <- "Species"
valvar <- "Sepal.Width"

iris %>%
group_by(!!sym(idvar)) %>%
summarise(Mean.value = mean(!!sym(valvar)))

# A tibble: 3 x 2
Species Mean.value
<fct> <dbl>
1 setosa 3.43
2 versicolor 2.77
3 virginica 2.97

Using a variable to specify a column name within `data.table`

Data:

library(data.table)
dt = data.table(col1=letters[1:2], x=c('1','2'))

One solution is to use quote and the eval in your data.table:

y = quote(x)
dt[,eval(y):=as.numeric(eval(y))]

#> is.numeric(dt$x)
#[1] TRUE


Related Topics



Leave a reply



Submit