Referring to data.table columns by names saved in variables
If you are going to be doing complicated operations inside your j
expressions, you should probably use eval
and quote
. One problem with that in current version of data.table
is that the environment of eval
is not always correctly processed - eval and quote in data.table (Note: There has been an update to that answer based on an update to the package.) - and the current fix for that is to add .SD
to eval
. As far as I can tell from a few tests that I've run this doesn't affect speed (the way e.g. having .SD[1]
in j
would).
Interestingly this issue only plagues the j
and you'll be fine using eval
normally in i
(where .SD
is not available anyway).
The other problem is assignment, and there you have to have strings. I know one way to extract the string name from a quoted expression - it's not pretty, but it works. Here's an example combining everything together:
x = data.table(dist = c(1:10), val = c(1:10))
distcol = quote(dist)
valcol = quote(val)
x[eval(valcol) < 5,
capture.output(str(distcol, give.head = F)) := eval(distcol)*sum(eval(distcol, .SD))]
Note how I was ok not adding .SD
in one eval(distcol)
, but won't be if I take it out of the other eval
.
Another option is to use get
:
diststr = "dist"
valstr = "val"
x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]
Join datatables using column names stored in variables
Why not do it all in one go without a loop?
A possible solution:
melt(dt1, id = 1)[dt2, on = .(variable = name, value > valueMin, value <= valueMax), lbl := i.label
][, dcast(.SD, id ~ variable, value.var = c("value","lbl"))]
which gives:
id value_value1 value_value2 value_value3 lbl_value1 lbl_value2 lbl_value3
1: 1 11 21 36 101 NA NA
2: 2 12 22 37 101 201 301
3: 3 13 23 38 101 201 301
4: 4 14 24 39 102 201 302
5: 5 15 25 40 103 202 302
Reference `data.table` column by name
It sounds like you're looking for get
:
DT = data.table(a=1, b=2, "a+b"=8)
col = "a+b"
DT[, get(col) * 2]
# [1] 16
DT[, c := get(col) * 2]
DT
# a b a+b c
# 1: 1 2 8 16
Pass column name in data.table using variable
Use the quote()
and eval()
functions to pass a variable to j
. You don't need double-quotes on the column names when you do it this way, because the quote()
-ed string will be evaluated inside the DT[]
temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"
With a single column name, the result is a vector. If you want a data.table result, or several columns, use list form
temp <- quote(list(x, v))
DT[ , eval(temp)]
# x v
# 1: b 1.52566586
# 2: b 0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a 0.03159933
R data.table scoping, reliably refer to unknown column name using variable
Proper solution for these kind of problems has been recently implemented in data.table. There is new env
argument which does not have a local-data.table scoping. Users no longer need to use get
.
library(data.table)
dt=data.table(id='id1')
id='id'
dt[.id %in% 'id1', env=list(.id=id)]
# id
# <char>
#1: id1
Because it is not on CRAN as of now, you need to install data.table from our CRAN-like repo. Note that we publish windows binaries as well, so Rtools is not necessary.
Most simple way to install from our repo is:
data.table::update.dev.pkg()
R: data.table, aggregate data frame by column names stored in variables
If you're willing to consider dplyr
instead of data.table
, here's a solution using tidy evaluation.
library(dplyr)
idvar <- "Species"
valvar <- "Sepal.Width"
iris %>%
group_by(!!sym(idvar)) %>%
summarise(Mean.value = mean(!!sym(valvar)))
# A tibble: 3 x 2
Species Mean.value
<fct> <dbl>
1 setosa 3.43
2 versicolor 2.77
3 virginica 2.97
Using a variable to specify a column name within `data.table`
Data:
library(data.table)
dt = data.table(col1=letters[1:2], x=c('1','2'))
One solution is to use quote
and the eval
in your data.table
:
y = quote(x)
dt[,eval(y):=as.numeric(eval(y))]
#> is.numeric(dt$x)
#[1] TRUE
Related Topics
Select Columns Based on String Match - Dplyr::Select
Is There a Way of Manipulating Ggplot Scale Breaks and Labels
Different Legends and Fill Colours for Facetted Ggplot
Find Start and End Positions/Indices of Runs/Consecutive Values
Factors in R: More Than an Annoyance
How to Add Frequency Count Labels to the Bars in a Bar Graph Using Ggplot2
Creating a Symmetric Matrix in R
Render Dropdown for Single Column in Dt Shiny
Techniques for Finding Near Duplicate Records
Common Legend for Multiple Plots in R
Marker Mouse Click Event in R Leaflet for Shiny
Apply a Function Over Groups of Columns
R - Emulate the Default Behavior of Hist() with Ggplot2 for Bin Width
Is There a Built-In Way to Do a Logarithmic Color Scale in Ggplot2
R: Ggplot2, How to Set the Plot Title to Wrap Around and Shrink the Text to Fit the Plot