Assigning/Referencing a column name in data.table dynamically (in i, j and by)
Edit:
Based on your clarifications, here is an approach with setNames
and get
. The trick here is that ..
instructs the evaluation to occur in the calling environment.
library(data.table)
cars <- data.table(cars)
strFactor <- "dist"
strNewVariable <- "Totals x Factor: "
strBy <- "speed"
cars[ get(strFactor) > 50,
setNames(.(.N * get(..strFactor)),strNewVariable),
by=strBy]
Dynamic column names in data.table
From data.table 1.9.4
, you can just do this:
## A parenthesized symbol, `(cn)`, gets evaluated to "blah" before `:=` is carried out
test_dtb[, (cn) := mean(a), by = id]
head(test_dtb, 4)
# a b id blah
# 1: 41 19 1 54.2
# 2: 4 99 2 50.0
# 3: 49 85 3 46.7
# 4: 61 4 4 57.1
See Details in ?:=
:
DT[i, (colvector) := val]
[...] NOW PREFERRED [...] syntax. The parens are enough to stop the LHS being a symbol; same as
c(colvector)
Original answer:
You were on exactly the right track: constructing an expression to be evaluated within the call to [.data.table
is the data.table way to do this sort of thing. Going just a bit further, why not construct an expression that evaluates to the entire j
argument (rather than just its left hand side)?
Something like this should do the trick:
## Your code so far
library(data.table)
test_dtb <- data.table(a=sample(1:100, 100),b=sample(1:100, 100),id=rep(1:10,10))
cn <- "blah"
## One solution
expr <- parse(text = paste0(cn, ":=mean(a)"))
test_dtb[,eval(expr), by=id]
## Checking the result
head(test_dtb, 4)
# a b id blah
# 1: 30 26 1 38.4
# 2: 83 82 2 47.4
# 3: 47 66 3 39.5
# 4: 87 23 4 65.2
Dynamically add column names to data.table when aggregating
As mentioned in the comments by lukeA, setNames
can be used:
m <- c("blah", "foo")
test_dtb[ , setNames(list(mean(b), median(b)), m), by = id]
Pass column name in data.table using variable
Use the quote()
and eval()
functions to pass a variable to j
. You don't need double-quotes on the column names when you do it this way, because the quote()
-ed string will be evaluated inside the DT[]
temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"
With a single column name, the result is a vector. If you want a data.table result, or several columns, use list form
temp <- quote(list(x, v))
DT[ , eval(temp)]
# x v
# 1: b 1.52566586
# 2: b 0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a 0.03159933
Referring to data.table columns by names saved in variables
If you are going to be doing complicated operations inside your j
expressions, you should probably use eval
and quote
. One problem with that in current version of data.table
is that the environment of eval
is not always correctly processed - eval and quote in data.table (Note: There has been an update to that answer based on an update to the package.) - and the current fix for that is to add .SD
to eval
. As far as I can tell from a few tests that I've run this doesn't affect speed (the way e.g. having .SD[1]
in j
would).
Interestingly this issue only plagues the j
and you'll be fine using eval
normally in i
(where .SD
is not available anyway).
The other problem is assignment, and there you have to have strings. I know one way to extract the string name from a quoted expression - it's not pretty, but it works. Here's an example combining everything together:
x = data.table(dist = c(1:10), val = c(1:10))
distcol = quote(dist)
valcol = quote(val)
x[eval(valcol) < 5,
capture.output(str(distcol, give.head = F)) := eval(distcol)*sum(eval(distcol, .SD))]
Note how I was ok not adding .SD
in one eval(distcol)
, but won't be if I take it out of the other eval
.
Another option is to use get
:
diststr = "dist"
valstr = "val"
x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]
dynamic column names seem to work when := is used but not when = is used in data.table
One option is to use the base R function setNames
aggregate_mtcars <- mtcars_copy[, setNames(.(sum(carb)), new_col)]
Or you could use data.table::setnames
aggregate_mtcars <- setnames(mtcars_copy[, .(sum(carb))], new_col)
R data.table struggling with conditional subsetting when column name is predefined elsewhere
I can imagine this was very frustrating for you. I applaud the number of things you tried before posting. Here's one approach:
DT[get(column_name) == 1,]
x y
1: 1 0
2: 1 1
If you need to use column_name
in J
, you can use get(..column_name)
:
DT[,get(..column_name)]
[1] 1 1 0 0
The ..
instructs evaluation to occur in the parent environment.
Another approach for using a string in either I
or J
is with eval(as.name(column_name))
:
DT[eval(as.name(column_name)) == 1]
x y
1: 1 0
2: 1 1
DT[,eval(as.name(column_name))]
[1] 1 1 0 0
Related Topics
Error Trying to Read a PDF Using Readpdf from The Tm Package
How to Remove Rows with Nas Only If They Are Present in More Than Certain Percentage of Columns
Flag First By-Group in R Data Frame
How to Uninstall R Completely from Os X
Creating a Stacked Bar Chart Centered on Zero Using Ggplot
Disable Gui, Graphics Devices in R
Rstudio Viewer Pane Not Working
Change Position of Tick Marks of a Single Graph, Using Ggplot2
Column Name with Brackets or Other Punctuations for Dplyr Group_By
R Shiny: How to Change The Background Color of The Header
How to Get Proportions and Counts of a Data Frame in R
How to Set The Maximum Recursion Depth In
Download Multiple CSV Files with One Button (Downloadhandler) with R Shiny