Pass column name in data.table using variable
Use the quote() and eval() functions to pass a variable to j. You don't need double quotes around the column names when you do it this way, because the quote()-ed expression is evaluated inside DT[]:
temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"
With a single column name, the result is a vector. If you want a data.table result, or several columns, use the list form:
temp <- quote(list(x, v))
DT[ , eval(temp)]
# x v
# 1: b 1.52566586
# 2: b 0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a 0.03159933
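If the column name arrives as a string rather than as code, one possible bridge to the approach above is as.name(), which turns the string into the same kind of symbol that quote() produces. The sketch below uses a small made-up table DT and a hypothetical variable col_string; neither comes from the original answer.

```r
library(data.table)

# Made-up data for illustration only
DT <- data.table(x = c("b", "b", "b", "a", "a"), v = 1:5)

col_string <- "v"              # hypothetical: column name held as a string
temp <- as.name(col_string)    # the same symbol that quote(v) would give

single <- DT[, eval(temp)]     # a plain vector, as with quote()

# Build the call list(x, v) from strings to get a data.table back
temp_list <- as.call(lapply(c("list", "x", "v"), as.name))
both <- DT[, eval(temp_list)]
```

This keeps the eval() pattern unchanged and only swaps how the quoted expression is created.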
How do I pass column name as variable to data.table in R?
The comment from @thelatemail is a very good start. Do read that first! Another quick way is below:
library(data.table)
df = data.table(a=1:10, b=letters[1:2], c=11:20)
var1="a"
var2="b"
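dt1 = df[, c(var1, var2), with = FALSE]
Think of with = FALSE as making the j part of a data.table behave like column selection in a data.frame: j is treated as a vector of column names rather than as an expression.
Edit 1: to subset on a condition within a data.table
df[get(var1) > 5, c(var1, var2), with = FALSE]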
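For comparison, here is a minimal sketch of two other ways to select columns named in a character vector, assuming a reasonably recent data.table release that supports the .. prefix. The table and the variable cols are made up for illustration.

```r
library(data.table)

# Made-up data for illustration only
df <- data.table(a = 1:10, b = rep(letters[1:2], 5), c = 11:20)
cols <- c("a", "b")   # hypothetical variable holding the column names

by_with   <- df[, cols, with = FALSE]     # the approach from the answer
by_dotdot <- df[, ..cols]                 # ".." looks cols up in the calling scope
by_sd     <- df[, .SD, .SDcols = cols]    # .SD restricted to the named columns
```

All three should return the same two-column data.table; which one reads best is largely a matter of taste.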
Passing Column Name as Parameter to data.table::setkey() --- some columns are not in the data.table: col_name
Not a data.table expert, but ?setkey says:
setkey(x, ..., verbose=getOption("datatable.verbose"), physical = TRUE)
... - The columns to sort by. Do not quote the column names.
which means you cannot pass quoted column names here.
You can use setkeyv instead:
setkeyv(x, cols, verbose=getOption("datatable.verbose"), physical = TRUE)
cols - A character vector of column names
genericFunction <- function(data, col_name, filter){
  # Convert data.frame to data.table
  data <- data.table::as.data.table(data)
  # Set the key from a character vector of column names
  data.table::setkeyv(data, col_name)
  # Look up rows by key value; a bare number in i would select by row number instead
  matches <- data[list(filter)]
  return(matches)
}
exampleData <- data.frame(A = c(1, 2, 3), B = c("one", "two", "three"))
exampleName <- "A"
exampleFilter <- 1
genericFunction(exampleData, exampleName, exampleFilter)
# A B
#1: 1 one
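To make the difference between a key lookup and a row-number lookup concrete, here is a small sketch with made-up data. The variable key_col is hypothetical, and the values are chosen so that the two lookups return different rows.

```r
library(data.table)

# Made-up data for illustration only
dt <- data.table(A = c(30, 10, 20), B = c("thirty", "ten", "twenty"))

key_col <- "A"          # hypothetical: key column name held in a variable
setkeyv(dt, key_col)    # sorts dt by A, by reference

by_key <- dt[.(20)]     # key lookup: the row where A equals 20
by_row <- dt[3]         # row-number lookup: the third row after sorting
```

Wrapping the value in .() (or list()) asks for a join on the key, whereas a bare number in i is read as a row index.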
Pass/loop over column names to data.table and ggplot as variables
Try using get(ZZZ) in the loop body instead of ZZZ, so that the string held in ZZZ is looked up as a column name.
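A minimal sketch of that suggestion, assuming a made-up table and reusing ZZZ as the loop variable; the ggplot part of the question is left out here.

```r
library(data.table)

# Made-up data for illustration only
dt <- data.table(a = 1:4, b = c(10, 20, 30, 40))

col_means <- numeric(0)
for (ZZZ in c("a", "b")) {
  # get(ZZZ) resolves the string in ZZZ to the column of that name
  col_means[ZZZ] <- dt[, mean(get(ZZZ))]
}
```

Writing mean(ZZZ) instead would try to average the string itself rather than the column it names.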
Updating column using data.table when using global variable as column name
On the lhs of :=, we can wrap the variable in () so that its value is evaluated, and on the rhs either use get or specify the column in .SDcols:
dt[, (colname) := ifelse(.SD[[1]] ==allele, 2, 1), .SDcols = colname]
It is not clear whether allele is another column or an object created with some value.
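Here is a self-contained sketch of that update, under the assumption that allele is a plain object holding a single value rather than a column. The table contents and the values of colname and allele are made up.

```r
library(data.table)

# Made-up data for illustration only
dt <- data.table(id = 1:4, snp = c("A", "G", "A", "G"))

colname <- "snp"   # global variable holding the column name
allele  <- "A"     # assumption: a single value, not a column

# (colname) on the lhs is evaluated to "snp"; .SD[[1]] is that column
dt[, (colname) := ifelse(.SD[[1]] == allele, 2, 1), .SDcols = colname]
```

Without the parentheses, colname := ... would create or overwrite a column literally called colname.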
Using a variable to specify a column name within `data.table`
Data:
library(data.table)
dt = data.table(col1=letters[1:2], x=c('1','2'))
One solution is to use quote and eval in your data.table:
y = quote(x)
dt[,eval(y):=as.numeric(eval(y))]
#> is.numeric(dt$x)
#[1] TRUE
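If the column name is held as a string instead of a quoted symbol, a possible variant is to put the variable in parentheses on the left of := and use get() on the right. This is a sketch of an alternative, not the method from the answer above; y is redefined here as a string.

```r
library(data.table)

dt <- data.table(col1 = letters[1:2], x = c("1", "2"))

y <- "x"   # assumption: the column name as a string rather than quote(x)

# (y) evaluates to "x" on the lhs; get(y) fetches column x on the rhs
dt[, (y) := as.numeric(get(y))]
```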
Referring to data.table columns by names saved in variables
If you are going to be doing complicated operations inside your j expressions, you should probably use eval and quote. One problem with that in the current version of data.table is that the environment of eval is not always processed correctly (see "eval and quote in data.table"; note that that answer has been updated following an update to the package), and the current fix is to add .SD to eval. As far as I can tell from a few tests that I've run, this doesn't affect speed (the way, e.g., having .SD[1] in j would).
Interestingly, this issue only plagues j, and you'll be fine using eval normally in i (where .SD is not available anyway).
The other problem is assignment, and there you have to have strings. I know one way to extract the string name from a quoted expression - it's not pretty, but it works. Here's an example combining everything together:
x = data.table(dist = c(1:10), val = c(1:10))
distcol = quote(dist)
valcol = quote(val)
x[eval(valcol) < 5,
capture.output(str(distcol, give.head = F)) := eval(distcol)*sum(eval(distcol, .SD))]
Note how I was ok not adding .SD in one eval(distcol), but I won't be if I take it out of the other eval.
Another option is to use get:
diststr = "dist"
valstr = "val"
x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]
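As a worked version of the get approach, the sketch below also shows one simpler way to obtain the string name from a quoted symbol: in base R, as.character() on a symbol returns its name. This is offered as an alternative to the capture.output(str(...)) trick, not as what the answer itself does.

```r
library(data.table)

x <- data.table(dist = 1:10, val = 1:10)

distcol <- quote(dist)
diststr <- as.character(distcol)   # "dist": the symbol's name as a string
valstr  <- "val"

# Rows with val < 5: dist becomes dist * sum(dist over those rows) = dist * 10
x[get(valstr) < 5, (diststr) := get(diststr) * sum(get(diststr))]
```

Only the first four rows are updated, so dist ends up as 10, 20, 30, 40 followed by the untouched 5 through 10.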
Passing multiple column names to by in a data.table function
Just create a character vector for the by part of the data.table call and it will work:
myFun <- function(df, i, j, by){
  df[get(i) == 4, .(Count = .N,
                    Mean = mean(get(j)),
                    Median = median(get(j))),
     by = c(by, 'am')]
}
myFun(dt, i = 'cyl', j = 'hp', by = 'vs')
#vs am Count Mean Median
#1: 1 1 7 80.57143 66
#2: 1 0 3 84.66667 95
#3: 0 1 1 91.00000 91
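The same idea can be tried without a wrapper function. This sketch assumes dt is built from the built-in mtcars data set, which is consistent with the output shown above but is not stated in the answer; group_cols is a hypothetical variable.

```r
library(data.table)

# Assumption: the example data is mtcars converted to a data.table
dt <- as.data.table(mtcars)

group_cols <- c("vs", "am")   # hypothetical: grouping columns as strings

# A character vector passed to by is read as a set of column names
res <- dt[cyl == 4, .(Count = .N), by = c(group_cols)]
```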
R - Pass column names into data.table formula - difference between get and eval
This example shows how eval and get differ. A data.table object is not needed to show what each does.
iVec <- c(123, 456)
iVarName <- "iVec"
# Returns the contents of 'iVarName' (a string). This happens
# to be the name of a variable but doesn't have to.
eval(iVarName)
##> [1] "iVec"
# Returns the contents of what 'iVarName' refers to (it
# refers to the variable "iVec" in this case, which
# contains a numeric vector).
get(iVarName)
##> [1] 123 456
### #########################################
### Similar to above but where the variable
### 'iVec2' does not exist.
### #########################################
# Make sure the variable "iVec2" does not exist.
if (exists("iVec2")) rm(iVec2)
iVarName2 <- 'iVec2'
# Returns the contents of 'iVarName2' (a string). This is not
# the name of an existing variable in this context.
eval(iVarName2)
## [1] "iVec2"
get(iVarName2) # Returns an error because 'iVec2' doesn't exist.
## Error in get(iVarName2) : object 'iVec2' not found
Since this question is more about eval vs. get, I will leave the data.table specifics out. The way data.table handles strings and variable names is very likely answered in a different SO post.
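As a small follow-up in base R, eval() can be made to behave like get() by first turning the string into a symbol with as.name(), and get0() offers a lookup that returns a default instead of raising an error. The object name no_such_object_here is made up and assumed not to exist.

```r
iVec <- c(123, 456)
iVarName <- "iVec"

as_string <- eval(iVarName)            # a string evaluates to itself
as_symbol <- eval(as.name(iVarName))   # a symbol evaluates to the object it names
via_get   <- get(iVarName)             # looks the object up by name

# get0() returns ifnotfound instead of an error when the object is missing
missing_value <- get0("no_such_object_here", ifnotfound = NULL)
```

So the difference lies in what is handed to eval(): a string is just data, while a symbol or a quoted expression is code to be evaluated.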