Pass Column Name in data.table Using a Variable

Use the quote() and eval() functions to pass a variable to j. You don't need double quotes around the column names when you do it this way, because quote() captures an unevaluated expression (not a string), which is then evaluated inside DT[].

# DT is assumed to be an existing data.table with columns x and v
temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"

With a single column name, the result is a vector. If you want a data.table result, or several columns, use the list form:

temp <- quote(list(x, v))
DT[ , eval(temp)]
#    x           v
# 1: b  1.52566586
# 2: b  0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a  0.03159933
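
Here is a self-contained sketch of the same idea. The table and its values are made up for illustration (the DT used above is not shown), so the numbers will not match the output above. The as.name() line is an addition: it shows one way to get from a character string to a symbol that eval() can use.

```r
library(data.table)

# Hypothetical data, not the DT used in the answer above
DT <- data.table(x = c("b", "b", "b", "a", "a"),
                 v = c(1.5, 0.7, -1.3, -1.7, 0.1))

# A quoted symbol evaluated in j returns the column as a vector
col_expr <- quote(x)
vec <- DT[, eval(col_expr)]

# A quoted list() call evaluated in j returns a data.table
cols_expr <- quote(list(x, v))
sub <- DT[, eval(cols_expr)]

# A column name held as a string can be turned into a symbol first
name_expr <- as.name("v")
v_vec <- DT[, eval(name_expr)]
```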

How do I pass column name as variable to data.table in R?

The comment from @thelatemail is a very good start; do read that first! Another quick way is below:

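library(data.table)
df <- data.table(a = 1:10, b = letters[1:2], c = 11:20)

# Column names held as strings
var1 <- "a"
var2 <- "b"

# with = FALSE makes j treat the character vector as column names
dt1 <- df[, c(var1, var2), with = FALSE]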

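Think of with = FALSE as making the j argument of a data.table behave like column selection in a data.frame: j is evaluated as an ordinary character vector of column names.

Edit 1: to subset on a condition within a data.table, use get() in i:

df[get(var1) > 5, c(var1, var2), with = FALSE]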
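
Two other spellings of the same column selection may be worth knowing. This is only a sketch with made-up data; it assumes a data.table version recent enough to support the ".." prefix.

```r
library(data.table)

# Hypothetical data for illustration
df <- data.table(a = 1:10, b = rep(c("x", "y"), 5), c = 11:20)
cols <- c("a", "b")

# The ".." prefix tells data.table to look up cols in the calling scope
by_prefix <- df[, ..cols]

# .SDcols restricts .SD to the named columns
by_sdcols <- df[, .SD, .SDcols = cols]

# Both should match the with = FALSE form
by_with <- df[, cols, with = FALSE]
```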

Passing Column Name as Parameter to data.table::setkey() --- some columns are not in the data.table: col_name

I'm not a data.table expert, but ?setkey says:

setkey(x, ..., verbose=getOption("datatable.verbose"), physical = TRUE)

... - The columns to sort by. Do not quote the column names.

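This means setkey() expects bare (unquoted) column names. If you pass a variable such as col_name, setkey() looks for a column literally called col_name, which is what produces the error in the title.

You can use setkeyv instead: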

setkeyv(x, cols, verbose=getOption("datatable.verbose"), physical = TRUE)

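cols - A character vector of column names

genericFunction <- function(data, col_name, filter){
  # Convert data.frame to data.table
  data <- data.table::as.data.table(data)

  # setkeyv() takes the column name as a string
  data <- data.table::setkeyv(data, col_name)

  # Save the subset of data; .() makes this a lookup on the key,
  # since a bare number in i would be read as a row number
  matches <- data[.(filter)]

  return(matches)
}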

exampleData <- data.frame(A = c(1, 2, 3), B = c("one", "two", "three"))
exampleName <- "A"
exampleFilter <- 1

genericFunction(exampleData, exampleName, exampleFilter)

#    A   B
# 1: 1 one
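
A small sketch of setkeyv() with a column name stored in a variable, followed by a keyed lookup. The data and the variable name key_col are made up for illustration. The on= form at the end is an alternative that does not need a key to be set first.

```r
library(data.table)

# Hypothetical data for illustration
dt <- data.table(A = c(30, 10, 20), B = c("thirty", "ten", "twenty"))
key_col <- "A"   # column name held in a variable

# setkeyv() accepts a character vector of column names
setkeyv(dt, key_col)
current_key <- key(dt)

# .(20) is matched against the key column A, not used as a row number
hit <- dt[.(20)]

# An ad hoc join with on= takes the column name as a string too
hit_on <- dt[.(30), on = key_col]
```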

Pass/loop over column names to data.table and ggplot as variables

Try using get(ZZZ) in the loop body instead of ZZZ. When ZZZ holds a column name as a string, get(ZZZ) returns that column.
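
A minimal sketch of that loop pattern. The column names (g, u, w) are made up, and it computes a grouped mean rather than a plot so that it stays free of ggplot2:

```r
library(data.table)

# Hypothetical data for illustration
dt <- data.table(g = c("a", "a", "b"), u = c(1, 2, 3), w = c(10, 20, 30))

results <- list()
for (col_name in c("u", "w")) {
  # get(col_name) looks up the column whose name is stored in col_name
  results[[col_name]] <- dt[, .(mean_value = mean(get(col_name))), by = g]
}
```

Each element of results is a data.table with one row per group.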

Updating column using data.table when using global variable as column name

On the left-hand side of :=, wrap the variable in () so that its value is used as the column name. To read the column, either use get() or specify it in .SDcols:

dt[, (colname) := ifelse(.SD[[1]] == allele, 2, 1), .SDcols = colname]

It is not clear whether allele is another column or an object created with some value.
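
A runnable sketch of the same call, under the assumption that allele is a single value rather than a column. The data and the column name snp1 are invented for illustration.

```r
library(data.table)

# Hypothetical data for illustration
dt <- data.table(id = 1:4, snp1 = c("A", "G", "A", "G"))
colname <- "snp1"   # target column, named by a variable
allele  <- "A"      # assumed here to be a single value, not a column

# Parentheses make := use the value of colname;
# .SD[[1]] is that same column because .SDcols = colname
dt[, (colname) := ifelse(.SD[[1]] == allele, 2, 1), .SDcols = colname]
```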

Using a variable to specify a column name within `data.table`

Data:

library(data.table)
dt = data.table(col1=letters[1:2], x=c('1','2'))

One solution is to use quote() and eval() inside your data.table call:

y = quote(x)
dt[,eval(y):=as.numeric(eval(y))]

is.numeric(dt$x)
# [1] TRUE
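
If the column name is available as a string rather than a quoted symbol, set() is another option. This is a sketch of an alternative, not the approach used in the answer above; the variable name target is made up.

```r
library(data.table)

dt <- data.table(col1 = letters[1:2], x = c("1", "2"))
target <- "x"   # column name held as a string

# set() takes the column name as a string in j and updates dt by reference
set(dt, j = target, value = as.numeric(dt[[target]]))
```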

Referring to data.table columns by names saved in variables

If you are going to do complicated operations inside your j expressions, you should probably use eval and quote. One problem with this, in the version of data.table current when this was written, is that the environment of eval is not always processed correctly; see the question "eval and quote in data.table" (note: that answer has since been updated following an update to the package). The fix at the time was to pass .SD to eval. As far as I can tell from a few tests I've run, this doesn't affect speed (the way that, e.g., having .SD[1] in j would).

Interestingly, this issue only affects j; you'll be fine using eval normally in i (where .SD is not available anyway).

The other problem is assignment, where you have to have strings. I know one way to extract the string name from a quoted expression. It's not pretty, but it works. Here's an example combining everything:

x = data.table(dist = c(1:10), val = c(1:10))
distcol = quote(dist)
valcol = quote(val)

x[eval(valcol) < 5,
  capture.output(str(distcol, give.head = FALSE)) := eval(distcol)*sum(eval(distcol, .SD))]

Note that I got away without adding .SD to the first eval(distcol), but removing it from the second one (inside sum()) would break the expression.

Another option is to use get:

diststr = "dist"
valstr = "val"

x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]
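
If you start from a quoted symbol but want the string form for assignment, as.character() on the symbol may be a simpler way to recover the name. This sketch is an alternative to the capture.output(str()) trick, not what the answer above used:

```r
library(data.table)

x <- data.table(dist = 1:10, val = 1:10)
distcol <- quote(dist)

# as.character() on a symbol gives its name as a string
diststr <- as.character(distcol)

# (diststr) on the left of := uses the string as the column name;
# get(diststr) reads the column within the rows selected by i
x[val < 5, (diststr) := get(diststr) * sum(get(diststr))]
```

Here the first four rows of dist are multiplied by their own sum (10), and the remaining rows are left unchanged.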

Passing multiple column names to by in a data.table function

Just build a character vector for the by argument of data.table, and it will work:

# dt is not defined in the original answer; the output below is consistent
# with dt <- as.data.table(mtcars)
myFun <- function(df, i, j, by){
  df[get(i) == 4, .(Count = .N,
                    Mean = mean(get(j)),
                    Median = median(get(j))),
     by = c(by, 'am')]
}

myFun(dt, i = 'cyl', j = 'hp', by = 'vs')

#    vs am Count     Mean Median
# 1:  1  1     7 80.57143     66
# 2:  1  0     3 84.66667     95
# 3:  0  1     1 91.00000     91

R - Pass column names into data.table formula - difference between get and eval

This example shows how eval and get differ. A data.table object is not needed to show what each does.

iVec     <- c(123, 456)
iVarName <- "iVec"

# Returns the contents of 'iVarName' (a string). This happens
# to be the name of a variable but doesn't have to.
eval(iVarName)
##> [1] "iVec"

# Returns the value of the object that 'iVarName' names (here
# the variable "iVec", which holds a numeric vector).
get(iVarName)
##> [1] 123 456

### #########################################
### Similar to above but where the variable
### 'iVec2' does not exist.
### #########################################
# Make sure the variable "iVec2" does not exist.
if (exists("iVec2")) rm(iVec2)
iVarName2 <- 'iVec2'

# Returns the contents of 'iVarName2' (a string). This is not
# the name of an existing variable in this context.
eval(iVarName2)
##> [1] "iVec2"
get(iVarName2) # Returns an error because 'iVec2' doesn't exist.
##> Error in get(iVarName2) : object 'iVec2' not found

Since this question is more about eval vs. get, I will leave the data.table specifics out. The way data.table handles strings and variable names is very likely answered in a different SO post.
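
To round this off, here is a base R sketch of how eval can be made to behave like get: convert the string to a symbol with as.name(), or to an expression with parse(text = ). The result names r1 to r4 are made up for illustration.

```r
iVec     <- c(123, 456)
iVarName <- "iVec"

# eval() on a string just returns the string
r1 <- eval(iVarName)

# eval() on a symbol looks the variable up, like get() on the string
r2 <- eval(as.name(iVarName))
r3 <- get(iVarName)

# parse(text = ) turns a string of code into an expression eval() can run
r4 <- eval(parse(text = "iVec * 2"))
```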
