How Can One Work Fully Generically in Data.Table in R With Column Names in Variables

The question has two parts: creating data.tables directly with column names taken from variables, and using variables for column names with :=.

For the first question, I'm not sure, but fread may help with creating an empty data.table with named columns, so it is worth a try.

As for the second question, try

DT[, c(nameOfCols) := 10]

Here nameOfCols is a character vector holding the names of the columns you want to modify. See ?data.table.
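Both parts of the question can be sketched together. This is a minimal illustration, not the original poster's code: the names newCols and the sample values are made up, and setnames() is used here as one way to apply names held in a variable.

```r
library(data.table)

# Hypothetical names held in variables (not from the original question)
newCols    <- c("a", "b")
nameOfCols <- c("b", "c")

# Part 1: build a data.table, then apply column names taken from a variable
DT <- setnames(data.table(1:3, 4:6), newCols)

# Part 2: assign by reference to the columns named in a variable;
# wrapping the variable in () makes data.table evaluate it
# instead of creating a column literally called "nameOfCols"
DT[, (nameOfCols) := 10]

names(DT)
```

Column b already exists and is overwritten, while c is created, so a single call covers both updating and adding columns.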

Updating column using data.table when using global variable as column name

On the left-hand side of :=, wrap the variable in () so that it is evaluated. On the right-hand side, either use get() or specify the column in .SDcols:

dt[, (colname) := ifelse(.SD[[1]] == allele, 2, 1), .SDcols = colname]

It is not clear from the question whether allele is another column or an object holding some value.
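The get() alternative mentioned above could look like the sketch below, which covers both readings of allele. The data, the column names snp1 and ref, and the variable allelecol are hypothetical, and the results are written to new columns so the input stays visible.

```r
library(data.table)

# Hypothetical data: one genotype column and one reference column
dt <- data.table(snp1 = c("A", "G", "A"), ref = c("A", "A", "G"))
colname <- "snp1"

# Reading 1: allele is a plain object holding a single value
allele <- "A"
dt[, new1 := ifelse(get(colname) == allele, 2, 1)]

# Reading 2: the comparison value lives in another column,
# whose name is also held in a variable
allelecol <- "ref"
dt[, new2 := ifelse(get(colname) == get(allelecol), 2, 1)]

dt
```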

Passing multiple column names to by in a data.table function

Build a character vector for the by argument and it will work:

myFun <- function(df, i, j, by){
  # i, j and by are column names passed as strings
  df[get(i) == 4, .(Count = .N,
                    Mean = mean(get(j)),
                    Median = median(get(j))),
     by = c(by, 'am')]
}

# the output below corresponds to dt being as.data.table(mtcars)
myFun(dt, i = 'cyl', j = 'hp', by = 'vs')

#    vs am Count     Mean Median
# 1:  1  1     7 80.57143     66
# 2:  1  0     3 84.66667     95
# 3:  0  1     1 91.00000     91

R data.table scoping, reliably refer to unknown column name using variable

A proper solution for this kind of problem has recently been implemented in data.table. The new env argument substitutes variables into the expression before it is evaluated, so it is not affected by data.table's local column scoping. Users no longer need to use get.

library(data.table)
dt=data.table(id='id1')
id='id'
dt[.id %in% 'id1', env=list(.id=id)]
# id
# <char>
#1: id1

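When this answer was written, env was not yet on CRAN, so data.table had to be installed from our CRAN-like repo; the argument has since been released on CRAN in data.table 1.15.0. Note that we publish Windows binaries as well, so Rtools is not necessary.
The simplest way to install the development version from our repo is shown below (the function was renamed update_dev_pkg() in later releases):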

data.table::update.dev.pkg()
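The same env argument also works for j and by, which makes it convenient inside functions. The sketch below assumes a data.table version that provides env; the helper name summarise_by and its parameter names are invented for illustration.

```r
library(data.table)   # assumes a version that supports the env argument

dt <- as.data.table(mtcars)

# Hypothetical helper: filter, summarise and group by columns given as strings.
# Character values in env are substituted into the expression as column names.
summarise_by <- function(d, filter_col, value_col, group_col) {
  d[fcol == 4, .(Mean = mean(vcol)), by = gcol,
    env = list(fcol = filter_col, vcol = value_col, gcol = group_col)]
}

res <- summarise_by(dt, "cyl", "hp", "vs")
res
```

Because the placeholders fcol, vcol and gcol are replaced before evaluation, the result carries the real column name (vs) rather than the placeholder.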

Pass column name in data.table using variable

Use the quote() and eval() functions to pass a variable to j. You don't need double quotes around the column names when you do it this way, because quote() captures an unevaluated expression (not a string), which is then evaluated inside DT[].

temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"

With a single column name, the result is a vector. If you want a data.table result, or several columns, use the list form:

temp <- quote(list(x, v))
DT[ , eval(temp)]
# x v
# 1: b 1.52566586
# 2: b 0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a 0.03159933
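If the column name arrives as a character string rather than as code, it can be converted to a symbol first. A small sketch, with made-up data and a hypothetical variable colname:

```r
library(data.table)

DT <- data.table(x = c("b", "a"), v = c(1, 2))   # hypothetical data

colname <- "x"              # column name held as a string
temp <- as.name(colname)    # convert the string to a symbol, equivalent to quote(x)

res <- DT[, eval(temp)]     # a plain vector, as with quote(x)
res
```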

How do I reference a function parameter inside a data.table with a column of the same name?

  1. One possible option is to capture the function's environment and fetch the parameter from it explicitly:
myfunc <- function(dt, t){
  env <- environment()
  # t on the left is the column; get('t', env) is the function parameter
  dt <- dt[t == get('t', env)]
  mean(dt$b)
}

  2. Another approach, while perhaps not strictly a "solution" to your current problem, may be of interest. With data.table version >= 1.14.3, we can use the env parameter of DT[i, j, by, env, ...] to map the placeholder col to the column "t" and the placeholder val to the value of the function parameter t. Note that this still compares column t with parameter t even if dt contains columns named col and val, because the placeholders are substituted before evaluation.
myfunc <- function(dt, t){
  dt <- dt[col == val, env = list(col = "t", val = t)]
  mean(dt$b)
}

In both cases, usage and output are as below:

Usage

myfunc(dt = foo, t = 3)

Output:

[1] 0.1292877

Input:

set.seed(123)
foo <- data.table(t = c(1,1,2,2,3), b = rnorm(5))

foo looks like this:

> foo
   t           b
1: 1 -0.56047565
2: 1 -0.23017749
3: 2  1.55870831
4: 2  0.07050839
5: 3  0.12928774

