Creating data.tables directly with column names from variables, and using variables for column names with :=
For the first question, I'm not absolutely sure, but you may want to see whether fread helps in creating an empty data.table with named columns.
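As an alternative to fread, here is a minimal sketch that builds an empty data.table whose column names come from a variable. The names in cols are hypothetical, and all columns are assumed to be numeric.

```r
library(data.table)

# Hypothetical column names held in a variable
cols <- c("id", "score", "label")

# A named list of zero-length numeric vectors becomes a 0-row data.table
DT <- as.data.table(setNames(rep(list(numeric(0)), length(cols)), cols))

names(DT)
nrow(DT)
```

Use a different zero-length vector per column (for example character(0)) if the columns need different types.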
As for the second question, try:
DT[, c(nameOfCols) := 10]
where nameOfCols is the character vector of names of the columns you want to modify. See ?data.table.
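A small sketch of the := form above, on a made-up table. Wrapping the variable in c() (or in parentheses) makes data.table use its value as the column names.

```r
library(data.table)

DT <- data.table(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
nameOfCols <- c("a", "c")  # names of the columns to modify

# c() forces nameOfCols to be evaluated, so columns "a" and "c" are
# updated by reference; a bare name on the LHS would instead be taken
# as a literal column name
DT[, c(nameOfCols) := 10]
DT
```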
Updating column using data.table when using global variable as column name
On the LHS, we can wrap the variable in () so that its value is evaluated, and then either use get or specify the column in .SDcols:
dt[, (colname) := ifelse(.SD[[1]] == allele, 2, 1), .SDcols = colname]
It is not clear from the question whether allele is another column or an object holding some value.
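A minimal sketch of both variants, assuming allele is a plain object rather than a column. The table dt and the values of colname and allele are made up for illustration.

```r
library(data.table)

dt <- data.table(snp = c("A", "G", "A", "T"))
colname <- "snp"  # column name held in a global variable
allele <- "A"     # assumed to be an object, not a column

# Variant 1: .SDcols restricts .SD to the named column
dt1 <- copy(dt)
dt1[, (colname) := ifelse(.SD[[1]] == allele, 2, 1), .SDcols = colname]

# Variant 2: get() looks the column up by its name
dt2 <- copy(dt)
dt2[, (colname) := ifelse(get(colname) == allele, 2, 1)]

dt1
dt2
```

Because i is omitted, the whole column is replaced, so snp changes from character to numeric in both copies.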
Passing multiple column names to by in a data.table function
Just pass a character vector to the by part of the data.table call, and it will work:
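library(data.table)
dt <- as.data.table(mtcars)  # mtcars as a data.table; reproduces the output below
myFun <- function(df, i, j, by){
  df[get(i) == 4, .(Count = .N,
                    Mean = mean(get(j)),
                    Median = median(get(j))),
     by = c(by, 'am')]
}
myFun(dt, i = 'cyl', j = 'hp', by = 'vs')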
#vs am Count Mean Median
#1: 1 1 7 80.57143 66
#2: 1 0 3 84.66667 95
#3: 0 1 1 91.00000 91
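The same idea applies when by itself holds several names. A minimal sketch with a hypothetical helper count_by on mtcars:

```r
library(data.table)

dt <- as.data.table(mtcars)

# Hypothetical helper: by_cols may hold one or several column names
count_by <- function(d, by_cols) {
  d[, .(Count = .N), by = by_cols]
}

res <- count_by(dt, c("vs", "am"))
res
```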
R data.table scoping, reliably refer to unknown column name using variable
A proper solution for this kind of problem has recently been implemented in data.table: the new env argument, whose substitutions are not subject to data.table's local scoping. Users no longer need to use get.
library(data.table)
dt=data.table(id='id1')
id='id'
dt[.id %in% 'id1', env=list(.id=id)]
# id
# <char>
#1: id1
At the time of writing it was not yet on CRAN, so you needed to install data.table from our CRAN-like repo. Note that we publish Windows binaries as well, so Rtools is not necessary.
The simplest way to install from our repo is:
data.table::update.dev.pkg()
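Beyond i, env can also substitute names used in j. A minimal sketch, assuming a data.table version whose [.data.table has the env argument; the table and variable names are illustrative.

```r
library(data.table)

dt <- data.table(id = c("id1", "id2", "id1"), val = c(1, 2, 3))
col <- "id"    # column names held in variables
vcol <- "val"

# Character values in env are substituted as column symbols in i and j
res <- dt[.id %in% "id1", .(total = sum(.v)), env = list(.id = col, .v = vcol)]
res
```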
Pass column name in data.table using variable
Use the quote() and eval() functions to pass a variable to j. You don't need double quotes around the column names when you do it this way, because the quote()-d expression is evaluated inside DT[]:
temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"
With a single column name, the result is a vector. If you want a data.table result, or several columns, use the list form:
temp <- quote(list(x, v))
DT[ , eval(temp)]
# x v
# 1: b 1.52566586
# 2: b 0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a 0.03159933
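If the column name arrives as a string rather than a symbol, as.name() can build the same expression that quote() produces. A sketch with a small made-up DT:

```r
library(data.table)

DT <- data.table(x = c("b", "b", "a"), v = c(1.5, 0.7, -1.3))

col <- "x"              # column name stored as a string
sym <- as.name(col)     # same symbol as quote(x)
vec <- DT[, eval(sym)]  # returns a vector

cols <- c("x", "v")
# Build the call list(x, v) from strings; same as quote(list(x, v))
expr <- as.call(c(as.name("list"), lapply(cols, as.name)))
sub <- DT[, eval(expr)]  # returns a data.table with columns x and v

vec
sub
```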
How do I reference a function parameter inside a data.table with a column of the same name?
- One possible option is this:
myfunc <- function(dt, t){
env <- environment()
dt <- dt[t==get('t',env)]
mean(dt$b)
}
- Another approach, while perhaps not strictly a "solution" to your current problem, may be of interest. With data.table version >= 1.14.3, we can use the env parameter of DT[i, j, by, env, ...] to refer to the data.table column as "t" and to the function parameter as t. Notice that this works on column t with function parameter t, even if dt contains columns named col and val:
myfunc <- function(dt, t){
dt <- dt[col==val, env=list(col="t", val=t)]
mean(dt$b)
}
In both cases, usage and output are as follows:
Usage
myfunc(dt = foo, t = 3)
Output:
[1] 0.1292877
Input:
set.seed(123)
foo <- data.table(t = c(1,1,2,2,3), b = rnorm(5))
foo looks like this:
> foo
t b
1: 1 -0.56047565
2: 1 -0.23017749
3: 2 1.55870831
4: 2 0.07050839
5: 3 0.12928774
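Another option, not taken from the answers above, is to copy the parameter into a local variable whose name does not clash with any column. A minimal sketch on the same foo data; target is a hypothetical name, and this would itself break if dt had a column called target.

```r
library(data.table)

set.seed(123)
foo <- data.table(t = c(1, 1, 2, 2, 3), b = rnorm(5))

myfunc_local <- function(dt, t) {
  target <- t                   # non-clashing copy of the parameter
  mean(dt[t == target]$b)       # t is the column, target is the parameter
}

res <- myfunc_local(foo, 3)
res
```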