data.table := assignments when variable has same name as a column
You can always use get, which allows you to specify the environment:
dt1[1, a := get("a", envir = .GlobalEnv)]
# a
#1: 18
Or just:
a <- 42
dt1[1, a := .GlobalEnv$a]
# a
#1: 42
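To see why spelling out the environment matters, here is a minimal self-contained sketch. The table dt1 and its values are made up for illustration; assign() is used only so the variable lands in .GlobalEnv regardless of where the snippet runs.

```r
library(data.table)

# Hypothetical data: a column named `a` and a global variable also named `a`
dt1 <- data.table(a = c(1, 2, 3))
assign("a", 42, envir = .GlobalEnv)  # same effect as `a <- 42` at top level

# A bare `a` on the right-hand side resolves to the column, so row 1 keeps its value
dt1[1, a := a]
col_value <- dt1$a[1]

# Naming the environment explicitly picks up the global variable instead
dt1[1, a := get("a", envir = .GlobalEnv)]
global_value <- dt1$a[1]
```

After the first assignment the cell is unchanged; only the second one writes the global value into the column.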
Subsetting a data.table with a variable (when varname identical to colname)
If you don't mind doing it in two steps, you can build the row filter outside the scope of your data.table
(though that is usually not what you want when working with data.table):
wh_v1 <- my_data_table[, V1]==V1
my_data_table[wh_v1]
# V1 V2
#1: A 1
#2: A 4
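A reproducible version of the two-step approach might look like the sketch below. The contents of my_data_table are an assumption, chosen so that the matching rows are rows 1 and 4 as in the output above.

```r
library(data.table)

# Hypothetical table; the variable V1 shares its name with the first column
my_data_table <- data.table(V1 = c("A", "B", "C", "A", "B", "C"), V2 = 1:6)
V1 <- "A"

# The comparison is built outside [ ], so the V1 on the right-hand side
# is the variable, not the column
wh_v1 <- my_data_table$V1 == V1
res <- my_data_table[wh_v1]
print(res)
```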
Disambiguating a variable name in a function when a column with the same name as the variable exists (data.table)
With the data.table development version (1.14.3), this can be done with the new env argument, see programming on data.table:
data.table::update.dev.pkg()
source = "idref"
corpus[source=="027021335",env=list(source=source)]
idref iddoc nom prenom order role Annee_soutenance source time_variable
1: 027021335 97466 Méhaut Philippe 0 supervisor 2011 as.character(idref) as.character(idref)
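For a self-contained illustration of the env argument, here is a small sketch. The table contents and the name filter_col are hypothetical, and the guard is there because the installed data.table may predate the env argument.

```r
library(data.table)

# Hypothetical stand-in for the `corpus` table above
corpus <- data.table(idref = c("027021335", "111111111"),
                     role  = c("supervisor", "author"))
filter_col <- "idref"  # name of the column to filter on

# Guard: only run where `[.data.table` actually has an `env` argument
has_env <- "env" %in% names(formals(getS3method("[", "data.table")))
if (has_env) {
  # The string supplied through `env` is substituted as a column name,
  # so the filter effectively becomes idref == "027021335"
  res <- corpus[filter_col == "027021335", env = list(filter_col = filter_col)]
  print(res)
}
```

Using a placeholder name that is not itself a column (filter_col here) avoids the ambiguity the question is about.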
R data.table use variable name for assignment in group by
Either use setNames wrapped around the list (.(mean(xa))) column:
dt[, setNames(.(mean(xa)), cn), by = g]
# g sa
#1: 1 0.2010599
#2: 2 0.4710056
#3: 3 0.4871248
or use setnames after getting the summarised output:
setnames(dt[, mean(xa), by = g], 'V1', cn)[]
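Both data.table options can be tried on a small made-up table. The data below is an assumption for illustration (the original dt is not shown), and list() is written out in place of the .() alias.

```r
library(data.table)

# Hypothetical data: g is the grouping column, xa the value column
dt <- data.table(g = c(1L, 1L, 2L, 2L, 3L, 3L), xa = c(1, 3, 2, 4, 5, 7))
cn <- "sa"  # desired name of the summary column

# Option 1: name the list returned by j
res1 <- dt[, setNames(list(mean(xa)), cn), by = g]

# Option 2: summarise first, then rename the default V1 column by reference
res2 <- setnames(dt[, mean(xa), by = g], "V1", cn)

print(res1)
print(res2)
```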
In data.table, the := operator creates or modifies a column in the original dataset by reference. In the tidyverse context the same symbol means something different: it lets you supply the name of a new column programmatically (here via !!):
library(dplyr)
dt %>%
group_by(g) %>%
summarise(!! cn := mean(xa), .groups = 'drop')
# A tibble: 3 x 2
# g sa
# <int> <dbl>
#1 1 0.201
#2 2 0.471
#3 3 0.487
How to assign dynamic column names in data.table under `:=`?
We can place the values in a list or use .(...) and then assign (:=) it to new columns:
carsDT[speed < 15, paste0("col", 1:2) := list(1, 2)]
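A runnable sketch of the same idea follows, assuming carsDT is simply the built-in cars data converted to a data.table (the original definition is not shown). The names are stored in a variable first and wrapped in parentheses on the LHS.

```r
library(data.table)

carsDT <- as.data.table(cars)    # built-in `cars` data: speed, dist
new_cols <- paste0("col", 1:2)   # "col1" "col2"

# Rows with speed < 15 get 1 and 2; all other rows are filled with NA
carsDT[speed < 15, (new_cols) := list(1, 2)]

print(head(carsDT))
```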
Using a variable to specify a column name within `data.table`
Data:
library(data.table)
dt = data.table(col1=letters[1:2], x=c('1','2'))
One solution is to use quote and eval in your data.table:
y = quote(x)
dt[,eval(y):=as.numeric(eval(y))]
is.numeric(dt$x)
#[1] TRUE
Select / assign to data.table when variable names are stored in a character vector
Two ways to programmatically select variable(s):
1. with = FALSE:
DT = data.table(col1 = 1:3)
colname = "col1"
DT[, colname, with = FALSE]
# col1
# 1: 1
# 2: 2
# 3: 3
2. 'dot dot' (..) prefix:
DT[, ..colname]
# col1
# 1: 1
# 2: 2
# 3: 3
For further description of the 'dot dot' (..) notation, see New Features in 1.10.2 (it is currently not described in help text).
To assign to variable(s), wrap the LHS of := in parentheses:
DT[, (colname) := 4:6]
# col1
# 1: 4
# 2: 5
# 3: 6
The latter is known as a column plonk, because you replace the whole column vector by reference. If a subset i were present, it would subassign by reference. The parentheses around (colname) are a shorthand introduced in version v1.9.4 on CRAN, Oct 2014. Here is the news item:
Using with = FALSE with := is now deprecated in all cases, given that wrapping the LHS of := with parentheses has been preferred for some time.
colVar = "col1"
DT[, (colVar) := 1] # please change to this
DT[, c("col1", "col2") := 1] # no change
DT[, 2:4 := 1] # no change
DT[, c("col1","col2") := list(sum(a), mean(b))] # no change
DT[, `:=`(...), by = ...] # no change
See also the Details section in ?`:=`:
DT[i, (colnamevector) := value]
# [...] The parens are enough to stop the LHS being a symbol
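The parenthesised LHS also accepts a vector of several names. As a sketch (the table and the cols vector are made up for illustration), combining it with .SD and .SDcols updates a set of columns in one call:

```r
library(data.table)

# Hypothetical table and column-name vector
DT <- data.table(col1 = 1:3, col2 = 4:6, id = c("x", "y", "z"))
cols <- c("col1", "col2")

# Replace each named column with its cumulative sum, by reference;
# .SDcols restricts .SD to the columns named in `cols`
DT[, (cols) := lapply(.SD, cumsum), .SDcols = cols]

print(DT)
```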
And to answer the further question in the comments, here's one way (as usual, there are many ways):
DT[, (colname) := cumsum(get(colname))]
# col1
# 1: 4
# 2: 9
# 3: 15
Or you might find it easier to read, write and debug just to eval a paste, similar to constructing a dynamic SQL statement to send to a server:
expr = paste0("DT[,",colname,":=cumsum(",colname,")]")
expr
# [1] "DT[,col1:=cumsum(col1)]"
eval(parse(text=expr))
# col1
# 1: 4
# 2: 13
# 3: 28
If you do that a lot, you can define a helper function EVAL:
EVAL = function(...)eval(parse(text=paste0(...)),envir=parent.frame(2))
EVAL("DT[,",colname,":=cumsum(",colname,")]")
# col1
# 1: 4
# 2: 17
# 3: 45
Since data.table 1.8.2 automatically optimizes j for efficiency, the eval method may be preferable. Using get() in j prevents some optimizations, for example.
Or there is set(), a low-overhead, functional form of :=, which would be fine here. See ?set.
set(DT, j = colname, value = cumsum(DT[[colname]]))
DT
# col1
# 1: 4
# 2: 21
# 3: 66
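Because set() takes the column name as an ordinary argument, it fits naturally in a loop over several names. A minimal sketch, with a made-up table and name vector:

```r
library(data.table)

# Hypothetical table and the columns to update
DT <- data.table(col1 = 1:3, col2 = c(10, 20, 30))
cols <- c("col1", "col2")

# Each iteration overwrites one column by reference with its cumulative sum
for (col in cols) {
  set(DT, j = col, value = cumsum(DT[[col]]))
}

print(DT)
```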