Add new columns to a data.table containing many variables
I think with a small modification to your last code you can easily do both for as many variables you want
vars <- c("x2", "x3") # <- Choose the variable you want to operate on
d[, paste0(vars, "_", "scale") := lapply(.SD, function(x) scale(x)[, 1]), .SDcols = vars, by = Stock]
d[, paste0(vars, "_", "sum") := lapply(.SD, sum), .SDcols = vars, by = Stock]
## Time Stock x1 x2 x3 x2_scale x3_scale x2_sum x3_sum
## 1: 2014-08-22 A 13 14 32 -1.1338934 1.1323092 87 44
## 2: 2014-08-23 A 25 39 9 0.7559289 -0.3701780 87 44
## 3: 2014-08-24 A 18 34 3 0.3779645 -0.7621312 87 44
## 4: 2014-08-22 B 44 8 6 -0.4730162 -0.7258662 59 32
## 5: 2014-08-23 B 49 3 18 -0.6757374 1.1406469 59 32
## 6: 2014-08-24 B 15 48 8 1.1487535 -0.4147807 59 32
For simple functions (that don't need special treatment like scale
) you could easily do something like
vars <- c("x2", "x3") # <- Define the variable you want to operate on
funs <- c("min", "max", "mean", "sum") # <- define your function
for(i in funs){
d[, paste0(vars, "_", i) := lapply(.SD, eval(i)), .SDcols = vars, by = Stock]
}
Dynamically create new columns in data.table
The last two statements return an error message:
dat[,`:=`(paste0(date_col, "_year", sep="") = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
Error: unexpected '=' in "dat[,`:=`(paste0(date_col, "_year", sep="") ="
dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
Error: unexpected '=' in "dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) ="
The correct syntax for calling the :=()
function is:
dat[, `:=`(paste0(date_col, "_year", sep = ""),
substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][]
dat[, `:=`(noquote(paste0(date_col, "_year", sep = "")),
substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][]
i.e., replace =
by ,
.
However, assignment syntax and right hand side are far too complicated.
The order_date
column is already of class Date
:
str(dat)
Classes ‘data.table’ and 'data.frame': 5 obs. of 3 variables:
$ one : int 1 2 3 4 5
$ two : int 1 2 3 4 5
$ order_date: Date, format: "2015-01-01" "2015-02-01" ...
- attr(*, ".internal.selfref")=<externalptr>
In order to extract the year, year()
function can be used (either from the data.table
package or the lubridate
package whatever is loaded last), so no conversion back to character and extraction of the year string is required:
date_col = "order_date"
dat[, paste0(date_col, "_year") := lapply(.SD, year), .SDcols = date_col][]
one two order_date order_date_year
1: 1 1 2015-01-01 2015
2: 2 2 2015-02-01 2015
3: 3 3 2015-03-01 2015
4: 4 4 2015-04-01 2015
5: 5 5 2015-05-01 2015
Alternatively,
dat[, paste0(date_col, "_year") := year(get(date_col))][]
dat[, `:=`(paste0(date_col, "_year"), year(get(date_col)))][]
work as well.
Assign multiple columns using := in data.table, by group
This now works in v1.8.3 on R-Forge. Thanks for highlighting it!
x <- data.table(a = 1:3, b = 1:6)
f <- function(x) {list("hi", "hello")}
x[ , c("col1", "col2") := f(), by = a][]
# a b col1 col2
# 1: 1 1 hi hello
# 2: 2 2 hi hello
# 3: 3 3 hi hello
# 4: 1 4 hi hello
# 5: 2 5 hi hello
# 6: 3 6 hi hello
x[ , c("mean", "sum") := list(mean(b), sum(b)), by = a][]
# a b col1 col2 mean sum
# 1: 1 1 hi hello 2.5 5
# 2: 2 2 hi hello 3.5 7
# 3: 3 3 hi hello 4.5 9
# 4: 1 4 hi hello 2.5 5
# 5: 2 5 hi hello 3.5 7
# 6: 3 6 hi hello 4.5 9
mynames = c("Name1", "Longer%")
x[ , (mynames) := list(mean(b) * 4, sum(b) * 3), by = a]
# a b col1 col2 mean sum Name1 Longer%
# 1: 1 1 hi hello 2.5 5 10 15
# 2: 2 2 hi hello 3.5 7 14 21
# 3: 3 3 hi hello 4.5 9 18 27
# 4: 1 4 hi hello 2.5 5 10 15
# 5: 2 5 hi hello 3.5 7 14 21
# 6: 3 6 hi hello 4.5 9 18 27
x[ , get("mynames") := list(mean(b) * 4, sum(b) * 3), by = a][] # same
# a b col1 col2 mean sum Name1 Longer%
# 1: 1 1 hi hello 2.5 5 10 15
# 2: 2 2 hi hello 3.5 7 14 21
# 3: 3 3 hi hello 4.5 9 18 27
# 4: 1 4 hi hello 2.5 5 10 15
# 5: 2 5 hi hello 3.5 7 14 21
# 6: 3 6 hi hello 4.5 9 18 27
x[ , eval(mynames) := list(mean(b) * 4, sum(b) * 3), by = a][] # same
# a b col1 col2 mean sum Name1 Longer%
# 1: 1 1 hi hello 2.5 5 10 15
# 2: 2 2 hi hello 3.5 7 14 21
# 3: 3 3 hi hello 4.5 9 18 27
# 4: 1 4 hi hello 2.5 5 10 15
# 5: 2 5 hi hello 3.5 7 14 21
# 6: 3 6 hi hello 4.5 9 18 27
Older version using the with
argument (we discourage this argument when possible):
x[ , mynames := list(mean(b) * 4, sum(b) * 3), by = a, with = FALSE][] # same
# a b col1 col2 mean sum Name1 Longer%
# 1: 1 1 hi hello 2.5 5 10 15
# 2: 2 2 hi hello 3.5 7 14 21
# 3: 3 3 hi hello 4.5 9 18 27
# 4: 1 4 hi hello 2.5 5 10 15
# 5: 2 5 hi hello 3.5 7 14 21
# 6: 3 6 hi hello 4.5 9 18 27
R data.table add new column with values from other columns by referencing
With fcase
:
cols <- unique(dt$Label)
dt[,newCol:=eval(parse(text=paste('fcase(',paste0("Label=='",cols,"',Col_",cols,collapse=','),')')))][]
Label Col_A Col_B Col_C newCol
<char> <num> <num> <num> <num>
1: A 2 1 2 2
2: B 3 4 0 4
3: C 5 3 4 4
4: A 0 5 1 0
5: B 2 2 5 2
6: C 7 0 6 6
7: A 6 7 7 6
8: B 8 5 3 5
9: C 9 8 0 0
assign new column using a variable in data.table
Consolidating some of the comments into an answer here...
From ?set
, you can find that the overhead of calling [.data.table
repeatedly can add up. In those cases, you can try set
instead.
Also, any set*
functions should be followed by []
to print the output.
With that, here are the two alternatives:
copy1 <- copy2 <- copy3 <- input_allele[,c(1:6)]
new <- colnames(input_allele[,.SD,.SDcols=c(7:11)])
## Using `set` :
for (i in new) {
set(copy1, j = i, value = 0)[]
}
head(copy1)
## FID IID PAT MAT SEX PHENOTYPE SNP1 SNP2 SNP3 SNP4 SNP5
## 1: gid1 IID11 1 0 1 1 0 0 0 0 0
## 2: gid2 IID12 2 0 1 1 0 0 0 0 0
## 3: gid3 IID13 3 0 1 1 0 0 0 0 0
## 4: gid4 IID14 4 0 1 1 0 0 0 0 0
## 5: gid5 IID15 5 0 1 1 0 0 0 0 0
## 6: gid6 IID16 6 0 1 1 0 0 0 0 0
## Using `:=` :
for (i in new) {
copy2[, (i) := 0][]
}
head(copy2)
## FID IID PAT MAT SEX PHENOTYPE SNP1 SNP2 SNP3 SNP4 SNP5
## 1: gid1 IID11 1 0 1 1 0 0 0 0 0
## 2: gid2 IID12 2 0 1 1 0 0 0 0 0
## 3: gid3 IID13 3 0 1 1 0 0 0 0 0
## 4: gid4 IID14 4 0 1 1 0 0 0 0 0
## 5: gid5 IID15 5 0 1 1 0 0 0 0 0
## 6: gid6 IID16 6 0 1 1 0 0 0 0 0
You could also avoid the loop:
copy3[, (new) := as.list(rep(0, length(new)))][]
## FID IID PAT MAT SEX PHENOTYPE SNP1 SNP2 SNP3 SNP4 SNP5
## 1: gid1 IID11 1 0 1 1 0 0 0 0 0
## 2: gid2 IID12 2 0 1 1 0 0 0 0 0
## 3: gid3 IID13 3 0 1 1 0 0 0 0 0
## 4: gid4 IID14 4 0 1 1 0 0 0 0 0
## 5: gid5 IID15 5 0 1 1 0 0 0 0 0
## 6: gid6 IID16 6 0 1 1 0 0 0 0 0
## 7: gid7 IID17 7 0 1 1 0 0 0 0 0
## 8: gid8 IID18 8 0 1 1 0 0 0 0 0
## 9: gid9 IID19 9 0 1 1 0 0 0 0 0
## 10: gid10 IID20 10 0 1 1 0 0 0 0 0
Note that quote
and eval
are not needed for these.
Even with this small dataset, the performance difference between set
and using :=
in a loop is measurable:
fun1 <- function() { for (i in new) { set(copy1, j = i, value = 0)[] }; copy1 }
fun2 <- function() { for (i in new) { copy2[, (i) := 0][] } ; copy2 }
fun3 <- function() copy3[, (new) := as.list(rep(0, length(new)))][]
bench::mark(fun1(), fun2(), fun3())
## # A tibble: 3 x 13
## expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
## <bch:expr> <bch:t> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
## 1 fun1() 64.9µs 69.63µs 13932. 0B 4.17 6689 2
## 2 fun2() 993µs 1.07ms 910. 377.6KB 4.23 430 2
## 3 fun3() 241.9µs 255.12µs 3793. 16.4KB 4.30 1763 2
## # … with 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
## # time <list>, gc <list>
Create multiple variables in data.table based other variables names
This first part is mostly defensive, guarding against: a*
variables without matching b*
variables; vice versa; and different order of each:
anames <- grep("^a[0-9]+$", colnames(DT), value = TRUE)
bnames <- grep("^b[0-9]+$", colnames(DT), value = TRUE)
numnames <- gsub("^a", "", anames)
anames <- sort(anames[gsub("^a", "", anames) %in% numnames])
bnames <- sort(bnames[gsub("^b", "", bnames) %in% numnames])
cnames <- gsub("^b", "c", bnames)
If you know the number ranges a priori and want something less-dynamic but more straight-forward, then
anames <- paste0("a", 1:3)
bnames <- paste0("b", 1:3)
cnames <- paste0("c", 1:3)
Now the magic:
DT[, (cnames) := Map(`+`, mget(anames), mget(bnames)) ]
DT
# a1 a2 a3 b1 b2 b3 c1 c2 c3
# 1: 1 2 3 5 20 25 6 22 28
# 2: 2 4 6 6 24 30 8 28 36
# 3: 3 6 9 7 28 35 10 34 44
data.table adding column to several data tables in loop
The best way is to store your objects in a list and then access them by indexing the list.
library(data.table)
my_dts <- list (
centro = data.table(x=letters[1:5], y = 1:5),
sur = data.table(x=letters[2:7], y = 3:8),
norte = data.table(x=letters[2:10], y = 3:11)
)
st <- c("centro", "sur", "norte")
for (i in st) my_dts[[i]][, store := i]
Related Topics
Controlling the 'Alpha' Level in a Ggplot2 Legend
Add Author Affiliation in R Markdown Beamer Presentation
How to Build a Dendrogram from a Directory Tree
Knitr: Run All Chunks in an Rmarkdown Document
Make Dataframe of Top N Frequent Terms for Multiple Corpora Using Tm Package in R
Putting X-Axis at Top of Ggplot2 Chart
Extract Random Effect Variances from Lme4 Mer Model Object
Remove Spacing Around Plotting Area in R
Differences in Heatmap/Clustering Defaults in R (Heatplot Versus Heatmap.2)
Piecewise Regression with R: Plotting the Segments
Adding Custom Image to Geom_Polygon Fill in Ggplot
How to Save a Data Frame as CSV to a User Selected Location Using Tcltk
Options for Deploying R Models in Production
"Un-Register" a Doparallel Cluster
R: How to Get the Last Element from Each Group
Dplyr::Select One Column and Output as Vector
How to Interpret Lm() Coefficient Estimates When Using Bs() Function for Splines