Dynamically add column names to data.table when aggregating
As mentioned in the comments by lukeA, setNames can be used:
m <- c("blah", "foo")
test_dtb[ , setNames(list(mean(b), median(b)), m), by = id]
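For anyone who wants to run this end to end, here is a minimal self-contained sketch. The table contents and the seed are made up for illustration; only the shape of the j expression comes from the answer above.

```r
library(data.table)

# Hypothetical data standing in for the asker's table
set.seed(1)
test_dtb <- data.table(a  = sample(1:100, 100),
                       b  = sample(1:100, 100),
                       id = rep(1:10, 10))

# Column names decided at run time
m <- c("blah", "foo")

# j returns a named list, so the names in m become the result's column names
res <- test_dtb[, setNames(list(mean(b), median(b)), m), by = id]
names(res)
```

Because setNames is an ordinary function call, m can be any character vector of the right length, for example one built with paste0.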
dynamic aggregations with data.table in R using derived column names
You can achieve this with the .SDcols argument. See the example below.
require(data.table)
dt <- data.table(ch=c('a','b','c'), num1=c(1,3,6), num2=1:9)
DoSomething <- function(dt) {
  numCols <- names(dt)[sapply(dt, is.numeric)]
  chrCols <- names(dt)[sapply(dt, is.character)]
  dt[, list(sum(.SD[[1]]), mean(.SD[[2]])), by = chrCols, .SDcols = numCols]
}
DoSomething(dt)
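If the output columns should also carry derived names (the example above leaves them as V1 and V2), one possible variation is to combine .SDcols with setNames. This is only a sketch: the "sum_" prefix is an assumption, and it applies the same function to every numeric column rather than a different one per column.

```r
library(data.table)

dt <- data.table(ch   = rep(c("a", "b", "c"), 3),
                 num1 = rep(c(1, 3, 6), 3),
                 num2 = 1:9)

numCols <- names(dt)[sapply(dt, is.numeric)]
chrCols <- names(dt)[sapply(dt, is.character)]

# lapply(.SD, sum) yields one summed value per column in .SDcols;
# setNames attaches names derived from the input column names
res <- dt[, setNames(lapply(.SD, sum), paste0("sum_", numCols)),
          by = chrCols, .SDcols = numCols]
res
```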
How to assign dynamic column names in data.table under `:=`?
We can place the values in a list (or use .(...)) and then assign (:=) them to the new columns:
carsDT[speed < 15, paste0("col", 1:2) := list(1, 2)]
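A runnable version of the same idea, assuming carsDT is just the built-in cars data converted to a data.table (the answer does not show how it was created):

```r
library(data.table)

carsDT <- as.data.table(cars)    # built-in data with columns speed and dist

new_cols <- paste0("col", 1:2)   # names built at run time

# The parenthesized LHS is evaluated to a character vector of column names.
# Rows that do not match the filter in i are filled with NA.
carsDT[speed < 15, (new_cols) := list(1, 2)]
head(carsDT)
```

Storing the names in a variable and wrapping it in parentheses behaves the same as writing the paste0() call directly on the left-hand side.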
Dynamic column names in data.table
From data.table 1.9.4, you can just do this:
## A parenthesized symbol, `(cn)`, gets evaluated to "blah" before `:=` is carried out
test_dtb[, (cn) := mean(a), by = id]
head(test_dtb, 4)
# a b id blah
# 1: 41 19 1 54.2
# 2: 4 99 2 50.0
# 3: 49 85 3 46.7
# 4: 61 4 4 57.1
See Details in ?`:=`:
DT[i, (colvector) := val]
[...] NOW PREFERRED [...] syntax. The parens are enough to stop the LHS being a symbol; same as c(colvector)
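The same parenthesized form also takes a vector of several names. A small sketch with made-up data, where the "_mean" suffix and the names DT, src and colvector are assumptions for illustration:

```r
library(data.table)

DT <- data.table(id = rep(1:2, each = 3),
                 a  = 1:6,
                 b  = c(2, 4, 6, 8, 10, 12))

src       <- c("a", "b")             # columns to summarise
colvector <- paste0(src, "_mean")    # new column names, built at run time

# One new column per entry in colvector, filled with the group mean
DT[, (colvector) := lapply(.SD, mean), by = id, .SDcols = src]
DT
```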
Original answer:
You were on exactly the right track: constructing an expression to be evaluated within the call to [.data.table is the data.table way to do this sort of thing. Going just a bit further, why not construct an expression that evaluates to the entire j argument (rather than just its left hand side)?
Something like this should do the trick:
## Your code so far
library(data.table)
test_dtb <- data.table(a=sample(1:100, 100),b=sample(1:100, 100),id=rep(1:10,10))
cn <- "blah"
## One solution
expr <- parse(text = paste0(cn, ":=mean(a)"))
test_dtb[,eval(expr), by=id]
## Checking the result
head(test_dtb, 4)
# a b id blah
# 1: 30 26 1 38.4
# 2: 83 82 2 47.4
# 3: 47 66 3 39.5
# 4: 87 23 4 65.2
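As a variation on the same idea, the j expression can be assembled from language objects with substitute() instead of pasting and parsing text. This is a sketch of an alternative, not what the answer above does; the seed is arbitrary and lhs is just a placeholder name.

```r
library(data.table)

set.seed(42)
test_dtb <- data.table(a  = sample(1:100, 100),
                       b  = sample(1:100, 100),
                       id = rep(1:10, 10))
cn <- "blah"

# Build the call `blah := mean(a)` by replacing the placeholder lhs
# with a symbol made from the string in cn
expr <- substitute(lhs := mean(a), list(lhs = as.name(cn)))

# eval(expr) in j is replaced by the stored call before `:=` is carried out
test_dtb[, eval(expr), by = id]
head(test_dtb, 4)
```

Working with symbols avoids quoting problems when the column name contains spaces or other characters that would not parse as text.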
dynamic column names seem to work when := is used but not when = is used in data.table
One option is to use the base R function setNames:
aggregate_mtcars <- mtcars_copy[, setNames(.(sum(carb)), new_col)]
Or you could use data.table::setnames:
aggregate_mtcars <- setnames(mtcars_copy[, .(sum(carb))], new_col)
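To see why the = form does not pick up the variable, here is a small sketch comparing the two. It assumes mtcars_copy is simply mtcars converted to a data.table, and the value of new_col is made up.

```r
library(data.table)

mtcars_copy <- as.data.table(mtcars)
new_col <- "total_carb"   # hypothetical name chosen at run time

# With `=`, the argument name is taken literally: the column is called "new_col"
literal <- mtcars_copy[, .(new_col = sum(carb))]

# With setNames, the value stored in new_col is used as the column name
dynamic <- mtcars_copy[, setNames(list(sum(carb)), new_col)]

names(literal)
names(dynamic)
```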
Adding Column and column names dynamically
I'm not sure I understand the question, but I think you are looking for a map function from purrr with dynamic column names. If the logic is wrong, you can adjust it inside the function.
library(tidyverse)
library(data.table)
map_dfc(df$row, function(x) {
  nm <- paste("is_present_", x, sep = "")
  df %>%
    mutate(!!nm := ifelse(id == x, 1, 0))
}) %>%
  select(contains("is_present_"))
results in:
  is_present_1 is_present_2 is_present_3 is_present_4 is_present_5 is_present_6
1            1            0            0            0            0            0
2            0            0            0            0            0            0
3            0            0            0            0            0            0
4            0            0            0            0            0            1
5            1            0            0            0            0            0
6            0            0            0            0            0            0
Sample data:
df <- fread("
id
1: 1
2: 29
3: 26
4: 6
5: 1
6: 14") %>%
select(2) %>%
rownames_to_column("row")
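The key piece is the `!!nm :=` syntax, which lets mutate take the new column's name from a string. A stripped-down sketch of just that part, using a plain loop instead of map_dfc; the targets vector is an assumption, since the question's exact matching rule is unclear:

```r
library(dplyr)

df <- data.frame(id = c(1, 29, 26, 6, 1, 14))
targets <- c(1, 6)   # hypothetical values to flag

out <- df
for (x in targets) {
  nm <- paste0("is_present_", x)   # column name built at run time
  # `!!nm :=` unquotes the string so it becomes the name of the new column
  out <- out %>% mutate(!!nm := ifelse(id == x, 1, 0))
}
out
```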
Assigning/Referencing a column name in data.table dynamically (in i, j and by)
Edit:
Based on your clarifications, here is an approach with setNames and get. The trick here is that .. instructs the evaluation to occur in the calling environment.
library(data.table)
cars <- data.table(cars)
strFactor <- "dist"
strNewVariable <- "Totals x Factor: "
strBy <- "speed"
cars[ get(strFactor) > 50,
setNames(.(.N * get(..strFactor)),strNewVariable),
by=strBy]
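A simplified sketch of the same pattern, with the column used in i, the result name in j, and the grouping column in by all coming from strings. Here the aggregate is a plain sum and strNewVariable is a made-up name; the last lines show the more common use of the .. prefix, selecting columns named in a variable from the calling scope.

```r
library(data.table)

cars_dt <- as.data.table(cars)

strFactor      <- "dist"
strNewVariable <- "dist_total"   # hypothetical result name
strBy          <- "speed"

# get() looks up the column named by the string, in both i and j
res <- cars_dt[get(strFactor) > 50,
               setNames(list(sum(get(strFactor))), strNewVariable),
               by = strBy]

# The .. prefix tells data.table to look for `keep` outside the table
keep <- c("speed", "dist")
sel  <- cars_dt[, ..keep]
```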
R aggregate dynamically added columns with a separate function for each of them
We can use across to apply functions on blocks of columns:
library(dplyr)
df1 %>%
group_by(id, v) %>%
summarise(across(c(t1, t3), mean),
across(c(t2, t4, date1), max),
list1 = toString(list1), .groups = 'drop')
-output
# A tibble: 1 x 8
# id v t1 t3 t2 t4 date1 list1
# <int> <dbl> <dbl> <dbl> <int> <dbl> <date> <chr>
#1 1 1 1.5 0.5 3 3.7 2020-09-05 val1, val2
If the functions and column names are all user input:
nm1 <- c("t1", "t3")
nm2 <- c("t2", "t4", "date1")
nm3 <- c("list1")
f1 <- "mean"
f2 <- "max"
f3 <- "toString"
df1 %>%
group_by(id, v) %>%
summarise(across(all_of(nm1), ~ match.fun(f1)(.)),
across(all_of(nm2), ~ match.fun(f2)(.)),
!! nm3 := match.fun(f3)(!! rlang::sym(nm3)), .groups = 'drop')
-output
# A tibble: 1 x 8
# id v t1 t3 t2 t4 date1 list1
# <int> <dbl> <dbl> <dbl> <int> <dbl> <date> <chr>
#1 1 1 1.5 0.5 3 3.7 2020-09-05 val1, val2
It can also be passed as an expression and evaluated:
expr1 <- glue::glue('across(c({toString(nm1)}), {f1});',
'across(c({toString(nm2)}), {f2});',
'across(c({toString(nm3)}), {f3})')
df1 %>%
group_by(id, v) %>%
summarise(!!! rlang::parse_exprs(expr1), .groups = 'drop')
-output
# A tibble: 1 x 8
# id v t1 t3 t2 t4 date1 list1
# <int> <dbl> <dbl> <dbl> <int> <dbl> <date> <chr>
#1 1 1 1.5 0.5 3 3.7 2020-09-05 val1, val2
data
df1 <- structure(list(id = c(1L, 1L), v = c(1, 1), t1 = c(1.4, 1.6),
t2 = 2:3, t3 = c(0.45, 0.55), t4 = c(3, 3.7), date1 = structure(c(18508,
18510), class = "Date"), list1 = c("val1", "val2")), row.names = c(NA,
-2L), class = "data.frame")
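If the summarised columns should also get new names derived from the originals, across has a .names argument that takes a glue-style template. A small sketch on a cut-down version of the data; the "_mean" and "_max" suffixes are assumptions for illustration.

```r
library(dplyr)

# Reduced version of df1 with just two value columns
df_small <- data.frame(id = c(1L, 1L), v = c(1, 1),
                       t1 = c(1.4, 1.6), t2 = 2:3)

nm1 <- "t1"
nm2 <- "t2"

res <- df_small %>%
  group_by(id, v) %>%
  summarise(across(all_of(nm1), mean, .names = "{.col}_mean"),
            across(all_of(nm2), max,  .names = "{.col}_max"),
            .groups = "drop")
res
```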
r aggregate dynamic columns
aggregate can take a formula, and you can build a formula from a string.
form = as.formula(paste(". ~", paste(group_by, collapse = " + ")))
aggregate(form, data = smalldat, FUN = mean)
# group1 group2 x y
# 1 1 a 0.1021667 -0.09798418
# 2 2 a -0.5695960 -0.67409059
# 3 1 b -1.0341342 -0.46696381
# 4 2 b -0.3102046 0.46478476
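Since smalldat and group_by come from the question and are not shown here, this is a self-contained sketch with made-up data. The name group_vars is used instead of group_by to avoid clashing with dplyr's function of that name.

```r
# Hypothetical data: two grouping columns and two numeric columns
set.seed(1)
smalldat <- data.frame(group1 = rep(1:2, each = 4),
                       group2 = rep(c("a", "b"), times = 4),
                       x = rnorm(8),
                       y = rnorm(8))

group_vars <- c("group1", "group2")   # grouping columns chosen at run time

# Builds the formula `. ~ group1 + group2`; the dot means "all other columns"
form <- as.formula(paste(". ~", paste(group_vars, collapse = " + ")))
res  <- aggregate(form, data = smalldat, FUN = mean)
res
```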