Dynamic column names in data.table

As of data.table 1.9.4, you can simply do this (test_dtb and cn are defined in the original answer below):

## A parenthesized symbol, `(cn)`, gets evaluated to "blah" before `:=` is carried out
test_dtb[, (cn) := mean(a), by = id]
head(test_dtb, 4)
#     a  b id blah
# 1: 41 19  1 54.2
# 2:  4 99  2 50.0
# 3: 49 85  3 46.7
# 4: 61  4  4 57.1

See the Details section of ?`:=`:

DT[i, (colvector) := val]

[...] NOW PREFERRED [...] syntax. The parens are enough to stop the LHS being a symbol; same as c(colvector)
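
As a minimal, self-contained sketch of the parenthesized form (the data, the seed, and the names dt, cn and cols are made up for illustration and are not taken from the question):

```r
library(data.table)

set.seed(42)  # assumption: seed chosen only to make the sketch reproducible
dt <- data.table(a = sample(1:100, 100), id = rep(1:10, 10))

# One new column whose name is stored in a character variable
cn <- "blah"
dt[, (cn) := mean(a), by = id]

# Several new columns at once: character vector on the LHS, list on the RHS
cols <- c("a_min", "a_max")
dt[, (cols) := list(min(a), max(a)), by = id]

names(dt)
# "a" "id" "blah" "a_min" "a_max"
```

Without the parentheses, cn := mean(a) would create a column literally named cn.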


Original answer:

You were on exactly the right track: constructing an expression to be evaluated within the call to [.data.table is the data.table way to do this sort of thing. Going just a bit further, why not construct an expression that evaluates to the entire j argument (rather than just its left hand side)?

Something like this should do the trick:

## Your code so far
library(data.table)
test_dtb <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))
cn <- "blah"

## One solution: build the whole j expression as text, parse it, then evaluate it in j
expr <- parse(text = paste0(cn, " := mean(a)"))
test_dtb[, eval(expr), by = id]

## Checking the result
head(test_dtb, 4)
#     a  b id blah
# 1: 30 26  1 38.4
# 2: 83 82  2 47.4
# 3: 47 66  3 39.5
# 4: 87 23  4 65.2
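
The same "build the whole j expression" idea can also be sketched without pasting and parsing text, by constructing the call with base R's bquote(). This is one possible variant, not the method from the answer; the seed and the reduced table are assumptions for illustration:

```r
library(data.table)

set.seed(42)  # assumption: seed chosen only to make the sketch reproducible
test_dtb <- data.table(a = sample(1:100, 100), id = rep(1:10, 10))
cn <- "blah"

# bquote() replaces .(...) with its value, so this builds the call
#   blah := mean(a)
# as a language object rather than as a string
expr <- bquote(.(as.name(cn)) := mean(a))
test_dtb[, eval(expr), by = id]

names(test_dtb)
```

Working with a call object avoids quoting problems when the column name comes from user input, although the parenthesized (cn) := form above is shorter for this case.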

How to assign dynamic column names in data.table under `:=`?

We can put the values in a list(), or in .(...), and assign them with := to the new columns, whose names are built on the left-hand side:

carsDT[speed < 15, paste0("col", 1:2) := list(1, 2)]
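
A self-contained sketch of the same assignment, assuming carsDT is simply a data.table copy of the built-in cars data set (the answer does not say how it was created) and using a hypothetical variable new_cols for the names:

```r
library(data.table)

carsDT <- as.data.table(cars)     # assumption: built-in datasets::cars
new_cols <- paste0("col", 1:2)    # "col1" "col2"

# Only rows with speed < 15 are assigned; the other rows get NA in the new columns
carsDT[speed < 15, (new_cols) := list(1, 2)]

# Count how many rows were assigned and how many were left as NA
carsDT[, .N, by = .(assigned = !is.na(col1))]
```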

R data.table dynamic column name of group by returning new table

We can use setNames to name the list returned by j:

library(data.table)
dt[, setNames(list(mean(a)), column_name), by = id]

#     id mean
#  1:  1 56.8
#  2:  2 50.5
#  3:  3 50.5
#  4:  4 42.4
#  5:  5 49.9
#  6:  6 47.8
#  7:  7 60.6
#  8:  8 57.4
#  9:  9 54.6
# 10: 10 34.5

data

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10,10))
column_name <- "mean"
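
If the grouping column should be dynamic as well, by also accepts a character vector of column names. A small sketch along those lines, where group_col is a hypothetical variable that is not part of the answer:

```r
library(data.table)

set.seed(123)
dt <- data.table(a = sample(1:100, 100), b = sample(1:100, 100), id = rep(1:10, 10))

column_name <- "mean"   # name of the result column
group_col   <- "id"     # hypothetical: name of the grouping column

# Both the grouping column and the result column name come from variables
res <- dt[, setNames(list(mean(a)), column_name), by = group_col]
names(res)
```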

Create a data.frame in R with dynamically assigned column names

Does this help?

goalsMenu <- paste("Name", 1:40, sep="")
output <- as.data.frame(matrix(rep(0, 5 + length(goalsMenu)), nrow=1))
names(output) <- c("analysis", "patient", "date", goalsMenu, "CR1", "CR2")

Basically, I first create a data.frame output with the right number of columns and then name those columns in the next step. However, be aware of mdsumner's comment: this way, all columns are of class numeric. You can deal with that later, though, by changing the class of individual columns in the data.frame.
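
A sketch of that follow-up step, changing the class of a few columns after the names have been assigned. Which columns should be character or Date is a guess made for illustration, and col_names is a helper variable not present in the answer:

```r
goalsMenu <- paste0("Name", 1:40)
col_names <- c("analysis", "patient", "date", goalsMenu, "CR1", "CR2")

# One row of zeros, one column per name; every column starts out numeric
output <- as.data.frame(matrix(0, nrow = 1, ncol = length(col_names)))
names(output) <- col_names

# Change the class of selected columns afterwards (assumed target classes)
output$analysis <- as.character(output$analysis)
output$patient  <- as.character(output$patient)
output$date     <- as.Date(NA)

sapply(output[c("analysis", "patient", "date", "Name1")], class)
```

Deriving the column count from length(col_names) keeps the matrix and the name vector in sync if names are added later.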

Pass column name in data.table using variable

Use the quote() and eval() functions to pass a variable to j. You don't need double quotes around the column names when you do it this way, because the quote()-d expression (a symbol or call, not a string) is evaluated inside DT[].

temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"

With a single column name, the result is a vector. If you want a data.table result, or several columns, use the list form:

temp <- quote(list(x, v))
DT[ , eval(temp)]
#    x           v
# 1: b  1.52566586
# 2: b  0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a  0.03159933
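
When the column names are only available as character strings, a few other idioms give the same selections. The sketch below uses a small stand-in for DT (the answer does not define it) and assumes data.table 1.10.2 or later for the .. prefix:

```r
library(data.table)

# assumption: a small stand-in table; the values of v are made up
DT <- data.table(x = c("b", "b", "b", "a", "a"), v = 1:5)

col  <- "x"
cols <- c("x", "v")

vec  <- DT[[col]]                    # one column as a vector
sub1 <- DT[, ..cols]                 # several columns as a data.table
sub2 <- DT[, .SD, .SDcols = cols]    # same selection via .SDcols

# The quote()/eval() form returns the same vector as DT[[col]]
temp <- quote(x)
identical(vec, DT[, eval(temp)])
```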

Dynamic column names seem to work when := is used but not when = is used in data.table

One option is to use the base R function setNames

aggregate_mtcars <- mtcars_copy[, setNames(.(sum(carb)), new_col)]

Or you could use data.table::setnames

aggregate_mtcars <- setnames(mtcars_copy[, .(sum(carb))], new_col)
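
To see why = behaves differently, here is a short sketch contrasting the two forms. It assumes mtcars_copy is a data.table copy of the built-in mtcars, and new_col holds a made-up name:

```r
library(data.table)

mtcars_copy <- as.data.table(mtcars)   # assumption: copy of built-in mtcars
new_col <- "sum_carb"                  # hypothetical column name

# The name on the left of `=` is taken literally, not looked up as a variable
literal <- mtcars_copy[, .(new_col = sum(carb))]
names(literal)   # "new_col"

# setNames() evaluates new_col, so the column gets the stored name
dynamic <- mtcars_copy[, setNames(list(sum(carb)), new_col)]
names(dynamic)   # "sum_carb"
```

Inside .() or list(), = is ordinary argument naming, which is why the variable is never evaluated there.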

Dynamically add column names to data.table when aggregating

As mentioned in the comments by lukeA, setNames can be used:

m <- c("blah", "foo")
test_dtb[ , setNames(list(mean(b), median(b)), m), by = id]
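
A related variant, sketched here as an alternative rather than as the answer's method, keeps names and functions together in one named list. lapply() preserves the list names, so they become the output column names. The seed, the reduced table, and the variable funs are assumptions for illustration:

```r
library(data.table)

set.seed(42)  # assumption: seed chosen only to make the sketch reproducible
test_dtb <- data.table(b = sample(1:100, 100), id = rep(1:10, 10))

# The names of this list double as the output column names
funs <- list(blah = mean, foo = median)

res <- test_dtb[, lapply(funs, function(f) f(b)), by = id]
names(res)
# "id" "blah" "foo"
```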

