Assigning/Referencing a column name in data.table dynamically (in i, j and by)

Based on your clarifications, here is an approach with setNames and get. The trick here is that the .. prefix tells data.table to look the variable up in the calling environment rather than among the table's columns.

library(data.table)

cars <- data.table(cars)
strFactor <- "dist"
strNewVariable <- "Totals x Factor: "
strBy <- "speed"
cars[ get(strFactor) > 50,
      setNames(.(.N * get(..strFactor)), strNewVariable),
      by = strBy]

Dynamic column names in data.table

From data.table 1.9.4, you can just do this:

## A parenthesized symbol, `(cn)`, gets evaluated to "blah" before `:=` is carried out
test_dtb[, (cn) := mean(a), by = id]
head(test_dtb, 4)
#     a  b id blah
# 1: 41 19  1 54.2
# 2:  4 99  2 50.0
# 3: 49 85  3 46.7
# 4: 61  4  4 57.1

See Details in ?`:=`:

DT[i, (colvector) := val]

[...] NOW PREFERRED [...] syntax. The parens are enough to stop the LHS being a symbol; same as c(colvector)
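
For a self-contained illustration of the parenthesized form, here is a minimal sketch. The table dt, its values, and the column names a_mean, a_min and a_max are made up for the example and are not from the original question.

```r
library(data.table)

# Hypothetical toy data, not the question's test_dtb
dt <- data.table(id = rep(1:2, each = 3), a = c(1, 2, 3, 10, 20, 30))

# One new column whose name is held in a variable
cn <- "a_mean"
dt[, (cn) := mean(a), by = id]

# Several new columns at once: a character vector on the left-hand side,
# a list of the same length on the right-hand side
cns <- c("a_min", "a_max")
dt[, (cns) := list(min(a), max(a)), by = id]

names(dt)
# [1] "id"     "a"      "a_mean" "a_min"  "a_max"
```

Without the parentheses, cn := mean(a) would create a column literally named cn.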

Original answer:

You were on exactly the right track: constructing an expression to be evaluated within the call to [.data.table is the data.table way to do this sort of thing. Going just a bit further, why not construct an expression that evaluates to the entire j argument (rather than just its left hand side)?

Something like this should do the trick:

## Your code so far
test_dtb <- data.table(a=sample(1:100, 100),b=sample(1:100, 100),id=rep(1:10,10))
cn <- "blah"

## One solution
expr <- parse(text = paste0(cn, ":=mean(a)"))
test_dtb[,eval(expr), by=id]

## Checking the result
head(test_dtb, 4)
#     a  b id blah
# 1: 30 26  1 38.4
# 2: 83 82  2 47.4
# 3: 47 66  3 39.5
# 4: 87 23  4 65.2
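
If you would rather not paste code together as text, the same expression can probably be built from language objects. The following is only a sketch of that variation, using base R's bquote and as.name; the toy table dt is an assumption and is not the question's data.

```r
library(data.table)

# Hypothetical toy data
dt <- data.table(id = rep(1:2, each = 3), a = c(1, 2, 3, 10, 20, 30))
cn <- "blah"

# Build the call `blah := mean(a)` without going through parse(text = ...)
expr <- bquote(.(as.name(cn)) := mean(a))

# As with the parse() version, the whole j argument is eval(expr)
dt[, eval(expr), by = id]

names(dt)
```

The design choice here is that as.name() turns the string into a symbol, so odd characters in the column name cannot be misread as code the way they could with paste0 and parse.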

Dynamically add column names to data.table when aggregating

As mentioned in the comments by lukeA, setNames can be used:

m <- c("blah", "foo")
test_dtb[ , setNames(list(mean(b), median(b)), m), by = id]
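
Here is a small runnable version of the same idea, in case you want to try it without the original data. The table dt and the names in m are invented for the sketch.

```r
library(data.table)

# Hypothetical toy data
dt <- data.table(id = c(1, 1, 2, 2), b = c(1, 3, 10, 30))

# Names for the aggregated columns, held in a character vector
m <- c("b_mean", "b_median")

# setNames() attaches the names to the list that j returns for each group
res <- dt[, setNames(list(mean(b), median(b)), m), by = id]

names(res)
```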

Pass column name in data.table using variable

Use the quote() and eval() functions to pass a variable to j. You don't need quotation marks around the column names when you do it this way, because the quote()-ed expression is evaluated inside DT[].

temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"

With a single column name, the result is a vector. If you want a data.table result, or several columns, use the list form:

temp <- quote(list(x, v))
DT[ , eval(temp)]
#    x           v
# 1: b  1.52566586
# 2: b  0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a  0.03159933
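
The answer does not show how DT was built, so here is a sketch with an assumed DT (fixed values in place of random ones) that also covers the case where the column name starts out as a string. The variable names col_expr, cols_expr and col_from_string are hypothetical.

```r
library(data.table)

# Assumed stand-in for the answer's DT
DT <- data.table(x = c("b", "b", "b", "a", "a"),
                 v = c(1.5, 0.7, -1.3, -1.7, 0.03))

# A quoted symbol: j returns the column as a plain vector
col_expr <- quote(x)
vec <- DT[, eval(col_expr)]

# A quoted list() call: j returns a data.table
cols_expr <- quote(list(x, v))
tab <- DT[, eval(cols_expr)]

# Starting from a string, as.name() gives the same kind of symbol as quote()
col_from_string <- as.name("v")
vec2 <- DT[, eval(col_from_string)]
```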

Referring to data.table columns by names saved in variables

If you are going to be doing complicated operations inside your j expressions, you should probably use eval and quote. One problem with that in the current version of data.table is that the environment of eval is not always processed correctly (see the question "eval and quote in data.table"; note that the answer there has been updated following an update to the package). The current fix is to pass .SD as the second argument to eval. As far as I can tell from a few tests I've run, this doesn't affect speed (the way, for example, having .SD[1] in j would).

Interestingly, this issue only affects j; you'll be fine using eval normally in i (where .SD is not available anyway).

The other problem is assignment, where the left-hand side has to be a string. I know one way to extract the string name from a quoted expression. It's not pretty, but it works. Here's an example combining everything together:

x = data.table(dist = c(1:10), val = c(1:10))
distcol = quote(dist)
valcol = quote(val)

x[eval(valcol) < 5,
  capture.output(str(distcol, give.head = FALSE)) := eval(distcol) * sum(eval(distcol, .SD))]

Note that I got away without adding .SD in one eval(distcol), but the expression breaks if I take it out of the other eval.

Another option is to use get:

diststr = "dist"
valstr = "val"

x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]
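
To make the effect of the get version concrete, here is the same call as a runnable sketch with the result checked. It uses the parenthesized (diststr) form on the left-hand side, which behaves the same as c(diststr).

```r
library(data.table)

x <- data.table(dist = 1:10, val = 1:10)
diststr <- "dist"
valstr <- "val"

# Rows with val < 5 are rows 1 to 4, where sum(dist) is 1 + 2 + 3 + 4 = 10,
# so dist in those rows is multiplied by 10; the other rows are untouched
x[get(valstr) < 5, (diststr) := get(diststr) * sum(get(diststr))]

x$dist
# [1] 10 20 30 40  5  6  7  8  9 10
```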

dynamic column names seem to work when := is used but not when = is used in data.table

One option is to use the base R function setNames:

aggregate_mtcars <- mtcars_copy[, setNames(.(sum(carb)), new_col)]

Or you could use data.table::setnames:

aggregate_mtcars <- setnames(mtcars_copy[, .(sum(carb))], new_col)
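
The answer does not show how mtcars_copy and new_col were defined, so the sketch below assumes a data.table copy of the built-in mtcars and an invented name, total_carb, to show both variants side by side.

```r
library(data.table)

# Assumed definitions; the originals are not shown in the answer
mtcars_copy <- as.data.table(mtcars)
new_col <- "total_carb"

# Inside .() or list(), `new_col = sum(carb)` would name the column
# literally "new_col", because `=` there is argument naming, not evaluation

# Variant 1: name the list in j with base R's setNames()
agg1 <- mtcars_copy[, setNames(.(sum(carb)), new_col)]

# Variant 2: aggregate first, then rename the result by reference
agg2 <- setnames(mtcars_copy[, .(sum(carb))], new_col)

names(agg1)
names(agg2)
```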

R data.table struggling with conditional subsetting when column name is predefined elsewhere

I can imagine this was very frustrating for you. I applaud the number of things you tried before posting. Here's one approach:

DT[get(column_name) == 1, ]
   x y
1: 1 0
2: 1 1

If you need to use column_name in j, you can use get(..column_name):

DT[, get(..column_name)]
[1] 1 1 0 0

The .. prefix instructs evaluation to occur in the parent environment.

Another approach for using a string in either i or j is with eval(as.name(column_name)):

DT[eval(as.name(column_name)) == 1]
   x y
1: 1 0
2: 1 1

DT[, eval(as.name(column_name))]
[1] 1 1 0 0
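
For anyone who wants to run this, here is a sketch with the table reconstructed from the printed output above. That reconstruction is an assumption, as is the use of DT[[column_name]], which is ordinary list-style extraction and not part of the original answer.

```r
library(data.table)

# Assumed reconstruction of the question's table from the printed output
DT <- data.table(x = c(1, 1, 0, 0), y = c(0, 1, 0, 1))
column_name <- "x"

# Filter rows in i using the column named by the string
rows <- DT[get(column_name) == 1]

# Pull the column out as a vector; [[ ]] takes the string directly
vals <- DT[[column_name]]
```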
