In R Data.Table, How to Pass Variable Parameters to an Expression

In R data.table, how do I pass variable parameters to an expression?

An alternative to flodel's answer in the comments could be

e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))

b <- parse(text = v2)

rDT2 <- dt[, eval(e), by = eval(b)]

# b V1
# [1,] setosa 250.3
# [2,] versicolor 296.8
# [3,] virginica 329.4
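For anyone who wants to run this end to end, here is a minimal self-contained sketch. The setup lines are an assumption: the sums above match Sepal.Length by Species in the built-in iris data, so dt, v1 and v2 are guessed to be as follows.

```r
library(data.table)

# Assumed setup (not shown in the answer above): iris as a data.table,
# with the column names held in character variables.
dt <- as.data.table(iris)
v1 <- "Sepal.Length"
v2 <- "Species"

# Build the j and by expressions from the strings, then evaluate them.
e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))
b <- parse(text = v2)

res <- dt[, eval(e), by = eval(b)]
print(res)
```

The name of the grouping column in the printed result may differ between data.table versions.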

EDIT:

And to put this into a function,

getResult <- function(dt, expr, gby) {
  return(dt[, eval(expr), by = eval(gby)])
}

(dtR <- getResult(dt = dt, expr = e, gby = b))
# gives the same result as above



EDIT from Matthew:
There's also a subtle reason why the paste0 and eval/quote methods can be faster than get in some cases. One reason grouping can be fast is that data.table inspects j to see which columns it uses, then subsets only those columns (FAQ 1.12 and 3.1). It uses base::all.vars(j) to do that. When get() is used in j, the column being used is hidden from all.vars, so data.table falls back to subsetting all the columns just in case the j expression needs them (much like when the .SD symbol is used in j, which is the problem .SDcols was added to solve). If all the columns are used anyway it makes no difference, but if DT is, say, 1e7 x 100, then a grouped j=sum(V1) should be much faster than a grouped j=sum(get("V1")) for that reason. At least, that's what's supposed to happen; if it doesn't, it may be a bug.

On the other hand, if many queries are being constructed dynamically and repeated, the time to paste0 and parse might come into it. It all depends. Setting verbose=TRUE should print a message about which columns have been detected as used by j, so that can be checked.
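The all.vars point is easy to check in base R without data.table at all. A small sketch (the variable names are made up for illustration):

```r
# What a static scan of the j expression can see.
plain_j <- quote(sum(V1))
get_j   <- quote(sum(get("V1")))

vars_plain <- all.vars(plain_j)  # the symbol V1 is found
vars_get   <- all.vars(get_j)    # "V1" is only a string here, so no variable is found

print(vars_plain)
print(vars_get)
```

This only shows what all.vars reports; how data.table acts on that information is as described above.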

Pass variables in function to data.table for lm()

Using quote and substitute (from "Pass variable name as argument inside data.table"), with tweaks to your lm formula and .SDcols:

fun1 <- function(dt, y, by_col) {
  expr <- quote(dt[,
                   .(lm_results = lapply(.SD, function(x) summary(lm(Y ~ x)))),
                   .SDcols = sdcols,
                   by = byexpr])
  eval(do.call(substitute, list(expr,
    list(sdcols = substitute(!y), Y = as.name(y), byexpr = substitute(by_col)))))
}

fun1(data1, "colA", colD)

The uncool thing is that colA needs to be passed in as a string.

output:

      colD   lm_results
1: apples <summary.lm>
2: apples <summary.lm>
3: bananas <summary.lm>
4: bananas <summary.lm>
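The do.call(substitute, ...) step is the least obvious part of fun1, so here is a stripped-down base R sketch of just that idiom. The template and the names in it are hypothetical, picked only for illustration.

```r
# A quoted template with a placeholder symbol Y.
template <- quote(lm(Y ~ x))

# Calling substitute() directly does not look inside the template:
# it sees the symbol `template`, finds no replacement for it, and returns it as is.
naive <- substitute(template, list(Y = as.name("colA")))

# do.call() evaluates `template` first, so substitute() receives the call itself
# and can swap Y for the symbol colA.
filled <- do.call(substitute, list(template, list(Y = as.name("colA"))))

print(naive)
print(filled)
```

The filled call can then be passed to eval(), which is what the last line of fun1 does.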

Passing multiple arguments to data.table inside a function

Or using eval with substitute:

library(data.table) #Win R-3.5.1 x64 data.table_1.12.2
dt_mtcars <- as.data.table(mtcars)

processFUN <- function(dt, where, select, group) {

  out <- dt[i = eval(substitute(where)),
            j = eval(substitute(select)),
            by = eval(substitute(group))]

  return(out)
}

processFUN(dt_mtcars, mpg>20, .(mean_mpg=mean(mpg), median_mpg=median(mpg)), .(cyl, gear))

Some of the earliest references that I can find are

  1. Aggregating sub totals and grand totals with data.table
  2. Using data.table i and j arguments in functions

The old FAQ 1.6 also refers to this:
http://datatable.r-forge.r-project.org/datatable-faq.pdf

Pass argument to data.table aggregation function

You can use get:

wtmean1 <- function(dt1, weight) {
  dt1[, weighted.mean(x, get(weight)), by = timeperiod]
}

With your sample data:

> set.seed(1)
> mydata <- data.table(x=1:10, timeperiod=rep(1:2,5), wt1=rnorm(10), wt2=rnorm(10))
> wtmean1(mydata, "wt1")
timeperiod V1
1: 1 -102.476925
2: 2 3.362326
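If it helps to see what get is doing here, the same lookup-by-name idea can be sketched in base R with a plain data frame. This is only an analogy, not data.table code; wtmean_base and the toy data are made up for the example.

```r
# Base R analogue: evaluate inside the data frame and fetch the weight column by name.
wtmean_base <- function(df, weight) {
  with(df, weighted.mean(x, get(weight)))
}

toy <- data.frame(x = 1:4, w1 = c(1, 1, 2, 2), w2 = c(4, 3, 2, 1))

res_w1 <- wtmean_base(toy, "w1")
res_w2 <- wtmean_base(toy, "w2")
print(c(res_w1, res_w2))
```

One caveat with this sketch: if the data frame has a column that is itself called weight, get(weight) would pick up that column rather than the function argument, so choose argument names that cannot clash with column names.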

Using data.table i and j arguments in functions

Gavin and Josh are right. This answer is only to add more background. The idea is that not only can you pass variable column names into a function like that, but expressions of column names, using quote().

group = quote(car)
mtcars[, list(Total=length(mpg)), by=group][order(group)]
group Total
AMC 1
Cadillac 1
...
Toyota 2
Valiant 1
Volvo 1

Although this is admittedly more difficult to start with, it can be more flexible. That's the idea, anyway. Inside functions you need substitute(), like this:

tableOrder = function(x, .expr) {
  .expr = substitute(.expr)
  ans = x[, list(Total = length(mpg)), by = .expr]
  setkeyv(ans, head(names(ans), -1)) # see below re feature request #1780
  ans
}

tableOrder(mtcars, car)
.expr Total
AMC 1
Cadillac 1
Camaro 1
...
Toyota 2
Valiant 1
Volvo 1

tableOrder(mtcars, substring(car,1,1)) # an expression, not just a column name
.expr Total
[1,] A 1
[2,] C 3
[3,] D 3
...
[8,] P 2
[9,] T 2
[10,] V 2

tableOrder(mtcars, list(cyl,gear%%2)) # by two expressions, so head(,-1) above
cyl gear Total
[1,] 4 0 8
[2,] 4 1 3
[3,] 6 0 4
[4,] 6 1 3
[5,] 8 1 14

A new argument, keyby, was added in v1.8.0 (July 2012), making it simpler:

tableOrder = function(x, .expr) {
  .expr = substitute(.expr)
  x[, list(Total = length(mpg)), keyby = .expr]
}

Comments and feedback in the area of i,j and by variable expressions are most welcome. The other thing you can do is have a table where a column contains expressions and then look up which expression to put in i, j or by from that table.
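A rough base R sketch of that last idea, with a named list standing in for the table of expressions (the entry names are made up, and the built-in mtcars data frame is used rather than a data.table):

```r
# Stored expressions, looked up by name when a query is built.
expr_lookup <- list(
  by_cyl         = quote(cyl),
  by_gear_parity = quote(gear %% 2),
  heavy_only     = quote(wt > 3.5)
)

# Pick one expression and evaluate it against the columns of mtcars.
chosen <- expr_lookup[["by_gear_parity"]]
groups <- eval(chosen, mtcars)
counts <- table(groups)
print(counts)
```

In a data.table version the looked-up expression would go into i, j or by via eval(), in the same way as the quote() example earlier.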

Passing unquoted function arguments to i in data.table

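There may be other (better) options, but you can wrap the parse in tryCatch and fall back to bquote for the unquoted argument. Note that the bquote branch runs only when evaluating the argument directly raises an error.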

test.function <- function(my.dt, ...) {
  where <- tryCatch(
    parse(text = paste0(list(...))),
    error = function(e) parse(text = paste0(list(bquote(...))))
  )
  my.dt <- my.dt[eval(where), ]
  return(my.dt)
}

tmp <- test.function(x, 'x==3 | sex=="F"')
head(tmp)
x sex
1: 1 F
2: 2 F
3: 3 F
4: 3 M
5: 3 F

tmp <- test.function(x, x==3 | sex=='F')
head(tmp)
x sex
1: 1 F
2: 2 F
3: 3 F
4: 3 M
5: 3 F

R custom data.table function with multiple variable inputs

Here's an option using mget, as commented:

fn_agg <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {

  temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list),
             by = by_var_list, .SDcols = var_list]

  setorderv(temp, order_var_list)

  cols1 <- paste0(var_name_list, "_del")
  cols2 <- paste0(cols1, "_rel")

  temp[, (cols1) := lapply(mget(var_name_list), function(x) {
    x - shift(x, n = 1, type = "lag")
  })]

  temp[, (cols2) := lapply(mget(var_name_list), function(x) {
    xshift <- shift(x, n = 1, type = "lag")
    (x - xshift) / xshift
  })]

  temp[]
}

fn_agg(dt,
var_list = c("x", "y"),
var_name_list = c("x_sum", "y_sum"),
by_var_list = c("a", "b"),
order_var_list = c("a", "b"))

# a b x_sum y_sum x_sum_del y_sum_del x_sum_del_rel y_sum_del_rel
#1: a e 254 358 NA NA NA NA
#2: b f 246 116 -8 -242 -0.031496063 -0.6759777
#3: c g 272 242 26 126 0.105691057 1.0862069
#4: d h 273 194 1 -48 0.003676471 -0.1983471

Instead of mget, you could also make use of data.table's .SDcols argument as in

temp[, (cols1) := lapply(.SD, function(x) {
  x - shift(x, n = 1, type = "lag")
}), .SDcols = var_name_list]

Also, there are probably ways to improve the function by avoiding duplicated computation of shift(x, n = 1, type = "lag") but I only wanted to demonstrate a way to use data.table in functions.
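One possible way to compute the lag only once is sketched below. It is not part of the function above; temp is rebuilt here from the x_sum and y_sum values in the printed output so the snippet runs on its own, and lagged is a helper name introduced for the sketch.

```r
library(data.table)

# Aggregated values copied from the output shown above.
temp <- data.table(x_sum = c(254, 246, 272, 273),
                   y_sum = c(358, 116, 242, 194))
var_name_list <- c("x_sum", "y_sum")
cols1 <- paste0(var_name_list, "_del")
cols2 <- paste0(cols1, "_rel")

# Compute the lagged columns once and reuse them for both sets of new columns.
lagged <- temp[, lapply(.SD, shift, n = 1, type = "lag"), .SDcols = var_name_list]

temp[, (cols1) := Map(`-`, .SD, lagged), .SDcols = var_name_list]  # x - lag(x)
temp[, (cols2) := Map(`/`, .SD, lagged), .SDcols = cols1]          # (x - lag(x)) / lag(x)

print(temp)
```

This assumes no column in temp is itself named lagged, since that name is looked up from the calling scope.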

Passing data-variables to R formulas

Wrap the formula in expr(), then evaluate it.

library(dplyr)
lm_tidy <- function(df, x, y) {
  x <- sym(x)
  y <- sym(y)
  fm <- expr(!!y ~ !!x)
  lm(fm, data = df)
}

This function is equivalent:

lm_tidy <- function(df, x, y) {
  fm <- expr(!!sym(y) ~ !!sym(x))
  lm(fm, data = df)
}

Then

lm_tidy(mtcars, "cyl", "mpg")

gives

Call:
lm(formula = fm, data = .)

Coefficients:
(Intercept) cyl
37.885 -2.876

EDIT per comment below:

library(rlang)
lm_tidy_quo <- function(df, x, y) {
  y <- enquo(y)
  x <- enquo(x)
  fm <- paste(quo_text(y), "~", quo_text(x))
  lm(fm, data = df)
}

You can then pass symbols as arguments

lm_tidy_quo(mtcars, cyl, mpg)
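For comparison, here is a base R sketch that avoids tidy evaluation altogether by building the formula with reformulate(). This is a different technique from the answer above, and lm_base is a hypothetical name.

```r
# Build the formula from column names given as strings, then fit the model.
lm_base <- function(df, x, y) {
  fm <- reformulate(x, response = y)  # e.g. mpg ~ cyl
  lm(fm, data = df)
}

fit <- lm_base(mtcars, "cyl", "mpg")
print(coef(fit))
```

With the same columns as the example above it should reproduce the same coefficients.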

