In R data.table, how do I pass variable parameters to an expression?
An alternative to flodel's answer in the comments could be:
e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))
b <- parse(text = v2)
rDT2 <- dt[, eval(e), by = eval(b)]
# b V1
# [1,] setosa 250.3
# [2,] versicolor 296.8
# [3,] virginica 329.4
EDIT:
And to put this into a function:
getResult <- function(dt, expr, gby){
return(dt[, eval(expr), by = eval(gby)])
}
(dtR <- getResult(dt = dt, expr = e, gby = b))
# gives the same result as above
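A self-contained version of the parse() approach is sketched below. The question's own inputs are not shown here. The output above suggests that dt is iris as a data.table, v1 is "Sepal.Length" and v2 is "Species", so the sketch assumes those values. It should be read as an illustration under that assumption, not as the original setup.

```r
library(data.table)

# Assumed inputs (hypothetical; inferred from the output above)
dt <- as.data.table(iris)
v1 <- "Sepal.Length"
v2 <- "Species"

# Build j and by as parsed expressions from the strings
e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))
b <- parse(text = v2)

# data.table recognises eval() in j and by and evaluates the stored
# expression in the calling frame before running the grouped query
getResult <- function(dt, expr, gby) {
  dt[, eval(expr), by = eval(gby)]
}

res <- getResult(dt, e, b)
print(res)
```

The unnamed j result lands in a column called V1. The grouping column's name may differ between data.table versions: older releases printed it as b, as in the output above.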
EDIT from Matthew:
There's a subtle reason why the paste0 and eval/quote methods can also be faster than get in some cases. One of the reasons grouping can be fast is that data.table inspects j to see which columns it uses, then subsets only those columns (FAQ 1.12 and 3.1). It uses base::all.vars(j) to do that. When get() is used in j, the column being used is hidden from all.vars, and data.table falls back to subsetting all the columns just in case the j expression needs them. This is much like what happens when the .SD symbol is used in j, which is the problem .SDcols was added to solve. If all the columns are used anyway then it makes no difference. But if DT is, say, 1e7x100, then a grouped j=sum(V1) should be much faster than a grouped j=sum(get("V1")) for that reason. At least, that's what's supposed to happen; if it doesn't, it may be a bug. If, on the other hand, many queries are being constructed dynamically and repeated, then the time spent in paste0 and parse might come into it. It all depends. Setting verbose=TRUE should print a message about which columns have been detected as used by j, so that can be checked.
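The all.vars() point above can be checked directly in base R. This sketch only reproduces the all.vars() step the note describes; data.table's real column detection does more work than this. It shows that get("V1") hides the column, while a j built with parse() or bquote() exposes it as an ordinary symbol.

```r
# What all.vars() reports for a literal column reference versus get()
j_literal <- quote(sum(V1))
j_get     <- quote(sum(get("V1")))
print(all.vars(j_literal))  # "V1"
print(all.vars(j_get))      # character(0): "V1" is only a string constant here

# Building j from a string with parse(), or splicing a symbol in with bquote(),
# puts a real symbol back into the expression
j_parsed <- parse(text = paste0("sum(", "V1", ")"))[[1]]
j_bquote <- bquote(sum(.(as.name("V1"))))
print(all.vars(j_parsed))   # "V1"
print(all.vars(j_bquote))   # "V1"
```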
Pass variables in function to data.table for lm()
Using quote and substitute from Pass variable name as argument inside data.table, with tweaks to your lm formula and .SDcols:
fun1 <- function(dt, y, by_col) {
expr <- quote(dt[,
.(lm_results=lapply(.SD, function(x) summary(lm(Y ~ x)))),
.SDcols=sdcols,
by=byexpr])
eval(do.call(substitute, list(expr,
list(sdcols=substitute(!y), Y=as.name(y), byexpr=substitute(by_col)))))
}
fun1(data1, "colA", colD)
The uncool thing is that colA needs to be passed in as a string.
Output:
colD lm_results
1: apples <summary.lm>
2: apples <summary.lm>
3: bananas <summary.lm>
4: bananas <summary.lm>
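It can help to see what the do.call(substitute, ...) step in fun1 actually produces. The following base-R sketch reuses fun1's placeholder names (Y, sdcols, byexpr). It fills them with colA and colD directly, as fun1 would for the call above, and only prints the resulting expression without evaluating it against any data.

```r
# Template with the placeholder symbols Y, sdcols and byexpr used in fun1
template <- quote(dt[, .(lm_results = lapply(.SD, function(x) summary(lm(Y ~ x)))),
                     .SDcols = sdcols, by = byexpr])

# do.call() is needed because substitute() does not evaluate its first argument:
# substitute(template, env) would just return the symbol `template` itself
filled <- do.call(substitute, list(template, list(
  Y      = as.name("colA"),  # response column, as a symbol
  sdcols = quote(!"colA"),   # every column except colA
  byexpr = quote(colD)       # grouping column
)))
print(filled)
```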
Passing multiple arguments to data.table inside a function
Or using eval with substitute:
library(data.table) #Win R-3.5.1 x64 data.table_1.12.2
dt_mtcars <- as.data.table(mtcars)
processFUN <- function(dt, where, select, group) {
out <- dt[i=eval(substitute(where)),
j=eval(substitute(select)),
by=eval(substitute(group))]
return(out)
}
processFUN(dt_mtcars, mpg>20, .(mean_mpg=mean(mpg), median_mpg=median(mpg)), .(cyl, gear))
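Here is a small runnable check of why eval(substitute(...)) works. substitute() hands back the caller's unevaluated expression, and data.table's handling of eval() in i, j and by then evaluates it with the table's columns in scope. The helper show_arg and the .(n = .N) / .(cyl) arguments are illustrative choices, not part of the original answer.

```r
library(data.table)

dt_mtcars <- as.data.table(mtcars)

processFUN <- function(dt, where, select, group) {
  dt[i = eval(substitute(where)),
     j = eval(substitute(select)),
     by = eval(substitute(group))]
}

# Hypothetical helper: shows what substitute() captures for an unquoted argument
show_arg <- function(where) substitute(where)
print(show_arg(mpg > 20))  # mpg > 20

res <- processFUN(dt_mtcars, mpg > 20, .(n = .N), .(cyl))
print(res)
```

The per-group counts should add up to the number of rows with mpg > 20.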
Some of the earliest references that I can find are
- Aggregating sub totals and grand totals with data.table
- Using data.table i and j arguments in functions
The old FAQ 1.6 contains a reference to this:
http://datatable.r-forge.r-project.org/datatable-faq.pdf
Pass argument to data.table aggregation function
You can use get:
wtmean1 <- function(dt1, weight) {
dt1[,weighted.mean(x, get(weight)), by=timeperiod]
}
With your sample data:
> set.seed(1)
> mydata <- data.table(x=1:10, timeperiod=rep(1:2,5), wt1=rnorm(10), wt2=rnorm(10))
> wtmean1(mydata, "wt1")
timeperiod V1
1: 1 -102.476925
2: 2 3.362326
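As an alternative to get() (swapped in here, not part of the original answer), bquote() can splice the weight column in as a plain symbol. Per the all.vars() note earlier on this page, that may let data.table subset only the columns j actually uses. The function name wtmean2 is hypothetical. The sketch rebuilds the sample data so it can run on its own.

```r
library(data.table)

set.seed(1)
mydata <- data.table(x = 1:10, timeperiod = rep(1:2, 5),
                     wt1 = rnorm(10), wt2 = rnorm(10))

# get() version from the answer above
wtmean1 <- function(dt1, weight) {
  dt1[, weighted.mean(x, get(weight)), by = timeperiod]
}

# Hypothetical alternative: turn the string into a symbol and splice it into
# the whole call with bquote(), then evaluate that call in this function's frame
wtmean2 <- function(dt1, weight) {
  w <- as.name(weight)
  eval(bquote(dt1[, weighted.mean(x, .(w)), by = timeperiod]))
}

print(wtmean2(mydata, "wt1"))
```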
Using data.table i and j arguments in functions
Gavin and Josh are right. This answer is only to add more background. The idea is that not only can you pass variable column names into a function like that, but also expressions of column names, using quote().
group = quote(car)
mtcars[, list(Total=length(mpg)), by=group][order(group)]
group Total
AMC 1
Cadillac 1
...
Toyota 2
Valiant 1
Volvo 1
Although admittedly more difficult to start with, it can be more flexible. That's the idea, anyway. Inside functions you need substitute(), like this:
tableOrder = function(x,.expr) {
.expr = substitute(.expr)
ans = x[,list(Total=length(mpg)),by=.expr]
setkeyv(ans, head(names(ans),-1)) # see below re feature request #1780
ans
}
tableOrder(mtcars, car)
.expr Total
AMC 1
Cadillac 1
Camaro 1
...
Toyota 2
Valiant 1
Volvo 1
tableOrder(mtcars, substring(car,1,1)) # an expression, not just a column name
.expr Total
[1,] A 1
[2,] C 3
[3,] D 3
...
[8,] P 2
[9,] T 2
[10,] V 2
tableOrder(mtcars, list(cyl,gear%%2)) # by two expressions, so head(,-1) above
cyl gear Total
[1,] 4 0 8
[2,] 4 1 3
[3,] 6 0 4
[4,] 6 1 3
[5,] 8 1 14
A new argument, keyby, was added in v1.8.0 (July 2012), making it simpler:
tableOrder = function(x,.expr) {
.expr = substitute(.expr)
x[,list(Total=length(mpg)),keyby=.expr]
}
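The keyby version can be tried out on a small table. mtcars has no car column, and the setup used above is not shown. So this sketch assumes a hypothetical car column derived from the row names by keeping the first word, the make. It then groups by a column, by an expression and by two expressions.

```r
library(data.table)

# Hypothetical setup: derive `car` (the make) from mtcars' row names
cars <- data.table(car = sub(" .*", "", rownames(mtcars)),
                   mpg = mtcars$mpg, cyl = mtcars$cyl, gear = mtcars$gear)

tableOrder <- function(x, .expr) {
  .expr <- substitute(.expr)  # capture the unevaluated grouping expression
  x[, list(Total = length(mpg)), keyby = .expr]
}

by_make    <- tableOrder(cars, car)                    # a column name
by_initial <- tableOrder(cars, substring(car, 1, 1))   # an expression
by_two     <- tableOrder(cars, list(cyl, gear %% 2))   # two expressions
print(by_initial)
```

Because keyby both groups and sets a key, the result comes back sorted without the separate setkeyv() call.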
Comments and feedback in the area of i, j and by variable expressions are most welcome. The other thing you can do is have a table where a column contains expressions, and then look up which expression to put in i, j or by from that table.
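A minimal sketch of the lookup idea follows. For simplicity it uses a named list of stored expressions as a stand-in for "a table where a column contains expressions". The names j_lookup and pick are hypothetical. eval() in j makes data.table evaluate the looked-up expression before grouping.

```r
library(data.table)

DT <- data.table(g = c("a", "a", "b"), v = c(1, 2, 3))

# Hypothetical lookup of stored j expressions, keyed by name
j_lookup <- list(total = quote(sum(v)),
                 avg   = quote(mean(v)))

pick <- "total"  # e.g. chosen at run time
res <- DT[, eval(j_lookup[[pick]]), by = g]
print(res)
```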
Passing unquoted function arguments to i in data.table
There may be other (better) options, but you can wrap it in tryCatch and use bquote for the unquoted argument:
test.function <- function(my.dt, ...){
where <- tryCatch(parse(text = paste0(list(...))), error = function (e) parse(text = paste0(list(bquote(...)))))
my.dt <- my.dt[eval(where), ]
return(my.dt)
}
tmp <- test.function(x, 'x==3 | sex=="F"')
head(tmp)
x sex
1: 1 F
2: 2 F
3: 3 F
4: 3 M
5: 3 F
tmp <- test.function(x, x==3 | sex=='F')
head(tmp)
x sex
1: 1 F
2: 2 F
3: 3 F
4: 3 M
5: 3 F
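One possibly simpler variant (an alternative, not the answer's method) avoids tryCatch by inspecting what substitute() returns. A string literal arrives as a character value and is parsed with str2lang() (R >= 3.6.0). An unquoted condition arrives as a call and is used as-is. The names filter_rows and people are hypothetical. One limitation: a variable holding a string arrives as a symbol and is not parsed.

```r
library(data.table)

people <- data.table(x = c(1, 2, 3, 3, 3, 4),
                     sex = c("F", "F", "F", "M", "F", "M"))

filter_rows <- function(my.dt, cond) {
  cond_expr <- substitute(cond)        # unevaluated argument
  if (is.character(cond_expr)) {
    cond_expr <- str2lang(cond_expr)   # string literal: parse it into a call
  }
  my.dt[eval(cond_expr)]
}

a <- filter_rows(people, 'x == 3 | sex == "F"')
b <- filter_rows(people, x == 3 | sex == "F")
print(a)
```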
R custom data.table function with multiple variable inputs
Here's an option using mget, as commented:
fn_agg <- function(DT, var_list, var_name_list, by_var_list, order_var_list) {
temp <- DT[, setNames(lapply(.SD, sum, na.rm = TRUE), var_name_list),
by = by_var_list, .SDcols = var_list]
setorderv(temp, order_var_list)
cols1 <- paste0(var_name_list, "_del")
cols2 <- paste0(cols1, "_rel")
temp[, (cols1) := lapply(mget(var_name_list), function(x) {
x - shift(x, n = 1, type = "lag")
})]
temp[, (cols2) := lapply(mget(var_name_list), function(x) {
xshift <- shift(x, n = 1, type = "lag")
(x - xshift) / xshift
})]
temp[]
}
fn_agg(dt,
var_list = c("x", "y"),
var_name_list = c("x_sum", "y_sum"),
by_var_list = c("a", "b"),
order_var_list = c("a", "b"))
# a b x_sum y_sum x_sum_del y_sum_del x_sum_del_rel y_sum_del_rel
#1: a e 254 358 NA NA NA NA
#2: b f 246 116 -8 -242 -0.031496063 -0.6759777
#3: c g 272 242 26 126 0.105691057 1.0862069
#4: d h 273 194 1 -48 0.003676471 -0.1983471
Instead of mget, you could also make use of data.table's .SDcols argument, as in:
temp[, (cols1) := lapply(.SD, function(x) {
x - shift(x, n = 1, type = "lag")
}), .SDcols = var_name_list]
Also, there are probably ways to improve the function by avoiding the duplicated computation of shift(x, n = 1, type = "lag"), but I only wanted to demonstrate a way to use data.table in functions.
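One way to avoid the duplicated shift() is to compute each lag once into temporary helper columns, then derive both the difference and the relative change from them. This is a sketch: add_deltas and the *_lag helper names are hypothetical, and the input table is built by hand from the sums in the output above rather than from the original dt. Note that it modifies its input by reference.

```r
library(data.table)

# Hypothetical aggregated input, using the sums shown in the output above
temp <- data.table(a = c("a", "b", "c", "d"), b = c("e", "f", "g", "h"),
                   x_sum = c(254, 246, 272, 273),
                   y_sum = c(358, 116, 242, 194))

add_deltas <- function(DT, cols) {
  lag_cols <- paste0(cols, "_lag")      # temporary helper columns
  del_cols <- paste0(cols, "_del")
  rel_cols <- paste0(del_cols, "_rel")
  # shift() accepts a list such as .SD, so each column is lagged exactly once
  DT[, (lag_cols) := shift(.SD, n = 1L, type = "lag"), .SDcols = cols]
  DT[, (del_cols) := Map(`-`, mget(cols), mget(lag_cols))]
  DT[, (rel_cols) := Map(`/`, mget(del_cols), mget(lag_cols))]
  DT[, (lag_cols) := NULL]              # drop the helpers again
  DT[]
}

out <- add_deltas(temp, c("x_sum", "y_sum"))
print(out)
```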
Passing data-variables to R formulas
Build the formula with expr(), splicing in the symbols with !!, then pass it to lm().
library(dplyr)
lm_tidy <- function(df, x, y) {
x <- sym(x)
y <- sym(y)
fm <- expr(!!y ~ !!x)
lm(fm, data = df)
}
This function is equivalent:
lm_tidy <- function(df, x, y) {
fm <- expr(!!sym(y) ~ !!sym(x))
lm(fm, data = df)
}
Then
lm_tidy(mtcars, "cyl", "mpg")
gives
Call:
lm(formula = fm, data = df)
Coefficients:
(Intercept) cyl
37.885 -2.876
EDIT per comment below:
library(rlang)
lm_tidy_quo <- function(df, x, y){
y <- enquo(y)
x <- enquo(x)
fm <- paste(quo_text(y), "~", quo_text(x))
lm(fm, data = df)
}
You can then pass symbols as arguments:
lm_tidy_quo(mtcars, cyl, mpg)
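For the string version, base R's stats::reformulate() is another option. It is swapped in here rather than taken from the answer, and it builds the same y ~ x formula from character names without rlang. The function name lm_reform is hypothetical.

```r
# Base-R alternative: reformulate() builds a formula from strings
lm_reform <- function(df, x, y) {
  fm <- reformulate(x, response = y)  # e.g. mpg ~ cyl
  lm(fm, data = df)
}

fit <- lm_reform(mtcars, "cyl", "mpg")
print(coef(fit))
```

The coefficients match the lm_tidy output above (37.885 and -2.876 after rounding).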