How to write a function that calls a function that calls data.table?
This will work:
plotfoo <- function(data, by) {
  by <- substitute(by)
  do.call(foo, list(quote(data), by))
}
plotfoo(DT, gear)
# by N
# 1: 4 12
# 2: 3 15
# 3: 5 5
Explanation:
The problem is that your call to foo() in plotfoo() looks like one of the following:
foo(data, eval(by))
foo(data, by)
When foo processes those calls, it dutifully substitutes for the second formal argument (by), getting as by's value the expression eval(by) or the symbol by. But you want by's value to be the symbol gear, as in the call foo(data, gear).
do.call() solves this problem because the elements of its second argument are evaluated before do.call() constructs the call that it then evaluates. As a result, when you pass it by, by is evaluated to its value (the symbol gear) before the call is constructed, so the call looks (essentially) like this:
foo(data, gear)
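To see the mechanism in isolation, here is a minimal base R sketch. The body of foo() below is hypothetical (the question's foo() is not shown); it only returns, via substitute(), the expression it receives for by, so the two calling styles can be compared without data.table.

```r
# Hypothetical stand-in for the question's foo(): it captures its `by`
# argument with substitute() and returns that expression as a string
foo <- function(data, by) {
  by <- substitute(by)
  deparse(by)
}

# Passing the argument straight through: foo() receives the symbol `by`
plotfoo_direct <- function(data, by) {
  foo(data, by)
}

# Capturing the symbol first and splicing it in with do.call()
plotfoo <- function(data, by) {
  by <- substitute(by)                  # the symbol `gear`
  do.call(foo, list(quote(data), by))   # builds and evaluates foo(data, gear)
}

plotfoo_direct(mtcars, gear)  # "by"
plotfoo(mtcars, gear)         # "gear"
```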
How to use data.table inside a function call?
A simple fix would be to pass the column name as a string:
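library(data.table)
DT <- data.table(a = c(6, 3, 1, 9, NA), b = 4:8)  # example data reconstructed from the output below
fillna = function(df, var){
  col = df[[var]]
  # replace the NAs in column `var` by reference with the column mean
  set(df, i = which(is.na(col)), j = var, value = mean(col, na.rm = TRUE))
  return(df)
}
fillna(DT, "a")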
# a b
#1: 6.00 4
#2: 3.00 5
#3: 1.00 6
#4: 9.00 7
#5: 4.75 8
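For comparison, a base R sketch of the same mean imputation on a plain data.frame (fill_na_mean is a hypothetical name). Because base R copies on modification, the caller's data is untouched and the filled result has to be captured from the return value, whereas set() above changes DT in place.

```r
# Hypothetical base R analogue of fillna() for a plain data.frame
fill_na_mean <- function(df, var) {
  col <- df[[var]]
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  df[[var]] <- col   # modifies the function's local copy only
  df
}

df <- data.frame(a = c(6, 3, 1, 9, NA), b = 4:8)
filled <- fill_na_mean(df, "a")
filled$a        # 6.00 3.00 1.00 9.00 4.75
is.na(df$a[5])  # TRUE: the caller's df is unchanged
```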
Calling user defined function from data.table object
Two solutions to the problem (thanks @chinsoon12):
test[, c := mapply(f, test[, a], test[, b])]
test[, c := f(a, b), by = 1L:nrow(test)]
Speed-wise, the two solutions are equivalent in this benchmark:
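library(data.table)
library(microbenchmark)
# f() is the user-defined function from the question
a <- 1:500
b <- 500:1
test_1 <- data.table(a, b)
test_2 <- data.table(a, b)
bench <- microbenchmark(v_1 = test_1[, c := mapply(f, test_1[, a], test_1[, b])],
                        v_2 = test_2[, c := f(a, b), by = 1L:nrow(test_2)],
                        times = 100L)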
summary(bench)
# expr min lq mean median uq max neval cld
#1 v_1 91.83598 95.63639 97.82780 96.94672 98.51073 113.2232 100 a
#2 v_2 91.72392 95.45878 98.92037 96.53573 98.71301 139.9906 100 a
library(ggplot2)  # provides the autoplot() generic
autoplot(bench)
Benchmark plot
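Both solutions call f() once per row because f() cannot work on whole columns at once. Since the question's f() is not shown, the sketch below assumes a hypothetical scalar-only f() (its `if` needs a single TRUE/FALSE) and shows what mapply() does with it, next to a vectorized rewrite that happens to exist for this particular f().

```r
# Hypothetical scalar-only function: `if` needs a single TRUE/FALSE,
# so f() cannot be called on whole vectors a and b directly
f <- function(a, b) {
  if (a > b) a - b else b - a
}

a <- c(1, 5, 3)
b <- c(4, 2, 3)

# mapply() calls f() once per element pair, like the first solution above
res <- mapply(f, a, b)
res  # 3 3 0

# For this particular f(), a vectorized equivalent avoids per-row calls
vec <- abs(a - b)
```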
Apply function to data.table using function's character name and arguments as character vector
Yes, you are missing something (it's not really obvious, but careful debugging of the error identifies the problem). Your function expects named arguments arg1 and arg2, but via do.call you are passing it arguments y = ... and z = ... (as you have noticed). The solution is to pass the list without names:
DT[, do.call(func, unname(.SD[, mycols, with = FALSE])), by = x]
#      x V1
#  1:  a  6
#  2:  a  6
#  3:  a 11
#  4:  a 17
#  5:  a  7
#  6:  b 15
#  7:  b 17
#  8:  b 10
#  9:  b 11
# 10:  b 10
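The name-matching problem can be reproduced with base R alone. func, y, and z below are hypothetical stand-ins for the question's function and columns: named list elements are matched to formals by name, while unname() makes do.call() match them by position.

```r
func <- function(arg1, arg2) arg1 + arg2

args <- list(y = 1:3, z = c(10, 20, 30))

# Named elements are matched by name, so y and z are "unused arguments"
err <- tryCatch(do.call(func, args), error = function(e) conditionMessage(e))

# unname() drops the names, so the elements are matched by position
ok <- do.call(func, unname(args))
ok  # 11 22 33
```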
Writing functions (procedures) for data.table objects
Yes, the addition, modification, and deletion of columns in data.tables are done by reference. In a sense, it is a good thing because a data.table usually holds a lot of data, and it would be very memory- and time-consuming to reassign it all every time a change is made. On the other hand, it is a bad thing because it goes against the no-side-effect functional programming approach that R tries to promote by using pass-by-value by default. With no-side-effect programming, there is little to worry about when you call a function: you can rest assured that your inputs or your environment won't be affected, and you can just focus on the function's output. It's simple, hence comfortable.
Of course it is OK to disregard John Chambers's advice if you know what you are doing. About writing "good" data.table procedures, here are a couple of rules I would consider if I were you, as a way to limit complexity and the number of side-effects:
- a function should not modify more than one table, i.e., modifying that table should be the only side-effect,
- if a function modifies a table, then make that table the output of the function. Of course, you won't want to re-assign it: just run do.something.to(table) and not table <- do.something.to(table). If instead the function had another ("real") output, then when calling result <- do.something.to(table), it is easy to imagine how you may focus your attention on the output and forget that calling the function had a side-effect on your table.
While "one output / no-side-effect" functions are the norm in R, the above rules allow for "one output or side-effect". If you agree that a side-effect is somehow a form of output, then you'll agree I am not bending the rules too much by loosely sticking to R's one-output functional programming style. Allowing functions to have multiple side-effects would be a little more of a stretch; not that you can't do it, but I would try to avoid it if possible.
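A minimal sketch of these two rules, assuming data.table is installed; add_total() and the sales table are hypothetical. The function's only side-effect is adding one column to the table it is given, and it returns that same table invisibly, so the caller does not need to reassign it.

```r
library(data.table)  # assumed to be installed

# Hypothetical procedure following the two rules above: its only
# side-effect is adding one column to `dt`, and it returns that table
add_total <- function(dt, cols, name = "total") {
  dt[, (name) := rowSums(.SD), .SDcols = cols]  # modifies dt by reference
  invisible(dt)                                 # the modified table is the output
}

sales <- data.table(q1 = c(10, 20), q2 = c(5, 7))
add_total(sales, c("q1", "q2"))  # just run it; no `sales <- ...` needed
sales$total                      # 15 27
```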
R data.table usage in function call
One possibility is to define your own re-leveling function using data.table::setattr that will modify dt in place. Something like:
DTsetlvls <- function(x, newl)
  # keep each level in its position so existing codes still map to the right labels
  setattr(x, "levels", replace(levels(x), levels(x) %in% newl, "other"))
Then use it within another predefined function
f <- function(variableName, min.freq){
  fail.min.f <- dt[, .N, by = variableName][N < min.freq, get(variableName)]
  dt[, DTsetlvls(get(variableName), fail.min.f)]
  invisible()
}
f("type", min.freq)
levels(dt$type)
# [1] "C" "other"
Some other data.table alternatives:
f <- function(var, min.freq) {
  fail.min.f <- dt[, .N, by = var][N < min.freq, get(var)]
  dt[get(var) %in% fail.min.f, (var) := "Other"]
  dt[, (var) := factor(get(var))]
}
Or using set/.I:
f <- function(var, min.freq) {
  fail.min.f <- dt[, .I[.N < min.freq], by = var]$V1
  set(dt, fail.min.f, var, "other")
  set(dt, NULL, var, factor(dt[[var]]))
}
Or combining with base R (doesn't modify the original data set):
f <- function(df, variableName, min.freq){
  fail.min.f <- df[, .N, by = variableName][N < min.freq, get(variableName)]
  levels(df[[variableName]])[levels(df[[variableName]]) %in% fail.min.f] <- "Other"
  df
}
Alternatively, we could stick with characters instead (if type is a character) and simply do:
f <- function(var, min.freq) dt[, (var) := if(.N < min.freq) "other", by = var]
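The positional nature of factor levels is what makes relabeling in place work: code i always displays levels(x)[i]. Here is a base R sketch of the lumping logic, using a hypothetical helper lump_rare() on a plain factor rather than on dt. Base R's levels<- also merges any duplicated "other" labels into one level.

```r
# Hypothetical helper: relabel levels seen fewer than `min.freq` times
lump_rare <- function(x, min.freq) {
  counts <- table(x)
  rare <- names(counts)[counts < min.freq]
  # Replace labels in place; level i still labels code i, and
  # levels<- merges repeated "other" labels into a single level
  levels(x)[levels(x) %in% rare] <- "other"
  x
}

type <- factor(c("A", "B", "B", "C", "C", "C"))
lumped <- lump_rare(type, min.freq = 2)
levels(lumped)        # "other" "B" "C"
as.character(lumped)  # "other" "B" "B" "C" "C" "C"
```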
Is it possible to call a function inside a data.table operation?
You seem to be a bit confused about what you're doing. In data.table, the second argument is an expression (unlike ddply's third argument, which is a function), and right now you just gave it an anonymous function.
There is no reproducible data in the OP to test with, but my guess is you simply want:
dt[, {
m1 <- nls(form, data=.SD, start=s)
y.pred <- predict(m1, newdata=data.frame(x=x.range))
list(x=x.range, y=y.pred)
},
by=list(ID1,ID2,ID3)]
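Since the OP's form, s, and x.range are not available, here is a self-contained sketch of the same per-group pattern, assuming data.table is installed. It swaps in lm() for nls() (the nls() documentation warns against zero-residual toy data like this) and uses a hypothetical two-group table. The braced j expression runs once per group with .SD holding that group's rows, and its final list() becomes the rows returned for that group.

```r
library(data.table)  # assumed to be installed

# Hypothetical toy data: two groups, each exactly linear in x
dt <- data.table(ID = rep(c("a", "b"), each = 4), x = rep(1:4, times = 2))
dt[, y := ifelse(ID == "a", 2 * x + 1, 3 * x)]

x.range <- c(5, 6)  # points at which to predict

pred <- dt[, {
  m1 <- lm(y ~ x, data = .SD)  # lm() stands in for nls() here
  y.pred <- predict(m1, newdata = data.frame(x = x.range))
  list(x = x.range, y = y.pred)
}, by = ID]
print(pred)  # one row per (group, x.range) pair
```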
How to run a function inside data.table?
We need to name the pattern argument if we are not using an anonymous function call; otherwise lapply() would pass each column to grepl() as its first argument, pattern:
my[, lapply(.SD, grepl, pattern = patt)]
Or, with an anonymous function call:
my[,lapply(.SD, function(x) grepl(patt, x))]
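A base R sketch with a plain data.frame (df and patt are hypothetical example data) shows that the two forms agree. lapply() passes each column as the first positional argument of the function it applies, so naming pattern is what routes the column to grepl()'s x argument instead.

```r
df <- data.frame(a = c("apple", "kiwi"), b = c("pear", "grape"),
                 stringsAsFactors = FALSE)
patt <- "ap"

# pattern is named, so each column becomes grepl()'s x argument
by_name <- lapply(df, grepl, pattern = patt)

# Equivalent anonymous-function form
by_anon <- lapply(df, function(x) grepl(patt, x))

identical(by_name, by_anon)  # TRUE
```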
Use data.table within another function in R
If you want to use non-standard evaluation, you need something like substitute. However, there is absolutely no reason for using parse.
addColumnsError <- function(dt, v1, v2){
  eval(substitute(dt[, v1 + v2]))
}
addColumnsError(dt, var1, var2)
#[1] 3 6 9 12 15 18 21 24 27 30
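The same substitute()/eval() idea works in base R. The sketch below uses a plain data.frame and a hypothetical add_columns(); eval() looks up var1 and var2 among the data frame's columns first and falls back to the caller's environment.

```r
add_columns <- function(df, v1, v2) {
  # substitute() captures the unevaluated expression var1 + var2;
  # eval() then evaluates it with df's columns in scope
  eval(substitute(v1 + v2), df, parent.frame())
}

df <- data.frame(var1 = 1:10, var2 = 2L * (1:10))
add_columns(df, var1, var2)  # returns 3, 6, 9, ..., 30
```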