Lapply-ing with the "$" Function

This is documented in ?lapply, in the "Note" section (emphasis mine):

For historical reasons, the calls created by lapply are unevaluated,
and code has been written (e.g. bquote) that relies on this. This
means that the recorded call is always of the form FUN(X[[0L]], ...),
with 0L replaced by the current integer index. This is not
normally a problem, but it can be if FUN uses sys.call or
match.call or if it is a primitive function that makes use of the
call.
This means that it is often safer to call primitive functions
with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x))
is required in R 2.7.1 to ensure that method dispatch for is.numeric
occurs correctly.
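
As a minimal sketch of the wrapper advice in that note (the list ll and its element names a and b are made up for illustration), the code below extracts a field through an anonymous function, through `[[`, which evaluates its index argument, and then inspects the call that lapply records:

```r
# Hypothetical data: a list of small records
ll <- list(list(a = 1, b = "x"), list(a = 2, b = "y"))

# Wrapper: `$` is called in the ordinary way inside the anonymous function
a_wrapped <- lapply(ll, function(x) x$a)

# `[[` evaluates its index argument, so it can be handed to lapply directly
a_bracket <- lapply(ll, `[[`, "a")

# Look at the call lapply records for each element; the exact index
# notation inside X[[...]] differs between R versions
recorded <- lapply(1:2, function(x) deparse(sys.call()))
print(recorded)
```

Both extraction forms should give the same list; the printed calls show that FUN only ever sees the X[[...]] form rather than the original element.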

Modify the code (lapply function) in R

[1]

modelList <- lapply(mtcars[-c(4,9)], function(x) aov(x ~ hp*am, data=mtcars) )

[2]

df2 <- plyr::ldply(modelList, function(x) summary(x)[[1]][["Pr(>F)"]])
names(df2) <- c(attr(modelList[[1]]$terms, "term.labels"), "residuals")

[3]

res.list <- lapply(modelList, '[[', "residuals")

par(mfrow=c(5,2), oma=c(0,0,0,0))
lapply(res.list, hist)


Multiplying elements of list with lapply is almost twice as fast with in-line function definition than with standard *

lapply calls match.fun, which must spend some time (well, about a microsecond) matching the string "*" to the primitive function `*`. Passing the function directly avoids the overhead.

l <- list(1, 2, 3)
microbenchmark::microbenchmark(lapply(l, function(x) x * 1000),
                               lapply(l, "*", 1000),
                               lapply(l, `*`, 1000),
                               times = 1e+06L)
## Unit: nanoseconds
##                             expr  min   lq     mean median   uq      max neval
##  lapply(l, function(x) x * 1000) 1271 1435 1614.497   1476 1517  1243981 1e+06
##             lapply(l, "*", 1000) 1640 1763 2026.791   1804 1886 16498605 1e+06
##             lapply(l, `*`, 1000)  861  984 1198.956   1025 1066 16636365 1e+06
microbenchmark::microbenchmark(match.fun(function(x) x * 1000),
                               match.fun("*"),
                               match.fun(`*`),
                               times = 1e+06L)
## Unit: nanoseconds
##                             expr min  lq      mean median  uq      max neval
##  match.fun(function(x) x * 1000)  82 164  249.0617    205 205 15783606 1e+06
##                   match.fun("*") 779 902 1036.1593    902 984 15515261 1e+06
##                   match.fun(`*`)  41 164  187.4243    164 164   588842 1e+06

That said, match.fun is never going to be a bottleneck, unless maybe you've written a function that calls match.fun a few billion times, so optimizing at this level would just be "for fun".
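
Since the choice is about speed rather than behaviour, it may help to confirm that the three spellings are interchangeable. The sketch below uses only base R and the same list l as above; the variable names are made up for illustration:

```r
l <- list(1, 2, 3)

# Three ways of passing the multiplication to lapply
by_wrapper <- lapply(l, function(x) x * 1000)
by_name    <- lapply(l, "*", 1000)   # string, resolved by match.fun
by_value   <- lapply(l, `*`, 1000)   # the primitive itself

# match.fun("*") should resolve to the same primitive as `*`
same_function <- identical(match.fun("*"), `*`)
```

If the results agree, the only difference between the forms is the lookup cost measured in the benchmarks.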

Lapplying a function over two lists of dataframes in R

Here, we could use Map from base R to apply the function to the corresponding elements of both lists:

out <- Map(my_function, list_A, list_B)

lapply can also be used if we loop over the sequence of indices of one of the lists:

out <- lapply(seq_along(list_A), function(i)
    my_function(list_A[[i]], list_B[[i]]))

which is similar to using a for loop:

out <- vector('list', length(list_A))
for(i in seq_along(list_A)) out[[i]] <- my_function(list_A[[i]], list_B[[i]])
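
To try the approaches above end to end, here is a small self-contained sketch. The contents of list_A and list_B and the body of my_function (a merge on a shared id column) are assumptions made for illustration, not taken from the question:

```r
# Hypothetical inputs: two lists of data frames sharing an "id" column
list_A <- list(data.frame(id = 1:3, x = c(10, 20, 30)),
               data.frame(id = 1:2, x = c(5, 6)))
list_B <- list(data.frame(id = 1:3, y = c("a", "b", "c")),
               data.frame(id = 1:2, y = c("d", "e")))

# Hypothetical my_function: join one pair of data frames on "id"
my_function <- function(a, b) merge(a, b, by = "id")

# Map recycles a shorter list, so checking the lengths first is a cheap safeguard
stopifnot(length(list_A) == length(list_B))

out_map    <- Map(my_function, list_A, list_B)
out_lapply <- lapply(seq_along(list_A), function(i)
    my_function(list_A[[i]], list_B[[i]]))
```

With unnamed input lists the two results should be the same list of merged data frames; Map is usually the shorter way to write it.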

Using lapply with if to test each element in a list

It pains me to answer this, because doing it this way is very un-R-like. You could be more explicit and use braces, as in:

lapply(alist, function(x) if (x > 7) {1} else {0})

Or the vectorized ifelse:

lapply(alist, function(x) ifelse(x > 7, 1, 0))

Or best of all:

as.numeric(alist > 7)
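
For comparison, a small sketch with a made-up alist in which every element is a single number. It flattens the list with unlist before comparing, which makes the vectorized step explicit; that is a variation on the last line above, not a claim about the original data:

```r
# Hypothetical input: each element is a single number
alist <- list(3, 9, 7, 12)

# Element-wise test with if / else
by_if <- unlist(lapply(alist, function(x) if (x > 7) {1} else {0}))

# Element-wise test with ifelse
by_ifelse <- unlist(lapply(alist, function(x) ifelse(x > 7, 1, 0)))

# Vectorized: flatten once, compare once
by_vector <- as.numeric(unlist(alist) > 7)
```

All three should agree; note that 7 itself is not greater than 7, so it maps to 0.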

