Can lapply not modify variables in a higher scope?
I discussed this issue in this related question: "Is R's apply family more than syntactic sugar". If you look at the function signatures for for and apply, you will notice one critical difference: a for loop evaluates an expression, while an apply loop evaluates a function.
If you want to alter things outside the scope of an apply function, then you need to use <<- or assign. Or, more to the point, use something like a for loop instead. But you really need to be careful when working with things outside of a function because it can result in unexpected behavior.
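A minimal sketch of the difference, using a hypothetical variable named total (not from the question): plain assignment inside the function only creates a local variable, while <<- rebinds the variable in the enclosing environment.

```r
total <- 0

# Plain assignment: each call creates its own local `total`; the outer one is untouched.
invisible(lapply(1:3, function(i) total <- total + i))
after_local <- total   # still 0

# Superassignment: `<<-` rebinds `total` in the enclosing environment.
invisible(lapply(1:3, function(i) total <<- total + i))
after_super <- total   # 0 + 1 + 2 + 3 = 6
```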
In my opinion, one of the primary reasons to use an apply function is precisely that it doesn't alter things outside of it. This is a core concept in functional programming, wherein functions avoid having side effects. This is also a reason why the apply family of functions can be used in parallel processing (and similar functions exist in the various parallel packages such as snow).
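Lastly, the right way to run your code example is to also pass the parameters in to your function and assign the output back, like so:
mat <- matrix(0, nrow = 10, ncol = 1)
# The function changes only its local copy of mat; the value of the assignment is what lapply collects.
# unlist() turns the resulting list into a numeric vector before the matrix is rebuilt.
mat <- matrix(unlist(lapply(1:10, function(i, mat) { mat[i, ] <- rnorm(1, mean = i) }, mat = mat)), ncol = 1)
It is always best to be explicit about a parameter when possible (hence the mat=mat) rather than inferring it.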
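As a hedged alternative sketch: because each value depends only on i, the same matrix can be built without touching mat inside the function at all. The use of vapply and set.seed here is an assumption for illustration, not part of the original example.

```r
set.seed(42)  # assumption: fixed seed so the run is repeatable

# Each call returns one draw; vapply checks that every result is a single numeric value.
draws <- vapply(1:10, function(i) rnorm(1, mean = i), numeric(1))

# Build the 10 x 1 matrix from the collected results.
mat <- matrix(draws, nrow = 10, ncol = 1)
```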
Difference between using higher scope variables and using variables explicitly passed in a function
If a function uses a variable from an enclosing scope, that can cause a side effect (the function modifying the outer variable), and this is considered bad practice because it makes the function impure.
Global variables are considered bad practice and should only be used if the variable is constant. If the variable is constant it is okay, because the function then cannot modify anything outside itself.
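A small sketch of the three cases, with hypothetical names (rate, calls, and the tax_* functions are made up for illustration):

```r
rate <- 0.2  # treated as a constant

# Reads `rate` from the enclosing scope: acceptable only while `rate` never changes.
tax_implicit <- function(price) price * rate

# Takes everything it needs as arguments: the result depends only on the inputs.
tax_explicit <- function(price, rate) price * rate

# Impure: modifies a variable outside itself on every call.
calls <- 0
tax_counted <- function(price) {
  calls <<- calls + 1
  price * rate
}

implicit_result <- tax_implicit(100)
explicit_result <- tax_explicit(100, 0.5)
tax_counted(100)
tax_counted(100)
calls_after <- calls  # the outer variable has been changed twice
```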
Can non-global variables be modified inside a function in R?
It is possible to update a global variable in a function using the get and assign functions. Below is code that does this:
heatmap.matrix <- matrix(rep(0,40000), nrow=200, ncol=200)
# foo function should just update a single cell of the declared matrix
varName <- "heatmap.matrix"
foo <- function() {
heatmap.matrix.copy <- get(varName)
heatmap.matrix.copy[40,40] <- 100
assign(varName, heatmap.matrix.copy, pos=1)
}
heatmap.matrix[40,40]
#[1] 0
foo()
heatmap.matrix[40,40]
# [1] 100
You should read up a bit on the concept of environments. The best place to start is http://adv-r.had.co.nz/Environments.html
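For variables that are not global, two common patterns are sketched below, with hypothetical names (make_counter, state, set_cell): a closure that updates a variable in its enclosing function's environment with <<-, and an explicit environment passed to the function.

```r
# Closure: `count` lives in make_counter's environment, not in the global one.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # modifies the enclosing (non-global) variable
    count
  }
}
counter <- make_counter()
counter()
second <- counter()  # the closure remembers the previous call

# Explicit environment: environments are not copied when passed to a function.
state <- new.env()
state$m <- matrix(0, nrow = 2, ncol = 2)
set_cell <- function(env) {
  env$m[1, 1] <- 100  # updates the matrix stored in the caller's environment object
  invisible(NULL)
}
set_cell(state)
cell <- state$m[1, 1]
```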
How to define multiple variables with lapply?
General solution
Try outer:
c(outer(1:10, 2:4, Vectorize(function(x, y) x*y)))
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
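The reason Vectorize is needed in the general case is that outer calls its function once with the two fully expanded vectors, not once per pair. A sketch with a hypothetical scalar-only function, bigger:

```r
# Works on single values only, because `if` needs a length-one condition.
bigger <- function(x, y) if (x > y) x else y

# Passing it to outer directly raises an error (or a warning in older R versions).
attempt <- tryCatch(outer(1:3, 1:3, bigger),
                    error = function(e) e,
                    warning = function(w) w)
plain_failed <- inherits(attempt, "condition")

# Wrapped in Vectorize, it is applied pair by pair and outer can shape the result.
res <- outer(1:3, 1:3, Vectorize(bigger))
```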
If function is Vectorized already
If the function is already vectorized, as it is here, then we can omit Vectorize:
c(outer(1:10, 2:4, function(x, y) x * y))
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
Particular example shown in question
In fact, in this particular case the anonymous function shown is the default so this would work:
c(outer(1:10, 2:4))
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
Also in this particular case we could use:
c(1:10 %o% 2:4)
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
If input is list X
If your starting point is the list X shown in the question then:
c(outer(X[[1]], X[[2]], Vectorize(function(x, y) x * y)))
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
or
c(do.call("outer", c(unname(X), Vectorize(function(x, y) x*y))))
## [1] 2 4 6 8 10 12 14 16 18 20 3 6 9 12 15 18 21 24 27 30 4 8 12 16 20
## [26] 24 28 32 36 40
where the simplifications from the prior sections can be used to shorten it, if applicable.
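The list X is not reproduced here, so the check below uses a hypothetical stand-in with the same two vectors. It is only a sketch confirming that the list-based forms agree with the direct call:

```r
X <- list(a = 1:10, b = 2:4)  # hypothetical stand-in for the question's list
f <- Vectorize(function(x, y) x * y)

direct   <- c(outer(1:10, 2:4))
via_list <- c(outer(X[[1]], X[[2]], f))

# unname() drops the list names so the elements are matched by position
# to outer's first two arguments; f then fills the third (FUN).
via_call <- c(do.call("outer", c(unname(X), f)))
```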
Is R's apply family more than syntactic sugar?
The apply functions in R don't provide improved performance over other looping constructs (e.g. for). One exception to this is lapply, which can be a little faster because it does more work in C code than in R (see this question for an example of this).
But in general, the rule is that you should use an apply function for clarity, not for performance.
I would add to this that apply functions have no side effects, which is an important distinction when it comes to functional programming with R. This can be overridden by using assign or <<-, but that can be very dangerous. Side effects also make a program harder to understand since a variable's state depends on the history.
Edit:
Just to emphasize this with a trivial example that recursively calculates the Fibonacci sequence; this could be run multiple times to get an accurate measure, but the point is that none of the methods have significantly different performance:
> fibo <- function(n) {
+ if ( n < 2 ) n
+ else fibo(n-1) + fibo(n-2)
+ }
> system.time(for(i in 0:26) fibo(i))
user system elapsed
7.48 0.00 7.52
> system.time(sapply(0:26, fibo))
user system elapsed
7.50 0.00 7.54
> system.time(lapply(0:26, fibo))
user system elapsed
7.48 0.04 7.54
> library(plyr)
> system.time(ldply(0:26, fibo))
user system elapsed
7.52 0.00 7.58
Edit 2:
Regarding the usage of parallel packages for R (e.g. rpvm, Rmpi, snow), these do generally provide apply family functions (even the foreach package is essentially equivalent, despite the name). Here's a simple example of the sapply function in snow:
library(snow)
cl <- makeSOCKcluster(c("localhost","localhost"))
parSapply(cl, 1:20, get("+"), 3)
This example uses a socket cluster, for which no additional software needs to be installed; otherwise you will need something like PVM or MPI (see Tierney's clustering page). snow has the following apply functions:
parLapply(cl, x, fun, ...)
parSapply(cl, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
parApply(cl, X, MARGIN, FUN, ...)
parRapply(cl, x, fun, ...)
parCapply(cl, x, fun, ...)
It makes sense that apply functions should be used for parallel execution since they have no side effects. When you change a variable's value within a for loop, the change is made in the environment where the loop runs (globally, at top level). On the other hand, all apply functions can safely be used in parallel because changes are local to the function call (unless you try to use assign or <<-, in which case you can introduce side effects). Needless to say, it's critical to be careful about local vs. global variables, especially when dealing with parallel execution.
Edit:
Here's a trivial example to demonstrate the difference between for and *apply so far as side effects are concerned:
> df <- 1:10
> # *apply example
> lapply(2:3, function(i) df <- df * i)
> df
[1] 1 2 3 4 5 6 7 8 9 10
> # for loop example
> for(i in 2:3) df <- df * i
> df
[1] 6 12 18 24 30 36 42 48 54 60
Note how the df in the parent environment is altered by for but not *apply.
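If the accumulated result of the for loop is what you want, one side-effect-free way to get it is to carry the value through the calls and assign the final result back. The sketch below uses Reduce, which is a substitution for illustration and not part of the original example:

```r
df <- 1:10

# Reduce passes the running value (acc) into each call; nothing outside is modified.
scaled <- Reduce(function(acc, i) acc * i, 2:3, df)

unchanged <- df   # df itself is still 1:10
df <- scaled      # the update happens once, explicitly, in the calling scope
```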