How to Access Global/Outer Scope Variable from R Apply Function

Global variable and scope - R

Basically, because you are using the "<-" assignment, the function creates a local copy of 'global' that exists only within the scope of the function.

This can be seen by adding a second function g(), which uses the "<<-" assignment to alter the value of 'global' before f() prints it. The first line in f() creates your locally scoped copy of 'global' for f(x), and then g(x) updates the global copy of 'global'.

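global <- list()

f <- function(x) {
  global[[x]] <- "blah"  # creates a local copy of 'global' inside f()
  g(x)                   # updates the global copy via <<-
  global                 # returns the local copy
}

g <- function(x) {
  global[[x]] <<- "newblah"
}

f(1) # prints 'blah', despite the fact that g(x) has already updated the global value

global # prints 'newblah'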

If f(x) were still referencing the global copy of 'global' it would print "newblah" which was assigned in g(x). Instead it prints the value which was assigned in f(x) to the locally scoped copy of 'global'.

However, printing 'global' outside any function shows that g(x) did in fact update the value for the global copy of 'global'.

Now, if you move g(x) inside f(x), the environment of f(x) becomes the parent (enclosing environment) of g(x). In this case, "<<-" assigns to the 'global' that lives in the scope of f(x), because that is the first one it finds. So the global copy of 'global' is still empty, but if you print 'global' in the scope of f() you get the updated value.

global <- list()

f <- function(x) {
  global[[x]] <- "blah"  # local copy of 'global' inside f()

  g <- function(x) {
    global[[x]] <<- "newblah"  # modifies the copy in f()'s scope
  }

  g(x)
  global
}

f(1) # prints 'newblah'

global # empty

Global and local variables in R

Variables declared inside a function are local to that function. For instance:

foo <- function() {
  bar <- 1
}
foo()
bar

gives the following error: Error: object 'bar' not found.

If you want to make bar a global variable, you should do:

foo <- function() {
  bar <<- 1
}
foo()
bar

In this case bar is accessible from outside the function.

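However, unlike C, C++ or many other languages, braces do not determine the scope of variables. For instance, in the following code snippet:

# 'else' must follow the closing brace on the same line,
# otherwise R reports a syntax error at top level
if (x > 10) {
  y <- 0
} else {
  y <- 1
}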

y remains accessible after the if-else statement.
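
To make the contrast concrete, here is a minimal sketch (all variable and function names are made up for illustration) showing that an if block shares the surrounding scope, while a function body and local() do not:

```r
x <- 5  # hypothetical example value

# An if/else block does not open a new scope: y lands in the current environment.
if (x > 10) {
  y <- 0
} else {
  y <- 1
}
y_after_if <- y  # y is still visible here

# A function body does open a new scope: z exists only during the call.
set_z <- function() {
  z <- 42
  invisible(NULL)
}
set_z()
z_visible <- exists("z", inherits = FALSE)

# local() evaluates its body in a fresh environment, so w does not leak either.
local({
  w <- 99
})
w_visible <- exists("w", inherits = FALSE)
```

If you really want block-level scoping, wrapping the block in local() is one way to get it.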

As you rightly say, you can also create nested environments. Have a look at these two links to understand how to use them:

  1. http://stat.ethz.ch/R-manual/R-devel/library/base/html/environment.html
  2. http://stat.ethz.ch/R-manual/R-devel/library/base/html/get.html

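Here is a small example (the variable is called my_var rather than var, because get('var') would otherwise find the built-in var() function):

test.env <- new.env()

assign('my_var', 100, envir=test.env)
# or simply
test.env$my_var <- 100

get('my_var') # error: my_var cannot be found since it is not defined in this environment
get('my_var', envir=test.env) # now it can be found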
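
As a rough sketch of what "nested" means here, the following creates a child environment whose parent is another environment. The names outer_env, inner_env and threshold are hypothetical, chosen only for this illustration:

```r
# A parent environment holding a shared setting.
outer_env <- new.env()
outer_env$threshold <- 10

# A child environment whose enclosure is outer_env.
inner_env <- new.env(parent = outer_env)
inner_env$value <- 3

# get() walks up the chain of enclosing environments by default.
found_threshold <- get("threshold", envir = inner_env)

# With inherits = FALSE, only inner_env itself is checked.
local_only <- exists("threshold", envir = inner_env, inherits = FALSE)

# assign() writes into exactly the environment it is given,
# so the parent's value is left untouched.
assign("threshold", 20, envir = inner_env)
inner_threshold <- get("threshold", envir = inner_env)
outer_threshold <- get("threshold", envir = outer_env)
```

This is the same lookup rule that functions follow: a name is searched for in the current environment first, then in each enclosing one.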

R scoping: disallow global variables in function

My other answer is more about what approach you can take inside your function. Now I'll provide some insight on what to do once your function is defined.

To ensure that your function is not using global variables when it shouldn't be, use the codetools package.

library(codetools)

sUm <- 10
f <- function(x, y) {
  sum = x + y
  return(sUm)
}

checkUsage(f)

This will print the message:

<anonymous> local variable ‘sum’ assigned but may not be used (:1)

To see if any global variables were used in your function, you can compare the output of the findGlobals() function with the variables in the global environment.

> findGlobals(f)
[1] "{" "+" "=" "return" "sUm"

> intersect(findGlobals(f), ls(envir=.GlobalEnv))
[1] "sUm"

That tells you that the global variable sUm was used inside f() when it probably shouldn't have been.
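
If you only care about variables and not about functions such as "{" or "+", one option is to ask findGlobals() to keep the two apart with merge = FALSE. The wrapper global_vars_used() below is a hypothetical helper written for this sketch, not part of codetools:

```r
library(codetools)

# Hypothetical helper: list the non-function globals a function refers to.
global_vars_used <- function(fun) {
  # merge = FALSE splits the result into $functions and $variables
  findGlobals(fun, merge = FALSE)$variables
}

sUm <- 10
f <- function(x, y) {
  sum = x + y
  return(sUm)
}

# A function that relies only on its arguments, for comparison.
g <- function(x, y) {
  x + y
}

vars_f <- global_vars_used(f)
vars_g <- global_vars_used(g)
```

A check like length(global_vars_used(f)) == 0 could then be used in a test suite, with the caveat that it is a static analysis and may miss globals reached through get() or eval().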

can lapply not modify variables in a higher scope

I discussed this issue in this related question: "Is R’s apply family more than syntactic sugar". You will notice that if you look at the function signature for for and apply, they have one critical difference: a for loop evaluates an expression, while an apply loop evaluates a function.

If you want to alter things outside the scope of an apply function, then you need to use <<- or assign. Or more to the point, use something like a for loop instead. But you really need to be careful when working with things outside of a function because it can result in unexpected behavior.
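
A minimal sketch of the difference, using a made-up running total rather than the code from the question:

```r
# Attempt 1: plain <- inside the function only changes a local copy.
total <- 0
invisible(lapply(1:5, function(i) {
  total <- total + i  # local to this call; discarded afterwards
}))
total_after_local <- total

# Attempt 2: <<- updates the variable found in an enclosing environment.
invisible(lapply(1:5, function(i) {
  total <<- total + i
}))
total_after_super <- total

# Attempt 3: a for loop evaluates its body in the current environment,
# so ordinary assignment is enough.
loop_total <- 0
for (i in 1:5) {
  loop_total <- loop_total + i
}
```

Only the second and third attempts change the outer variable; the first leaves total at its starting value.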

In my opinion, one of the primary reasons to use an apply function is explicitly because it doesn't alter things outside of it. This is a core concept in functional programming, wherein functions avoid having side effects. This is also a reason why the apply family of functions can be used in parallel processing (and similar functions exist in the various parallel packages such as snow).

Lastly, the right way to run your code example is to pass the parameters in to your function and assign the output back, like so (unlist() flattens the list that lapply returns, so the result is a numeric matrix):

mat <- matrix(0, nrow=10, ncol=1)
mat <- matrix(unlist(lapply(1:10, function(i, mat) { mat[i,] <- rnorm(1, mean=i); mat[i,] }, mat=mat)))

It is always best to be explicit about a parameter when possible (hence the mat=mat) rather than inferring it.

Output selected variables to global environment R function

It is not recommended to write to the global environment from inside a function. If you want to create multiple objects in the global environment, return a named list from the function and use list2env.

# scores_na() and MscoreMax come from the original question and are assumed to be defined elsewhere
mediansFunction <- function(x){
  labmedians <- sapply(x[-1], median)
  median_of_median <- median(labmedians)
  grand_median <- median(as.matrix(x[-1]))
  labMscore <- as.vector(round(abs(scores_na(labmedians, "mad")), digits = 2)) # calculate mscore by lab
  labMscoreIndex <- which(labMscore > MscoreMax) # positions in the vector that exceed MscoreMax
  x[-1][labMscoreIndex] <- NA # discard values above the threshold by setting them to NA
  dplyr::lst(data = x, labmedians, grand_median, labMscore)
}

result <- mediansFunction(df)
list2env(result, .GlobalEnv)

Now you have variables data, labmedians, grand_median and labMscore in the global environment.
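
The same pattern can be tried in isolation with base R only. In this sketch, summarise_values() and its element names are hypothetical, and a fresh environment stands in for .GlobalEnv so the example does not touch your workspace:

```r
# Hypothetical helper that returns several results as a named list.
summarise_values <- function(x) {
  list(
    value_mean   = mean(x),
    value_median = median(x),
    value_range  = range(x)
  )
}

result <- summarise_values(c(2, 4, 9))

# Copy the list elements into an environment as separate variables.
# Passing .GlobalEnv instead of target works the same way.
target <- new.env()
list2env(result, envir = target)

created <- ls(target)
```

Keeping the function free of side effects and doing the list2env() call at the top level makes it obvious where the new variables come from.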

Using global variable in function

Both <<- and assign will work:

myfunction <- function(var1, var2) {
  # Modification of global mydata
  mydata <<- ...
  # Alternatively:
  #assign('mydata', ..., globalenv())

  # Assign locally as well
  mydata <- mydata

  # Definition of another variable with the new mydata
  var3 <- ...

  # Recursive function
  mydata = myfunction(var2, var3)
}

That said, it’s almost always a bad idea to want to modify global data from a function, and there’s almost certainly a more elegant solution to this.

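Furthermore, note that <<- is actually not the same as assigning to a variable in globalenv(). It walks up the chain of enclosing (parent) scopes, assigns to the first existing variable of that name, and only falls back to creating one in the global environment if none is found. For functions defined in the global environment, that search starts at the global environment. For functions defined elsewhere, such as inside another function, it may stop before ever reaching the global environment.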
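
A classic way to see this is a counter built as a closure. This is only an illustrative sketch; make_counter() and count are names invented for the example:

```r
make_counter <- function() {
  count <- 0
  function() {
    # <<- finds count in make_counter()'s environment and updates it there
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first  <- counter()
second <- counter()

# The state lives in the closure, not in the global environment.
count_is_global <- exists("count", envir = globalenv(), inherits = FALSE)
```

Here <<- is doing something useful and contained: the modified variable belongs to the function that created the closure, so nothing leaks into the global workspace.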

function environment within lapply loop

I believe it is because free variables in foo() are looked up in the environment in which the function is defined, not the one it is called from. In your example foo() is defined in the global environment, and therefore i is not in scope. If you define foo() within the anonymous function, then i is found correctly.

env.g <- environment()
invisible(lapply(1, FUN = function(i){
  message('global env: exists(i) ', exists('i', envir = env.g))
  message('lapply env: exists(i) ', exists('i'))
  message(' ')
  j <- i + 1

  foo <- function(j){
    message('foo env: exists(j) ', exists('j'))
    message('foo env: exists(i) ', exists('i'))
    i
  }

  foo(j)
}
))

#global env: exists(i) FALSE
#lapply env: exists(i) TRUE

#foo env: exists(j) TRUE
#foo env: exists(i) TRUE
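
For comparison, the sketch below puts the failing and working variants side by side. The names foo_outside, foo_inside, foo_arg and loop_index are hypothetical, and the sketch assumes no global variable called loop_index exists:

```r
# foo_outside is defined at top level, so its enclosing environment
# is not the anonymous function's environment.
foo_outside <- function() loop_index

res_outside <- lapply(1:2, function(loop_index) {
  # The lookup fails inside foo_outside(); catch the error to show it.
  tryCatch(foo_outside(), error = function(e) "not found")
})

# Defining the helper inside the anonymous function lets it see loop_index.
res_inside <- lapply(1:2, function(loop_index) {
  foo_inside <- function() loop_index
  foo_inside()
})

# Passing the value as an argument works regardless of where foo is defined.
foo_arg <- function(loop_index) loop_index
res_arg <- lapply(1:2, foo_arg)
```

Passing the value explicitly is usually the least surprising option, since the helper then no longer depends on where it was defined.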

