Using Functions and Environments

The simplest solution is to use the environment when referencing the object:

y <- new.env()
y$x <- 1
f <- function(env,z) {
    env$x+z
}
f(y,z=1)

To evaluate the expression inside the environment using with(), you would need to assign z to the environment as well:

y <- new.env()
with(y, x <- 1)
f <- function(env,z) {
    assign("z", z, envir=env)
    with(env, x+z)
}
f(y,z=1)

One other option would be to attach your environment so that the variables can now be used directly.

y <- new.env()
with(y, x <- 1)
f <- function(env,z) {
    attach(env)
    y <- x + z
    detach(env)
    y
}
f(y,z=1)

This latter solution is powerful because it means you can use any object from any attached environment within your new environment, but it also means that you need to be very careful about what has been assigned globally.
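A minimal sketch of a more defensive variant of the attach() approach. The names y_env, x_attached, f_attach and the search-path label "tmp_env" are hypothetical and chosen only for illustration. Registering the detach with on.exit() means the search path is restored even if the body of the function fails.

```r
# Hypothetical environment holding one value.
y_env <- new.env()
y_env$x_attached <- 1

f_attach <- function(env, z) {
  # attach() places a copy of env's objects on the search path.
  attach(env, name = "tmp_env")
  # Clean up on exit, even if the computation below raises an error.
  on.exit(detach("tmp_env", character.only = TRUE))
  x_attached + z
}

result <- f_attach(y_env, z = 1)
still_attached <- "tmp_env" %in% search()
```

After the call, still_attached should be FALSE, so nothing from y_env leaks onto the search path for later code.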

Edit:

This is interesting, and I don't entirely understand the behavior (i.e. why z is not in the scope of the with call). It has something to do with the creation of the environment originally that is causing it to be outside the scope of the function, because this version works:

f <- function(z) {
    y <- new.env()
    with(y, x <- 1)
    with(y, x+z)
}
f(z=1)
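The behaviour is consistent with how new.env() chooses its parent: the default is parent = parent.frame(), so an environment created at top level has the global environment as its parent, while one created inside a function has that function's execution environment as its parent. The sketch below uses hypothetical names (outer_env, f_outer, f_inner, z_arg) and sets the parent explicitly for the top-level case so that it behaves the same wherever it is run.

```r
# Created "at top level": the parent is the global environment.
outer_env <- new.env(parent = globalenv())
outer_env$x <- 1

f_outer <- function(env, z_arg) {
  # Lookup path is env -> globalenv() -> ...; f_outer's own frame is not on it,
  # so z_arg cannot be found from inside with().
  tryCatch(with(env, x + z_arg), error = function(e) "z_arg not found")
}

f_inner <- function(z_arg) {
  # new.env() defaults to parent = parent.frame(), here f_inner's own frame,
  # so z_arg is reachable from inner_env.
  inner_env <- new.env()
  inner_env$x <- 1
  with(inner_env, x + z_arg)
}

res_outer <- f_outer(outer_env, z_arg = 1)
res_inner <- f_inner(z_arg = 1)
```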

How to use the R environment and the globalenv() function

Your card deck is stored in a data frame deck in your Global Environment.

deal <- function(){
  card <- deck[1,]
  assign("deck", deck[-1,], envir = globalenv())
  card
}

Each function call creates its own environment; an object assigned inside a function "lives" only inside it. That's why you don't "see" an object named card in your Global Environment (unless you created one before, but that object is unaffected by the card <- deck[1,] statement in deal).

So assign("deck", deck[-1,]) (without the envir argument) would be the same as

deal <- function(){
  card <- deck[1,]
  deck <- deck[-1,]
  card
}

but this won't change your deck outside the function. The deck object created inside the function exists only inside the function. To change the deck outside the function, you have to tell R where to change it. That's why assign("deck", deck[-1,], envir = globalenv()) is used.

So let's start over with your function deal:

card <- deck[1,]

assigns the first row of deck to card. But wait: deck doesn't exist inside the function, so how is this possible? If an object isn't found inside the function, R looks one level up, in the function's enclosing environment, which in your case is most likely the Global Environment. There R finds an object named deck and does the assignment. Now we have an object named card that exists inside the function.
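A small sketch contrasting the two versions of deal. The three-card deck and the names deal_local and deal_global are hypothetical; the deck is placed in the global environment explicitly so the example does not depend on where it is run.

```r
# Hypothetical three-card deck, placed explicitly in the global environment.
assign("deck",
       data.frame(face = c("king", "queen", "jack"), suit = "spades"),
       envir = globalenv())

deal_local <- function() {
  card <- deck[1, ]
  deck <- deck[-1, ]   # binds a new, local deck; the global one is untouched
  card
}

deal_global <- function() {
  card <- deck[1, ]
  assign("deck", deck[-1, ], envir = globalenv())  # rebinds the global deck
  card
}

first_card <- deal_local()
rows_after_local <- nrow(get("deck", envir = globalenv()))   # still 3 rows

second_card <- deal_global()
rows_after_global <- nrow(get("deck", envir = globalenv()))  # now 2 rows
```

Both calls return the same top card, because deal_local never removed it from the global deck.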

For further understanding, take a look at Chapter 6: Functions in Advanced R.

Distinct enclosing environment, function environment, etc. in R

TLDR:

indeed, you can change the enclosing environment. Hadley was probably talking about packaged functions.
the enclosing and the binding environment. You were correct.
that's the execution environment. It only exists for the time the function runs.

Function environments

You have to distinguish 4 different environments when talking about a function:

the binding environment is the environment where the function is found (i.e. where its name exists). This is where the actual binding of an object to its name is done. find() gives you the binding environment.
the enclosing environment is the environment where the function is originally created. This is not necessarily the same as the binding environment (see examples below). environment() gives you the enclosing environment.
the local environment is the environment within the function. You call that the execution environment.
the parent frame or calling environment is the environment from where the function was called.
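The distinctions above can be sketched with a function factory. All names here (maker, g, caller_fun) are hypothetical; the point is only that the enclosing, local and calling environments of g are three different objects.

```r
maker <- function() {
  # The inner function is created here, so maker's execution environment
  # becomes its enclosing environment.
  function() {
    local_env  <- environment()    # local (execution) environment of this call
    caller_env <- parent.frame()   # environment the call was made from
    list(local = local_env, caller = caller_env)
  }
}

g <- maker()                       # the name g is bound where this line runs
caller_fun <- function() g()
envs <- caller_fun()

# The enclosing environment is maker's frame, not the global environment.
enclosing_is_not_global <- !identical(environment(g), globalenv())
# The local environment's parent is the enclosing environment.
local_parent_is_enclosing <- identical(parent.env(envs$local), environment(g))
# The calling environment (caller_fun's frame) is something else again.
caller_differs <- !identical(envs$caller, environment(g))
```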

Why does this matter

Every environment has a specific function:

the binding environment is the environment where you find the function.
the local environment is the first environment where R looks for objects.
the general rule is: if R doesn't find an object in the local environment, it then looks in the enclosing environment and so on. The last enclosing environment is always emptyenv().
the parent frame is where R looks for the value of the objects passed as arguments.
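The lookup rule can be made visible by walking the chain of enclosing environments until the empty environment is reached. This is a minimal sketch; env_chain is a hypothetical helper, and the exact names in the middle of the chain depend on which packages are attached in your session.

```r
# Collect the names of all environments from `env` up to the empty environment.
env_chain <- function(env) {
  names_seen <- character(0)
  while (!identical(env, emptyenv())) {
    names_seen <- c(names_seen, environmentName(env))
    env <- parent.env(env)
  }
  c(names_seen, environmentName(emptyenv()))
}

chain <- env_chain(globalenv())
# Starts at the global environment, passes through attached packages and base,
# and always ends at the empty environment.
```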

You can change the enclosing environment

Indeed, you can change the enclosing environment. What you cannot change is the enclosing environment of a function from a package. In that case you don't change the enclosing environment of the original; you actually create a copy in the new environment:

> ls()
character(0)
> environment(sd)
<environment: namespace:stats>
> environment(sd) <- globalenv()
> environment(sd)
<environment: R_GlobalEnv>
> ls()
[1] "sd"
> find("sd")
[1] ".GlobalEnv"    "package:stats" # two functions sd now
> rm(sd)
> environment(sd)
<environment: namespace:stats>

In this case, the second sd has the global environment as the enclosing and binding environment, but the original sd is still found inside the package environment, and its enclosing environment is still the namespace of that package.

The confusion might arise when you do the following:

> f <- sd
> environment(f)
<environment: namespace:stats>
> find("f")
[1] ".GlobalEnv"

What happens here? The enclosing environment is still the namespace "stats". That's where the function is created. However, the binding environment is now the global environment. That's where the name "f" is bound to the object.

We can change the enclosing environment to a new environment e. If you check now, the enclosing environment becomes e, but e itself is empty. f is still bound in the global environment.

> e <- new.env()
> e
<environment: 0x000000001852e0a8>
> environment(f) <- e
> find("f")
[1] ".GlobalEnv"
> environment(f)
<environment: 0x000000001852e0a8>
> ls(e)
character(0)

The enclosing environment of e is the global environment. So f still works as if its enclosure was the global environment. The environment e is enclosed in it, so if something isn't found in e, the function looks in the global environment and so on.

But because e is an environment, R calls that a parent environment.

> parent.env(e)
<environment: R_GlobalEnv>
> f(1:3)
[1] 1
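To see that the new enclosing environment really takes part in the lookup, here is a minimal sketch with a hypothetical function scale_by and a hypothetical variable factor_value that exists only in e_demo.

```r
# scale_by refers to factor_value, which is not defined anywhere yet.
scale_by <- function(x) x * factor_value

# A new environment whose parent is the global environment.
e_demo <- new.env(parent = globalenv())
e_demo$factor_value <- 10

# After this, lookups from scale_by go: local frame -> e_demo -> globalenv() -> ...
environment(scale_by) <- e_demo

scaled <- scale_by(2)
```

factor_value is found in e_demo, while `*` is still found further up the chain through the global environment and base.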

Namespaces and package environments

This principle is also the "trick" packages use:

the function is created in the namespace. This is an environment that is enclosed by the namespaces of other imported packages, and eventually the global environment.
the binding for the function is created in the package environment. This is an environment that encloses the global environment and possibly other packages.

The reason for this is simple: objects can only be found inside the environment you are in, or in its enclosing environments.

a function must be able to find other functions(objects), so the local environment must be enclosed by possibly the namespaces of other packages it imports, the base package and lastly the global environment.
a function must be findable from within the global environment. Hence the binding (i.e. the name of the function) must be in an environment that encloses the global environment. This is the package environment (NOT the namespace!)


Now suppose you make an environment with the empty environment as a parent. If you use this as the enclosing environment for a function, nothing works any longer, because you now bypass all the package environments and can't find a single function any more.

> orphan <- new.env(parent = emptyenv())
> environment(f) <- orphan
> f(1:3)
Error in sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),  : 
  could not find function "sqrt"
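A less drastic variation, sketched under the assumption that the body of sd is the sqrt(var(...)) call shown in the error message above: with baseenv() as the parent, functions from base such as sqrt are found again, but var, which lives in stats, is not. The name f_sd is hypothetical.

```r
f_sd <- stats::sd
# Enclosure whose only ancestor with any bindings is the base environment.
environment(f_sd) <- new.env(parent = baseenv())

# Capture the error message instead of stopping.
msg <- tryCatch(f_sd(1:3), error = function(e) conditionMessage(e))
# msg should now complain about a function from stats rather than about sqrt.
```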

The parent frame

This is where it gets interesting. The parent frame, or calling environment, is the environment where the values passed as arguments are looked up. But that parent frame can be the local environment of another function. In this case R looks first in the local environment of that other function, then in the enclosing environment of the calling function, and so on all the way up through the global environment and the environments of the attached packages until it reaches the empty environment. That's where the "object not found" bug sleeps.
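A minimal sketch of the difference between the calling environment and the enclosing environment. The names wrapper, from_caller, from_enclosure and local_value are hypothetical, and the sketch assumes no object called local_value exists in the global environment.

```r
# Looks the value up explicitly in the environment of whoever called it.
from_caller <- function() {
  get("local_value", envir = parent.frame())
}

# Relies on ordinary lexical lookup: own frame, then enclosing environments.
from_enclosure <- function() {
  tryCatch(local_value, error = function(e) "not found")
}

wrapper <- function() {
  local_value <- "from wrapper"   # exists only in wrapper's local environment
  c(caller = from_caller(), enclosure = from_enclosure())
}

res <- wrapper()
```

from_caller sees wrapper's local variable because it asks for the parent frame; from_enclosure does not, because wrapper's frame is not one of its enclosing environments.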

Call function from the global environment with implicit dataframe variables (from the calling env?) inside dplyr::summarise or mutate

Up front, I'm generally against writing functions that defeat functional reproducibility, having spent too much time troubleshooting functions that change behavior based on something not passed to them.

However, try this:

method_1 <- list(
  any_vs_four_gears = function(data = cur_data()) with(data, any(vs == 1 & gear == 4)),
  any_am_high_hp = function(data = cur_data()) with(data, any(am == 1 & hp > 170)),
  all_combined = function(data = cur_data()) with(data, all(any_vs_four_gears, any_am_high_hp))
)

mtcars %>%
  group_by(carb) %>%
  summarise(
    any_vs_four_gears = method_1$any_vs_four_gears(),
    any_am_high_hp = method_1$any_am_high_hp(),
    all_combined = method_1$all_combined()
  )
# # A tibble: 6 x 4
#    carb any_vs_four_gears any_am_high_hp all_combined
#   <dbl> <lgl>             <lgl>          <lgl>       
# 1     1 TRUE              FALSE          FALSE       
# 2     2 TRUE              FALSE          FALSE       
# 3     3 FALSE             FALSE          FALSE       
# 4     4 TRUE              TRUE           TRUE        
# 5     6 FALSE             TRUE           FALSE       
# 6     8 FALSE             TRUE           FALSE

This uses the cur_data() pronoun/function found in dplyr-pipe environments, adds just a little surrounding code (with(data, { ... }), so {-expression-friendly), and works "as is".

The errors are not difficult to interpret:

mtcars %>%
  select(-vs) %>%     # intentionally setting up an error
  group_by(carb) %>%
  summarise(
    any_vs_four_gears = method_1$any_vs_four_gears(),
    any_am_high_hp = method_1$any_am_high_hp(),
    all_combined = method_1$all_combined()
  )
# Error: Problem with `summarise()` column `any_vs_four_gears`.
# i `any_vs_four_gears = method_1$any_vs_four_gears()`.
# x object 'vs' not found
# i The error occurred in group 1: carb = 1.
# Run `rlang::last_error()` to see where the error occurred.
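For comparison, a base-R sketch of the explicit style, in which the data is passed as an argument instead of being picked up from the calling context. It reuses the same condition as any_vs_four_gears above; the helper name by_carb is hypothetical, and mtcars is the dataset that ships with R.

```r
# Same test as above, but the data frame is an explicit argument.
any_vs_four_gears <- function(data) with(data, any(vs == 1 & gear == 4))

# One data frame per carb value, then apply the function to each group.
by_carb <- split(mtcars, mtcars$carb)
res <- sapply(by_carb, any_vs_four_gears)
```

Because nothing is looked up implicitly, the function behaves the same whether it is called inside a pipeline, inside another function, or at the console.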

Return list vs environment from an R function

Although they are similar, there are differences between returning a list and returning an environment.
From Advanced R:

Generally, an environment is similar to a list, with four important exceptions:
Every name in an environment is unique.
The names in an environment are not ordered (i.e., it doesn’t make sense to ask what the first element of an environment is).
An environment has a parent.
Environments have reference semantics.
More technically, an environment is made up of two components, the frame, which contains the name-object bindings (and behaves much like a named list), and the parent environment. Unfortunately “frame” is used inconsistently in R. For example, parent.frame() doesn’t give you the parent frame of an environment. Instead, it gives you the calling environment. This is discussed in more detail in calling environments.

From the help:

help(new.env)

Environments consist of a frame, or collection of named objects, and a pointer to an enclosing environment. The most common example is the frame of variables local to a function call; its enclosure is the environment where the function was defined (unless changed subsequently). The enclosing environment is distinguished from the parent frame: the latter (returned by parent.frame) refers to the environment of the caller of a function. Since confusion is so easy, it is best never to use ‘parent’ in connection with an environment (despite the presence of the function parent.env).

From the function's documentation:

e1 <- new.env(parent = baseenv())  # this one has enclosure package:base.
e2 <- new.env(parent = e1)
assign("a", 3, envir = e1)
ls(e1)
#[1] "a"

However, ls() lists the environments that were created:

ls()
#[1] "e1" "e2"

And you can access your environment's objects just like those of a list:

e1$a
#[1] 3
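The "pointer to an enclosing environment" part of the definition can be sketched with the same e1 and e2: a lookup that starts in e2 falls through to e1, even though e2 itself holds no binding for a. The variable names found_via_enclosure and own_binding are hypothetical.

```r
e1 <- new.env(parent = baseenv())
e2 <- new.env(parent = e1)
assign("a", 3, envir = e1)

# get() follows the enclosure chain by default, so it finds a in e1.
found_via_enclosure <- get("a", envir = e2)

# With inherits = FALSE only e2's own frame is inspected.
own_binding <- exists("a", envir = e2, inherits = FALSE)
```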

Playing with your functions:

f1 <- function(x) {
   ret <- new.env()
   ret$x <- x
   ret$y <- x^2
   return(ret)
}

res <- f1(2)
res
#<environment: 0x0000021d55a8a3e8>

res$y
#[1] 4

f2 <- function(x) {
   ret <- list()
   ret$x <- x
   ret$y <- x^2
   return(ret)
}

res2 <- f2(2)
res2
#$x
#[1] 2

#$y
#[1] 4

res2$y
#[1] 4
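The practical consequence of reference semantics shows up when the returned object is handed to another function. A minimal sketch, with hypothetical names bump, env_res and list_res:

```r
# Adds 1 to the x element of whatever container it receives.
bump <- function(container) {
  container$x <- container$x + 1
  invisible(NULL)
}

env_res <- new.env()
env_res$x <- 1
list_res <- list(x = 1)

bump(env_res)    # modifies the caller's environment in place
bump(list_res)   # modifies only a local copy of the list

env_x  <- env_res$x    # changed
list_x <- list_res$x   # unchanged
```

So a function that returns an environment hands out something later code can mutate from anywhere, while a returned list behaves like an ordinary value.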

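Their timings are quite similar in the microbenchmark below. Note, however, that as written it only times the creation of the two function objects (the function(x) { ... } expressions are never called), so it says little about new.env() versus list() themselves: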

microbenchmark::microbenchmark(
   function(x) {
      ret <- new.env()
      ret$x <- x
      ret$y <- x^2
      return(ret)
   },
   function(x) {
      ret <- list()
      ret$x <- x
      ret$y <- x^2
      return(ret)
   },
   times = 500L
)

#Unit: nanoseconds
#                                                                                 expr min lq   mean median  uq  max neval
# function(x) {     ret <- new.env()     ret$x <- x     ret$y <- x^2     return(ret) }   0  1 31.802      1 100  801   500
#    function(x) {     ret <- list()     ret$x <- x     ret$y <- x^2     return(ret) }   0  1 37.802      1 100 2902   500

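and object.size() reports the same size for both objects here (bear in mind that object.size() does not count the contents of an environment, so treat this comparison as indicative only):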

object.size(res)
#464 bytes

object.size(res2)
#464 bytes

and you can always generate an environment from a list (list2env) and do the reverse too (as.list):

L <- list(a = 1, b = 2:4, p = pi, ff = gl(3, 4, labels = LETTERS[1:3]))
e <- list2env(L)
e$ff
# [1] A A A A B B B B C C C C
#Levels: A B C

as.list(e)
#$ff
# [1] A A A A B B B B C C C C
#Levels: A B C
#
#$p
#[1] 3.141593
#
#$b
#[1] 2 3 4
#
#$a
#[1] 1
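As the output above suggests, as.list() on an environment does not promise the original element order. A minimal round-trip sketch, with hypothetical names original, env_copy and back, that restores a known order before comparing:

```r
original <- list(a = 1, b = 2:4)

env_copy <- list2env(original)   # list -> new environment
back <- as.list(env_copy)        # environment -> list, order not guaranteed

# Reorder by name so the comparison does not depend on the internal order.
back <- back[order(names(back))]
round_trip_ok <- identical(back, original)
```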

Calling an R function in a different environment

Move f into env

environment(f) <- env
f()
# [1] 4

Note: Evaluation of objects across different environments is not desirable, as you have encountered here. It's best to keep all objects that you plan to interact with one another in the same environment.

If you don't want to change the environment of f, you could put all the above into a new function.

fx <- function(f, env) {
    environment(f) <- env
    f()
}
fx(f, env)
# [1] 4
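A self-contained sketch of the wrapper approach. The names target_env, a_value, square_a and call_in are hypothetical stand-ins, not the objects from the question. It also checks that the original function keeps its own environment, since environment(fun) <- env inside the wrapper only modifies the wrapper's local copy.

```r
# Hypothetical environment holding the value the function needs.
target_env <- new.env(parent = globalenv())
target_env$a_value <- 2

# a_value is not defined where square_a is created.
square_a <- function() a_value^2

call_in <- function(fun, env) {
  environment(fun) <- env   # changes only the local copy of fun
  fun()
}

result <- call_in(square_a, target_env)
original_untouched <- !identical(environment(square_a), target_env)
```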
