Using functions and environments
The simplest solution is to use the environment when referencing the object:
y <- new.env()
y$x <- 1
f <- function(env,z) {
env$x+z
}
f(y,z=1)
If you evaluate the expression with with() instead, you would need to assign z to your environment as well.
y <- new.env()
with(y, x <- 1)
f <- function(env,z) {
assign("z", z, envir=env)
with(env, x+z)
}
f(y,z=1)
One other option would be to attach your environment so that the variables can now be used directly.
y <- new.env()
with(y, x <- 1)
f <- function(env,z) {
attach(env)
y <- x + z
detach(env)
y
}
f(y,z=1)
This latter solution is powerful because it means you can use any object from any attached environment within your new environment, but it also means that you need to be very careful about what has been assigned globally.
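If you do go the attach() route, one way to limit the risk is to make sure the environment is always detached again, even when the function fails. A minimal sketch, assuming the hypothetical names vals, f_safe and "tmp_env" (none of them come from the original answer):

```r
# Hypothetical variant of f(): attach under a known name and always detach.
vals <- new.env()
assign("x", 1, envir = vals)

f_safe <- function(env, z) {
  # attach() puts a copy of env on the search path under the given name
  attach(env, name = "tmp_env", warn.conflicts = FALSE)
  # on.exit() runs even if the body errors, so the search path is restored
  on.exit(detach("tmp_env", character.only = TRUE))
  x + z
}

result <- f_safe(vals, z = 1)
```

Note that a variable named x in the global environment would still shadow the attached one, which is exactly the kind of global state the caution above is about.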
Edit:
This is interesting, and I don't entirely understand the behavior (i.e. why z is not in the scope of the with call). It has something to do with the creation of the environment originally that is causing it to be outside the scope of the function, because this version works:
f <- function(z) {
y <- new.env()
with(y, x <- 1)
with(y, x+z)
}
f(z=1)
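A likely explanation, offered as a sketch rather than a definitive account: when with() is given an environment, names that are not found in it are looked up in that environment's parent, not in the function that called with(). An environment created at top level has the top-level environment as its parent, while one created inside a function has that function's execution environment as its parent, which is where z lives. The names outer_env, outer_lookup and inner_lookup are made up for illustration.

```r
outer_env <- new.env()          # parent is the environment where this line runs
assign("x", 1, envir = outer_env)

outer_lookup <- function(env, z) {
  # z lives in this function's execution environment,
  # which is NOT on env's parent chain
  tryCatch(with(env, x + z), error = function(e) "z not found")
}

inner_lookup <- function(z) {
  inner_env <- new.env()        # parent is this function's execution environment
  assign("x", 1, envir = inner_env)
  with(inner_env, x + z)        # z is found via parent.env(inner_env)
}

outer_result <- outer_lookup(outer_env, z = 1)
inner_result <- inner_lookup(z = 1)
```

This assumes no object named z exists at top level; if one did, outer_lookup would silently pick that one up instead.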
How to use the R environment and the globalenv() function
Your card deck is stored in a vector deck in your Global Environment.
deal <- function(){
card <- deck[1,]
assign("deck", deck[-1,], envir = globalenv())
card
}
Each function call creates its own environment; an object assigned inside a function "lives" only inside of it. That's why you don't "see" a vector named card in your Global Environment (unless you created one before, but that vector is unaffected by the card <- deck[1,] statement in the deal function).
So assign("deck", deck[-1,]) (without the envir argument) would be the same as
deal <- function(){
card <- deck[1,]
deck <- deck[-1,]
card
}
but this won't change your deck outside the function. The vector deck inside the function just exists inside the function. To change the deck outside the function, you have to tell R where to change it. So that's why assign("deck", deck[-1,], envir = globalenv()) is used.
So let's start over with your function deal:
card <- deck[1,]
assigns the first element of deck to card. But wait! deck doesn't exist inside the function, so how is this possible? If an object isn't found inside the function, R looks one level up, in your case most likely the Global Environment. There R finds an object/vector named deck and does the assignment. Now we have an object/vector named card that exists inside the function.
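The lookup and the two kinds of assignment can be tried on a toy deck. This is only a sketch: the three-card data frame and the function names deal_local and deal_global are invented for illustration.

```r
# A toy deck, stored explicitly in the global environment
assign("deck",
       data.frame(face = c("king", "queen", "jack"), suit = "spades"),
       envir = globalenv())

deal_local <- function() {
  card <- deck[1, ]    # deck is not local, so R finds it further up
  deck <- deck[-1, ]   # creates a local copy; the global deck is untouched
  card
}

deal_global <- function() {
  card <- deck[1, ]
  assign("deck", deck[-1, ], envir = globalenv())  # changes the global deck
  card
}

deal_local()
rows_after_local <- nrow(get("deck", envir = globalenv()))   # still 3 rows

deal_global()
rows_after_global <- nrow(get("deck", envir = globalenv()))  # now 2 rows
```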
For further understanding, take a look at Chapter 6: Functions in Advanced R.
Distinct enclosing environment, function environment, etc. in R
TLDR:
- indeed, you can change the enclosing environment. Hadley was probably talking about packaged functions.
- the enclosing and the binding environment. You were correct.
- that's the execution environment. It only exists for the time the function runs.
Function environments
You have to distinguish 4 different environments when talking about a function:
- the binding environment is the environment where the function is found (i.e. where its name exists). This is where the actual binding of an object to its name is done. find() gives you the binding environment.
- the enclosing environment is the environment where the function is originally created. This is not necessarily the same as the binding environment (see examples below). environment() gives you the enclosing environment.
- the local environment is the environment within the function. You call that the execution environment.
- the parent frame or calling environment is the environment from where the function was called.
Why does this matter
Every environment has a specific function:
- the binding environment is the environment where you find the function.
- the local environment is the first environment where R looks for objects.
- the general rule is: if R doesn't find an object in the local environment, it then looks in the enclosing environment and so on. The last enclosing environment is always emptyenv().
- the parent frame is where R looks for the value of the objects passed as arguments.
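The four environments can be told apart in a few lines. In this sketch, creator, g and h are hypothetical names; g is created inside creator (its enclosing environment) but its name is bound in whatever environment the script runs in.

```r
creator <- new.env()

# local() evaluates the function definition inside creator,
# so creator becomes g's enclosing environment
g <- local(function() {
  list(local  = environment(),    # execution (local) environment of this call
       caller = parent.frame())   # environment g was called from
}, envir = creator)

h <- function() {
  res <- g()
  res$h_env <- environment()      # h's own execution environment
  res
}

res <- h()

identical(environment(g), creator)              # enclosing environment
identical(parent.env(res$local), creator)       # the local env is enclosed by it
identical(res$caller, res$h_env)                # the parent frame is h's environment
exists("g", envir = creator, inherits = FALSE)  # FALSE: the name g is bound elsewhere
```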
You can change the enclosing environment
Indeed, you can change the enclosing environment. What you cannot change is the enclosing environment of a function that lives in a package. In that case you don't change the enclosing environment of the original; you actually create a copy of the function in the new environment:
> ls()
character(0)
> environment(sd)
<environment: namespace:stats>
> environment(sd) <- globalenv()
> environment(sd)
<environment: R_GlobalEnv>
> ls()
[1] "sd"
> find("sd")
[1] ".GlobalEnv" "package:stats" # two functions sd now
> rm(sd)
> environment(sd)
<environment: namespace:stats>
In this case, the second sd has the global environment as the enclosing and binding environment, but the original sd is still found inside the package environment, and its enclosing environment is still the namespace of that package.
The confusion might arise when you do the following:
> f <- sd
> environment(f)
<environment: namespace:stats>
> find("f")
[1] ".GlobalEnv"
What happens here? The enclosing environment is still the namespace "stats". That's where the function is created. However, the binding environment is now the global environment. That's where the name "f" is bound to the object.
We can change the enclosing environment to a new environment e. If you check now, the enclosing environment becomes e, but e itself is empty. f is still bound in the global environment.
> e <- new.env()
> e
<environment: 0x000000001852e0a8>
> environment(f) <- e
> find("f")
[1] ".GlobalEnv"
> environment(f)
<environment: 0x000000001852e0a8>
> ls(e)
character(0)
The enclosing environment of e is the global environment. So f still works as if its enclosure was the global environment. The environment e is enclosed in it, so if something isn't found in e, the function looks in the global environment and so on.
But because e is an environment, R calls that a parent environment.
> parent.env(e)
<environment: R_GlobalEnv>
> f(1:3)
[1] 1
Namespaces and package environments
This principle is also the "trick" packages use:
- the function is created in the namespace. This is an environment that is enclosed by the namespaces of other imported packages, and eventually the global environment.
- the binding for the function is created in the package environment. This is an environment that encloses the global environment and possibly other packages.
The reason for this is simple: objects can only be found inside the environment you are in, or in its enclosing environments.
- a function must be able to find other functions (objects), so the local environment must be enclosed by possibly the namespaces of other packages it imports, the base package and lastly the global environment.
- a function must be findable from within the global environment. Hence the binding (i.e. the name of the function) must be in an environment that is enclosed by the global environment. This is the package environment (NOT the namespace!)
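The two chains can be inspected from base R. The helper env_chain below is a hypothetical name introduced for this sketch; it simply follows parent.env() up to the empty environment. The example assumes the stats package is attached, as it is in a default R session.

```r
# Walk up the chain of enclosing environments and collect their names
env_chain <- function(env) {
  out <- character(0)
  repeat {
    out <- c(out, environmentName(env))
    if (identical(env, emptyenv())) break
    env <- parent.env(env)
  }
  out
}

ns  <- asNamespace("stats")              # where sd is created
pkg <- as.environment("package:stats")   # where the name sd is made visible

ns_chain     <- env_chain(ns)
global_chain <- env_chain(globalenv())

# The namespace, its imports, the base namespace, then the global environment
head(ns_chain, 4)

identical(environment(stats::sd), ns)         # sd is enclosed by the namespace
exists("sd", envir = pkg, inherits = FALSE)   # the binding users see is in the package env
"package:stats" %in% global_chain             # the package env encloses the global env
```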
An illustration:
Now suppose you make an environment with the empty environment as a parent. If you use this as the enclosing environment for a function, nothing works any longer, because you now circumvent all the package environments and can't find a single function any more.
> orphan <- new.env(parent = emptyenv())
> environment(f) <- orphan
> f(1:3)
Error in sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), :
could not find function "sqrt"
The parent frame
This is where it gets interesting. The parent frame, or calling environment, is the environment where the values passed as arguments are looked up. But that parent frame can be the local environment of another function. In that case R looks first in the local environment of that other function, then in the enclosing environment of the calling function, and so on all the way up to the global environment and the environments of the attached packages, until it reaches the empty environment. That's where the "object not found" bug sleeps.
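A small sketch of the difference, with made-up names (label, show_arg, show_free, caller): an expression passed as an argument is evaluated in the calling environment, whereas a free variable in the function body is looked up through the enclosing environments.

```r
label <- "defined at top level"

show_arg  <- function(value) value   # value is evaluated where the call was written
show_free <- function() label        # label is looked up via the enclosing environment

caller <- function() {
  label <- "defined in caller"
  c(arg = show_arg(label), free = show_free())
}

res <- caller()
res[["arg"]]    # "defined in caller"
res[["free"]]   # "defined at top level"
```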
Call function from the global environment with implicit dataframe variables (from the calling env?) inside dplyr::summarise or mutate
Up front, I'm generally against writing functions that defeat functional reproducibility, having spent too much time troubleshooting functions that change behavior based on something not passed to them.
However, try this:
library(dplyr)
method_1 <- list(
any_vs_four_gears = function(data = cur_data()) with(data, any(vs == 1 & gear == 4)),
any_am_high_hp = function(data = cur_data()) with(data, any(am == 1 & hp > 170)),
all_combined = function(data = cur_data()) with(data, all(any_vs_four_gears, any_am_high_hp))
)
mtcars %>%
group_by(carb) %>%
summarise(
any_vs_four_gears = method_1$any_vs_four_gears(),
any_am_high_hp = method_1$any_am_high_hp(),
all_combined = method_1$all_combined()
)
# # A tibble: 6 x 4
# carb any_vs_four_gears any_am_high_hp all_combined
# <dbl> <lgl> <lgl> <lgl>
# 1 1 TRUE FALSE FALSE
# 2 2 TRUE FALSE FALSE
# 3 3 FALSE FALSE FALSE
# 4 4 TRUE TRUE TRUE
# 5 6 FALSE TRUE FALSE
# 6 8 FALSE TRUE FALSE
This uses the cur_data() pronoun/function found in dplyr-pipe environments, adds just a little surrounding code (with(data, { ... }), so {-expression-friendly), and works "as is".
The errors are not difficult to interpret:
mtcars %>%
select(-vs) %>% # intentionally setting up an error
group_by(carb) %>%
summarise(
any_vs_four_gears = method_1$any_vs_four_gears(),
any_am_high_hp = method_1$any_am_high_hp(),
all_combined = method_1$all_combined()
)
# Error: Problem with `summarise()` column `any_vs_four_gears`.
# i `any_vs_four_gears = method_1$any_vs_four_gears()`.
# x object 'vs' not found
# i The error occurred in group 1: carb = 1.
# Run `rlang::last_error()` to see where the error occurred.
Return list vs environment from an R function
Although they are similar, there are differences between returning a list and returning an environment.
From Advanced R:
Generally, an environment is similar to a list, with four important exceptions:
Every name in an environment is unique.
The names in an environment are not ordered (i.e., it doesn’t make sense to ask what the first element of an environment is).
An environment has a parent.
Environments have reference semantics.
More technically, an environment is made up of two components, the frame, which contains the name-object bindings (and behaves much like a named list), and the parent environment. Unfortunately “frame” is used inconsistently in R. For example, parent.frame() doesn’t give you the parent frame of an environment. Instead, it gives you the calling environment. This is discussed in more detail in calling environments.
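The last of the four exceptions, reference semantics, is the one that matters most when a function receives or returns such an object. A minimal sketch with hypothetical names (add_one, counter_env, counter_list):

```r
# Increment the count field of whatever container is passed in
add_one <- function(container) {
  container$count <- container$count + 1
  invisible(NULL)
}

counter_env <- new.env()
counter_env$count <- 0
counter_list <- list(count = 0)

add_one(counter_env)    # modifies the caller's environment in place
add_one(counter_list)   # modifies only a copy local to add_one

counter_env$count    # 1
counter_list$count   # 0
```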
From the help:
help(new.env)
Environments consist of a frame, or collection of named objects, and a pointer to an enclosing environment. The most common example is the frame of variables local to a function call; its enclosure is the environment where the function was defined (unless changed subsequently). The enclosing environment is distinguished from the parent frame: the latter (returned by parent.frame) refers to the environment of the caller of a function. Since confusion is so easy, it is best never to use ‘parent’ in connection with an environment (despite the presence of the function parent.env).
From the function's documentation:
e1 <- new.env(parent = baseenv()) # this one has enclosure package:base.
e2 <- new.env(parent = e1)
assign("a", 3, envir = e1)
ls(e1)
#[1] "a"
However, ls() called without arguments lists the environments that were created:
ls()
#[1] "e1" "e2"
And you can access the objects in your environment just like those in a list:
e1$a
#[1] 3
Playing with your functions:
f1 <- function(x) {
ret <- new.env()
ret$x <- x
ret$y <- x^2
return(ret)
}
res <- f1(2)
res
#<environment: 0x0000021d55a8a3e8>
res$y
#[1] 4
f2 <- function(x) {
ret <- list()
ret$x <- x
ret$y <- x^2
return(ret)
}
res2 <- f2(2)
res2
#$x
#[1] 2
#$y
#[1] 4
res2$y
#[1] 4
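Their performance looks quite similar according to microbenchmark. Note, however, that the two expressions below only create the function objects and never call them, so the timings do not include building the environment or the list: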
microbenchmark::microbenchmark(
function(x) {
ret <- new.env()
ret$x <- x
ret$y <- x^2
return(ret)
},
function(x) {
ret <- list()
ret$x <- x
ret$y <- x^2
return(ret)
},
times = 500L
)
#Unit: nanoseconds
#                                                                  expr min lq   mean median  uq  max neval
# function(x) { ret <- new.env() ret$x <- x ret$y <- x^2 return(ret) }   0  1 31.802      1 100  801   500
# function(x) { ret <- list() ret$x <- x ret$y <- x^2 return(ret) }      0  1 37.802      1 100 2902   500
and they return objects of the same size:
object.size(res)
#464 bytes
object.size(res2)
#464 bytes
and you can always generate an environment from a list (list2env) and the inverse too (as.list):
L <- list(a = 1, b = 2:4, p = pi, ff = gl(3, 4, labels = LETTERS[1:3]))
e <- list2env(L)
e$ff
# [1] A A A A B B B B C C C C
#Levels: A B C
as.list(e)
#$ff
# [1] A A A A B B B B C C C C
#Levels: A B C
#
#$p
#[1] 3.141593
#
#$b
#[1] 2 3 4
#
#$a
#[1] 1
Calling an R function in a different environment
Move f into env:
environment(f) <- env
f()
# [1] 4
Note: Evaluation of objects across different environments is not desirable, as you have encountered here. It's best to keep all objects that you plan to interact with one another in the same environment.
If you don't want to change the environment of f, you could put all the above into a new function.
fx <- function(f, env) {
environment(f) <- env
f()
}
fx(f, env)
# [1] 4
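The original f and env are not shown here, so the following is a self-contained sketch with stand-ins: a hypothetical env holding x = 2 and an f that squares x. It illustrates that fx() only changes a copy of f that is local to fx.

```r
env <- new.env()
assign("x", 2, envir = env)

f <- function() x^2            # x is a free variable, resolved via f's enclosure

original_enclosure <- environment(f)

fx <- function(f, env) {
  environment(f) <- env        # modifies the copy of f local to fx
  f()
}

fx_result <- fx(f, env)        # x is found in env
unchanged <- identical(environment(f), original_enclosure)  # the outer f is untouched
```

The wrapper keeps the change of enclosure temporary, which is the point of not reassigning environment(f) at top level.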