Understanding Lexical Scoping in R

In the context of R, lexical scoping normally means that free variables in a function (i.e. variables that are used in the function but not defined in it) are looked up in the parent environment of the function, that is, the environment in which the function was defined, as opposed to the environment of the caller (also referred to as the parent frame). There are no free variables in with.default, so the example does not illustrate a violation of lexical scoping in that sense.

For example, this illustrates lexical scoping:

x <- 1
f <- function() x
g <- function() { x <- 0; f() }
g() # 1

The answer is 1 because x is 1 in the environment in which f is defined. Had R used dynamic scoping rather than lexical scoping, the answer would have been 0 (using the environment of the caller). We can illustrate how R can emulate dynamic scoping like this:

f <- function() eval.parent(quote(x))
g() # 0

ADDED:

In a comment below, @hadley suggested that the authors may have been referring to the fact that the second actual argument to with.default is not evaluated lexically, and this interpretation seems likely. Instead of being evaluated relative to the surrounding lexical environment, the second actual argument of with.default is read into the with.default function as an expression using substitute and then evaluated relative to the first argument using eval. There is some question of what the definition of lexical scoping ought to be, as it is rarely defined even when extensively discussed, but typical discussions in relation to R refer to it as the treatment of free variables. See, for example, Gentleman & Ihaka.
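The substitute-then-eval mechanism described above can be sketched as follows. The name my_with and the sample data are made up for illustration; the body is only meant to mirror the approach, not to replace with.default.

```r
# my_with() is a hypothetical stand-in: capture the unevaluated expression,
# then evaluate it with `data` searched before the caller's environment.
my_with <- function(data, expr) {
  eval(substitute(expr), data, enclos = parent.frame())
}

dat <- data.frame(a = 1:3, b = 4:6)
b <- 100  # a variable in the surrounding lexical environment

# `b` resolves to the column dat$b, not to the outer b defined above
result <- my_with(dat, a + b)
result  # 5 7 9
```

Because the expression is evaluated relative to dat first, the outer b is never consulted, which is the sense in which the second argument is not evaluated lexically.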

Lexical scoping and environments in R

You are confusing the "calling environment" with the "enclosing environment." Check out these terms in Hadley's book "Advanced R."

http://adv-r.had.co.nz/Environments.html

The "calling environment" is the environment from which a function was called, and is returned by the unfortunately-named function parent.frame. However, the calling environment is not used for lexical scoping.

The "enclosing environment" is the environment in which a function was created and is used for lexical scoping. You have created both func1 and func2 in the global environment. Therefore, the global environment is the "enclosing environment" for both functions and will be used for lexical scoping regardless of the calling environment!!
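To make the distinction concrete, here is a small sketch that reports both environments from inside a function. The names report_envs, caller and local_marker are invented for this illustration.

```r
report_envs <- function() {
  calling   <- parent.frame()            # where report_envs was called from
  enclosing <- parent.env(environment()) # where report_envs was defined
  list(enclosing = enclosing, calling = calling)
}

definition_site <- environment()  # the environment this code is run in

caller <- function() {
  local_marker <- "inside caller"
  report_envs()
}

envs <- caller()

# The enclosing environment is the place of definition ...
identical(envs$enclosing, definition_site)                        # TRUE
# ... while only the calling environment can see caller()'s local variable
exists("local_marker", envir = envs$calling, inherits = FALSE)    # TRUE
exists("local_marker", envir = envs$enclosing, inherits = FALSE)  # FALSE
```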

If you want func2 to use the execution environment of func1 for lexical scoping, you have (at least) two options. You can create func2 within func1:

func1 <- function(vec) {

  func2 <- function(foos) {
    for (foo in foos)
      print(eval(parse(text = foo)))
    return(foos)
  }

  text3_obj <- 'text3'
  vec <- c(vec, c('text3_obj'))
  return(func2(vec))
}

then your test works as expected:

> text1_obj <- 'text1'
> text2_obj <- 'text2'
> func1(c('text1_obj', 'text2_obj'))
[1] "text1"
[1] "text2"
[1] "text3"
[1] "text1_obj" "text2_obj" "text3_obj"

Alternatively, you can create func2 and reassign its "enclosing environment" from within func1:

func2 <- function(foos) {
  for (foo in foos)
    print(eval(parse(text = foo)))
  return(foos)
}

func1 <- function(vec) {
  text3_obj <- 'text3'
  vec <- c(vec, c('text3_obj'))
  environment(func2) <- environment()
  return(func2(vec))
}

This will also work as expected.

An interesting tidbit I found while writing my demonstration code... It appears that when you re-assign the environment of func2 from within func1, R creates a copy of func2 in the execution environment of func1. By the time you get back to the console, the enclosing environment of the original func2 remains unchanged. Witness:

a = function() {
  print(identical(environment(a), globalenv()))
}

b = function(x) {
  environment(a) <- environment()
  a()
}

Test a() and b():

> a()
[1] TRUE
> b()
[1] FALSE
> a()
[1] TRUE
>

This was not what I expected, but seems like really excellent behavior on the part of R. If this were not the case, the enclosing environment of a() would have been permanently changed to the execution environment of b(), and FALSE would have been returned the second time a() is called.

In fact, it turns out you can force the change to the original a() in the global environment using <<-:

a = function() {
  print(identical(environment(a), globalenv()))
}

b = function(x) {
  # set a variable in the execution environment of b() for use later...
  montePython = "I'm not dead yet!!"
  # change the enclosing environment of a() in the global environment
  # rather than making a local copy of a() in b()'s execution environment.
  environment(a) <<- environment()
  a()
}

Test a() and b():

> a()
[1] TRUE
> b()
[1] FALSE
> a()
[1] FALSE
>

Interestingly, this means that the (normally temporary) execution environment of b() persists in memory even after b() terminates, because a() still references the environment, so it can't be garbage collected. Witness:

> environment(a)$montePython
[1] "I'm not dead yet!!"

Understanding scoping of nested functions

R uses lexical scoping, which means that if a function needs to reference an object not defined in that function, it looks in the environment in which the function was defined, not in the caller. In the question, g is defined in the global environment, so that is where g looks for c.

Also note that in R we would not call the functions in the question nested functions. Rather what is nested is the calls, not the functions. In (3) below we show nested functions.

1) We can reset a function's environment in which case it will think it was defined in that environment.

g <- function(a,b){       # help function
  a^2 + b^2 - c
}

f <- function(a,b,c,d){ # main function
  environment(g) <- environment()
  g(a,b)
}

f(1, 2, 3, 4)
## [1] 2

2) Another possibility is to explicitly tell it which environment to search using envir$c (where envir is the desired environment) or get("c", envir) or with(envir, c). envir$c will look only in envir. The other two will look there and, if the object is not found, will look in ancestor environments. (Every environment except emptyenv() has a parent. This chain of parents is distinct from the call stack.)

g <- function(a, b, envir = parent.frame()){       # help function
  a^2 + b^2 - envir$c
}

f <- function(a,b,c,d){ # main function
  g(a,b)
}

f(1, 2, 3, 4)
## [1] 2

3) We can nest the functions so that g is defined in f.

f <- function(a,b,c,d){   # main function
  g <- function(a,b){ # help function
    a^2 + b^2 - c
  }
  g(a,b)
}

f(1, 2, 3, 4)
## [1] 2

4) Of course you could just pass c and avoid all these problems.

g <- function(a, b, c) {
  a^2 + b^2 - c
}

f <- function(a, b, c, d) { # main function
  g(a, b, c)
}

f(1, 2, 3, 4)
## [1] 2
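Option 2 above distinguishes between lookups that stop at one environment and lookups that continue into ancestor environments. A minimal sketch of that difference follows; the environments and the variable name val are made up here (val is used instead of c to avoid picking up the base function c).

```r
outer_env <- new.env()
outer_env$val <- 42
inner_env <- new.env(parent = outer_env)  # val is NOT defined in inner_env itself

direct     <- inner_env$val                   # `$` checks inner_env only
inherited  <- get("val", envir = inner_env)   # get() walks up to outer_env
via_with   <- with(inner_env, val)            # evaluation also follows parents
local_only <- exists("val", envir = inner_env, inherits = FALSE)

direct      # NULL
inherited   # 42
via_with    # 42
local_only  # FALSE
```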

About lexical scoping in R

OP seems to be looking for clarification about environments.

In R, every function[1] has an enclosing environment. This is the collection of objects that it knows about, in addition to those that are passed in as its arguments, or that it creates in its code.

When you create a function at the prompt, its environment is the global environment. This is just the collection of objects in your workspace, which you can see by typing ls(). For example, if your workspace contains a data frame Df, you could create a function like the following:

showDfRows <- function()
{
  cat("The number of rows in Df is: ", nrow(Df), "\n")
  return(NULL)
}

Your function knows about Df even though you didn't pass it in as an argument; it exists in the function's environment. Environments can be nested, which is how things like package namespaces work. You can, for example, do lm(y ~ x, data=Df) to fit a regression, even though your workspace doesn't contain any object called lm. This is because the global environment's chain of parents includes the stats package, which is where the lm function lives.[2]
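If you want to see that chain of parents for yourself, something along these lines should work. The helper env_chain is a made-up name, and the exact packages listed will depend on what is attached in your session.

```r
# Collect the names of an environment and all of its ancestors
env_chain <- function(env) {
  names_seen <- character(0)
  repeat {
    names_seen <- c(names_seen, environmentName(env))
    if (identical(env, emptyenv())) break
    env <- parent.env(env)
  }
  names_seen
}

chain <- env_chain(globalenv())
chain  # starts at "R_GlobalEnv", passes through attached packages, ends at "R_EmptyEnv"

# lm itself lives in the stats namespace
lm_home <- environmentName(environment(stats::lm))
lm_home  # "stats"
```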

When functions are created inside another function, their enclosing environment is the evaluation frame of their parent function. This means that the child function can access all the objects known to the parent. For example:

f <- function(x)
{
  g <- function()
  {
    cat("The value of x is ", x, "\n")
  }
  return(NULL)
}

Notice that g doesn't contain any object called x, nor are any of its arguments named x. However, it all still works, because it will retrieve x from the evaluation frame of its parent f.
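The function above defines g without ever calling it. A variant that actually calls the inner function makes the lookup visible; outer_fn and inner_fn are names chosen just for this sketch.

```r
outer_fn <- function(x) {
  inner_fn <- function() {
    # x is a free variable here: it is found in outer_fn's evaluation frame
    paste("The value of x is", x)
  }
  inner_fn()
}

msg <- outer_fn(3)
msg  # "The value of x is 3"
```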

This is the trick that the code in the question is using. When you run open_account, it creates an evaluation frame in which to execute its code. open_account then creates 3 functions, deposit, withdraw and balance. Each of these 3 has as its enclosing environment the evaluation frame of open_account. In this evaluation frame there is a variable called total, whose value was passed in by you, and which will be manipulated by deposit, withdraw and balance.

When open_account completes, it returns a list. If this was a regular function, its evaluation frame would now be disposed of by R. In this case, however, R can see that the returned list contains functions that need to use that evaluation frame; so the frame continues to stay in existence.

So, why don't Ross' and Robert's accounts clash with each other? Every time you execute open_account, R creates a new evaluation frame. The frames from opening Ross' and Robert's accounts are completely separate, just as running lm(y ~ x, data=Df) creates a separate frame from running lm(y ~ x, data=Df2). Each time open_account returns, it will bring with it a new environment in which to store the balance just created. (It will also contain new copies of the deposit, withdraw and balance functions, but generally we can afford to ignore the memory used for this.)
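The question's code is not reproduced here, so the following is only a sketch of what such a function might look like, assuming the names used above (open_account, deposit, withdraw, balance, total). The function bodies and the amounts are made up.

```r
open_account <- function(total) {
  list(
    deposit = function(amount) {
      total <<- total + amount   # updates total in open_account's frame
      invisible(total)
    },
    withdraw = function(amount) {
      if (amount > total) stop("insufficient funds")
      total <<- total - amount
      invisible(total)
    },
    balance = function() total
  )
}

ross   <- open_account(100)
robert <- open_account(200)

ross$withdraw(30)
robert$deposit(50)

ross$balance()    # 70
robert$balance()  # 250
```

Each call to open_account gets its own frame, so the two totals never interfere.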

[1] technically every closure, but let's not muddy things

[2] again, there's a technical distinction between namespaces and environments but it isn't important here

Lexical scoping and the <<- operator in R

The operators ‘<<-’ and ‘->>’ are normally only used in functions,
and cause a search to be made through parent environments for an
existing definition of the variable being assigned. If such a
variable is found (and its binding is not locked) then its value
is redefined, otherwise assignment takes place in the global
environment.

variable at the global level

z <- 10

Does not modify the global value of z

myfun <- function(x){
  z <- x
  print(z)
}

Modifies the value of z inside myfun0 but does not modify z at the global level.

myfun0 <- function(x){
  z <- x
  myfun1 <- function(y){
    z <<- (y+1)
  }

  myfun1(x)
  print(z)
}

Modifies z in the global environment

myfunG <- function(x){
  z <<- x
  print(" z in the global environment is modified")
}
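
The three cases above are never actually called, so here is a compact sketch that runs each one and records what happens to the outer z. The function names set_local, set_enclosing and set_global are made up for this illustration.

```r
z <- 10

set_local <- function(x) {
  z <- x          # creates a new local z; the outer z is untouched
  z
}

set_enclosing <- function(x) {
  z <- x
  bump <- function(y) z <<- y + 1  # finds z in set_enclosing's frame
  bump(x)
  z
}

set_global <- function(x) {
  z <<- x         # no local z exists, so the search reaches the outer z
  invisible(z)
}

r_local <- set_local(1)
z_after_local <- z            # still 10

r_enclosing <- set_enclosing(1)
z_after_enclosing <- z        # still 10; r_enclosing is 2

set_global(99)
z_after_global <- z           # now 99
```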

see this post as well.

Why does my R run in dynamic scoping? Shouldn't it be lexical?

Your code has assigned y within the function itself, which is looked up before the y in the global environment.

From this excellent article (http://www.r-bloggers.com/environments-in-r/): "When a function is evaluated, R looks in a series of environments for any variables in scope. The evaluation environment is first, then the function’s enclosing environment, which will be the global environment for functions defined in the workspace."

In simpler language specific to your case: when you refer to the variable "y", R looks for "y" in the function's own environment first and, if it fails to find one there, goes to your workspace. An example to illustrate:

y <- 10

f <- function(x) {
  y^3
}

f(3)

Will produce output:

> f(3)
[1] 1000
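
For contrast, here is a sketch of the other case, where y is assigned inside the function and therefore shadows the global y. The names f_global and f_local are made up.

```r
y <- 10

f_global <- function(x) {
  y^3            # no local y, so the global y (10) is used
}

f_local <- function(x) {
  y <- 2         # local y is found first
  y^3
}

f_global(3)  # 1000
f_local(3)   # 8
y            # 10, unchanged by f_local
```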

Formal Arguments Evaluation and Lexical Scoping in R

You have specified an argument y for the function but are not providing a value for it, even though the function needs that value to compute its result. So this will work:

f1(x = 3, y)
[1] 10

Here it takes y, your defined variable, as the input for the second argument, which incidentally is also named y, and returns a value.

This will also work, because you have defined a default value for the argument:

y1 <- 4
f1 <- function(x = 2, y = y1) {
  x*2 + y
}
f1(x=3)
#> [1] 10
f1(x = 3, 5)
#> [1] 11

Created on 2021-05-03 by the reprex package (v2.0.0)

If you want to call a function without supplying a value for an argument, you have to define a default for that argument in the function definition itself.

Why can R not find the value of an argument through lexical scoping?

Note that R will only throw the error when you go to use the variable. If you had

f1 <- function(x = 2, y) {
  x*2 + 5
}

f1(x = 3)
# [1] 11

everything would be fine. That's because the parameter is a "promise" which isn't resolved till you actually use it. This allows you to do things like

f1 <- function(x = 2, y=x+5) {
  x*2 + y
}

f1(x = 3)
# [1] 14

Where the y value will actually use the value of x that's passed to the function when the promise is evaluated. Furthermore you can also do

f1 <- function(x = 2, y=z+2) {
  z <- x + 10
  x*2 + y
}

f1(x = 3)
# [1] 21

Where y is able to take the value of z that didn't even exist when the function was called. Again this is because the parameter values are promises and are only evaluated when they are actually used. They have access to all the values in the environment when they are evaluated. But note that this only works because default parameter values are evaluated in the context of the function body. This is different than when you pass in a value to a function. In that case the value is evaluated in the calling environment, not the local function body. So you can't do

f1(x = 3, y=z+2)
# Error in f1(x = 3, y = z + 2) : object 'z' not found
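
To see both evaluation contexts side by side, here is a sketch in which the supplied argument does find a z, but only because one exists in the caller. The names f1_default and caller_fn and the value 100 are made up.

```r
f1_default <- function(x = 2, y = z + 2) {
  z <- x + 10
  x*2 + y
}

# Default used: z + 2 is evaluated inside the function, where z is 13
from_default <- f1_default(x = 3)

caller_fn <- function() {
  z <- 100
  # Supplied argument: z + 2 is evaluated here, in the caller, where z is 100
  f1_default(x = 3, y = z + 2)
}
from_caller <- caller_fn()

from_default  # 21
from_caller   # 108
```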

The reason you get the error in your first function is that a value for y does not exist when you try to use it in x*2 + y. Since you've defined y as a parameter, it is no longer a "free" variable and will not be looked up in parent scope. You don't get an error in your second function because you've re-bound the y variable to a local function variable so you are never using the parameter value at all.

If you ran

f1 <- function(x = 2, y) {
  y <- 4
  x*2 + y
}

f1(x = 3, y=200)
# [1] 10

The 200 basically disappears. You no longer have access to that value after you reassign y. R does not check whether a variable already exists before redefining it, so nothing will try to evaluate the promise for the function parameter y.

Arguments will act like local variables once the promise has been evaluated.
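
One way to convince yourself that the promise is never evaluated is to pass an expression that would fail if it were. This is only a sketch; f_unused and f_forced are made-up names, and force() is used to evaluate the promise on purpose.

```r
f_unused <- function(x = 2, y) {
  y <- 4          # overwrites the binding without evaluating the promise
  x*2 + y
}

f_forced <- function(x = 2, y) {
  force(y)        # evaluates the promise before it is overwritten
  y <- 4
  x*2 + y
}

# The stop() call is never run because y's promise is never forced
never_forced <- f_unused(x = 3, y = stop("boom"))

# Here the promise is forced, so the error surfaces
forced_msg <- tryCatch(
  f_forced(x = 3, y = stop("boom")),
  error = function(e) conditionMessage(e)
)

never_forced  # 10
forced_msg    # "boom"
```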

What is lexical scope?

I understand them through examples. :)

First, lexical scope (also called static scope), in C-like syntax:

void fun()
{
    int x = 5;

    void fun2()
    {
        printf("%d", x);
    }
}

Every inner level can access its outer levels.

There is another way, called dynamic scope, used by the first implementation of Lisp, again in a C-like syntax:

void fun()
{
    printf("%d", x);
}

void dummy1()
{
    int x = 5;

    fun();
}

void dummy2()
{
    int x = 10;

    fun();
}

Here fun can access x in either dummy1 or dummy2, or any x in any function that calls fun with x declared in it.

dummy1();

will print 5,

dummy2();

will print 10.

The first one is called static because it can be deduced at compile-time, and the second is called dynamic because the outer scope is dynamic and depends on the chain call of the functions.

I find static scoping easier for the eye. Most languages went this way eventually, even Lisp (can do both, right?). Dynamic scoping is like passing references of all variables to the called function.

As an example of why the compiler can not deduce the outer dynamic scope of a function, consider our last example. If we write something like this:

if(/* some condition */)
    dummy1();
else
    dummy2();

The call chain depends on a run time condition. If it is true, then the call chain looks like:

dummy1 --> fun()

If the condition is false:

dummy2 --> fun()

The outer scope of fun in both cases is the caller plus the caller of the caller and so on.

Just to mention that the C language does not allow nested functions nor dynamic scoping.
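
Since the rest of this page is about R, here is a rough R translation of the dummy1/dummy2 example. R is lexically scoped, so the dynamic behaviour is only emulated here by looking x up in the caller's frame; the function names fun_lexical and fun_dynamic and the global value 0 are made up for this sketch.

```r
x <- 0

fun_lexical <- function() x                                   # looks where it was defined
fun_dynamic <- function() get("x", envir = parent.frame())    # looks where it was called

dummy1 <- function() {
  x <- 5
  lexical <- fun_lexical()
  dynamic <- fun_dynamic()
  c(lexical = lexical, dynamic = dynamic)
}

dummy2 <- function() {
  x <- 10
  lexical <- fun_lexical()
  dynamic <- fun_dynamic()
  c(lexical = lexical, dynamic = dynamic)
}

dummy1()  # lexical 0, dynamic 5
dummy2()  # lexical 0, dynamic 10
```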

R: Lexical Scoping issue when creating a function with ellipsis argument

You can use quosures from rlang to capture the arguments in ... then unquote splice them into a select call:

library(tidyverse)
library(rlang)

qft <- function(data, ...){
  args <- enquos(...)
  vars <- select(data, !!!args)
  ft <- data.frame(table(vars))
  ft[ft$Freq != 0, ]
}

qft(mtcars, cyl, gear)
#   cyl gear Freq
#1    4    3    1
#2    6    3    2
#3    8    3   12
#4    4    4    8
#5    6    4    4
#7    4    5    2
#8    6    5    1
#9    8    5    2
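
If you would rather not depend on rlang, a base R version along the following lines should behave similarly for bare column names. This is a different technique from the quosure approach above (it captures the names with substitute and deparse), and qft_base is a made-up name.

```r
qft_base <- function(data, ...) {
  # Capture the unevaluated arguments in ... and turn them into column names
  vars <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  ft <- as.data.frame(table(data[vars]))
  ft[ft$Freq != 0, ]
}

res <- qft_base(mtcars, cyl, gear)
res
```

Note that this sketch only handles plain column names, not the selection helpers that select understands.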
