Examples of the Perils of Globals in R and Stata

I also have the pleasure of teaching R to undergraduate students who have no experience with programming. The problem I found was that most examples of why globals are bad are rather simplistic and don't really get the point across.

Instead, I try to illustrate the principle of least astonishment. I use examples where it is tricky to figure out what is going on. Here are some examples:

  1. I ask the class to write down what they think the final value of i will be:

    i = 10
    for(i in 1:5)
        i = i + 1
    i

    Some of the class guess correctly. Then I ask: should you ever write code like this?

    In some sense, i is a global variable that is being changed.

  2. What does the following piece of code return?

    x = 5:10
    x[x=1]

    The problem is: what exactly do we mean by x here? Is x=1 an assignment, or an argument named x?

  3. Does the following function return a global or a local variable?

    z = 0
    f = function() {
        if(runif(1) < 0.5)
            z = 1
        return(z)
    }

    Answer: both. Again discuss why this is bad.
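The three puzzles can be checked directly in R. This is a small sketch. f_explicit() and its argument u are hypothetical names added for illustration. They show one less astonishing version of the third function, where every value it uses arrives as an argument.

```r
# Puzzle 1: the loop reuses i, overwriting the earlier i = 10
i <- 10
for (i in 1:5) {
  i <- i + 1  # changes i only until the next iteration reassigns it
}
print(i)  # 6: the last iteration set i to 5 and the body added 1

# Puzzle 2: inside [ ], x = 1 is an argument, not an assignment.
# [ ignores argument names, so this is simply x[1].
x <- 5:10
print(x[x = 1])  # 5
print(x)         # still 5 6 7 8 9 10

# Puzzle 3, rewritten: z and the random draw u are both arguments
# (hypothetical names), so the result depends only on the inputs.
f_explicit <- function(z, u = runif(1)) {
  if (u < 0.5) 1 else z
}
print(f_explicit(0, u = 0.2))  # 1
print(f_explicit(0, u = 0.8))  # 0
```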

Stata and global variables

Something along those lines might work (using a reproducible example):

sysuse auto, clear        
global outcome "rep78"

gen graduate=.

replace graduate=1 if mpg==22 & $outcome==3
(2 real changes made)

In your example, just using

replace graduate=1 if graduate_primary==1 & $outcome==1

should work.

Global variables in packages in R

In general, global variables are evil. The underlying reason is that you want to minimize the interconnections in your package. These interconnections often cause functions to have side effects, i.e. the outcome depends not only on the input arguments but also on the value of some global variable. Especially as the number of functions grows, this can be hard to get right and hell to debug.

For global variables in R, see this SO post.
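Here is a minimal illustration of the hidden dependency described above, using made-up names. The first function reads a global, so the same input can give different results. The second receives the value as an argument.

```r
offset <- 10  # global variable that the first function silently depends on

add_offset_global <- function(x) x + offset
add_offset_explicit <- function(x, offset) x + offset

first <- add_offset_global(1)   # 11
offset <- 20                    # some other code changes the global...
second <- add_offset_global(1)  # ...same input, different result: 21

add_offset_explicit(1, offset = 10)  # 11, whatever the global holds
```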

Edit in response to your comment:
An alternative could be to just pass around the needed information to the functions that need it. You could create a new object which contains this info:

token_information = list(token1 = "087091287129387",
                         token2 = "UA2329723")

and require all functions that need this information to have it as an argument:

do_stuff = function(arg1, arg2, token) {
  # ... use token$token1 and token$token2 here ...
}
do_stuff(arg1, arg2, token = token_information)

In this way it is clear from the code that token information is needed in the function, and you can debug the function on its own. Furthermore, the function has no side effects, as its behavior is fully determined by its input arguments. A typical user script would look something like:

token_info = create_token(token1, token2)
do_stuff(arg1, arg2, token_info)
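create_token() is used above but not defined. The sketch below gives a hypothetical version of it, along with a toy do_stuff(). Both bodies are illustrative assumptions. The point is that everything the worker needs is passed in, so it can be exercised with a fake token.

```r
# Hypothetical constructor: bundles the token values into one object
create_token <- function(token1, token2) {
  list(token1 = token1, token2 = token2)
}

# Hypothetical worker: everything it needs arrives through its arguments
do_stuff <- function(arg1, arg2, token) {
  stopifnot(is.list(token), is.character(token$token1))
  paste(arg1, arg2, token$token1, sep = "-")
}

token_info <- create_token("087091287129387", "UA2329723")
do_stuff("a", "b", token_info)  # "a-b-087091287129387"

# Because nothing global is involved, a fake token is enough for testing
do_stuff("a", "b", create_token("test1", "test2"))  # "a-b-test1"
```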

I hope this makes things more clear.

Why is using `<<-` frowned upon and how can I avoid it?

First point

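<<- is NOT the operator for assigning to a global variable. It assigns to the variable of that name in the nearest enclosing (parent) environment that already has one. It only falls back to the global environment when no such variable exists. So, say, this will cause confusion: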

f <- function() {
  a <- 2
  g <- function() {
    a <<- 3  # finds f's local `a` first and modifies it
  }
  g()
  a          # return f's local `a`, now 3
}

then,

> a <- 1
> f()
[1] 3
> a # the global `a` is not affected
[1] 1
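The sketch below covers two further points, using illustrative names (set_flag(), flag_from_function, f2()). First, when no enclosing function has the variable, <<- does create it in the global environment. Second, one way to avoid <<- is to have the inner function return a value, which the function that owns the variable then assigns with <-.

```r
# No enclosing function defines `flag_from_function`, so <<- falls back
# to the global environment and creates the variable there.
set_flag <- function() {
  flag_from_function <<- TRUE
}
set_flag()
exists("flag_from_function", envir = globalenv())  # TRUE

# Avoiding <<-: the helper returns the new value and the function
# that owns `a` assigns it explicitly with <-.
f2 <- function() {
  a <- 2
  g <- function(a) a + 1
  a <- g(a)
  a
}
f2()  # 3, and nothing outside f2 is touched
```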

Second point

You can do that by using Reduce:

Reduce(function(a, b) {a[a==b] <- a[a==b]-1; a}, 2:6, df)

or apply

apply(df, c(1, 2), function(i) if(i >= 2) {i-1} else {i})

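But more simply, when df is a numeric matrix, this is sufficient:

ifelse(df >= 2, df-1, df)

(ifelse() returns an object shaped like its first argument, a logical matrix here, so it does not give back a data frame.)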
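If df is a data frame, as its name suggests, one option is to apply ifelse() column by column. Assigning back with df[] keeps the data frame structure. This is a sketch with made-up values. The arithmetic shortcut at the end relies on TRUE counting as 1.

```r
# Made-up example data
df <- data.frame(a = c(1, 2, 5), b = c(3, 0, 2))

# Column-wise ifelse(); df_fixed[] <- ... keeps the data.frame class and names
df_fixed <- df
df_fixed[] <- lapply(df_fixed, function(col) ifelse(col >= 2, col - 1, col))
print(df_fixed)

# Arithmetic alternative on a matrix: subtract TRUE (1) wherever value >= 2
m <- as.matrix(df)
m_fixed <- m - (m >= 2)
print(m_fixed)
```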

Stata: expand by the number of variables

The difficulty you have in using locals, globals, scalars, or saved results is not obvious from your question. An example is:

clear
set more off

sysuse auto
keep rep78

summarize

return list
expand r(max)

Saved results may disappear when other commands are issued, but you can save them into a local, for example, and use them later:

local rmax = r(max)
display `rmax'
expand `rmax'

Altering internal data in R package

You can use options(). Create an option with

options(myPackageRepositoryPath = "some/path")

and retrieve it

path <- getOption("myPackageRepositoryPath")

Just as you set an option, you can also overwrite it:

setpath <- function(path) {
  options(myPackageRepositoryPath = path)
}
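Here is a slightly fuller sketch of the same idea. get_repo_path() and set_repo_path() are hypothetical helper names, and "some/path" is the placeholder from the text, used here as a default. getOption() accepts a default for when the option is unset. options() returns the previous values, which makes restoring them easy.

```r
# Hypothetical getter: falls back to a default when the option is unset
get_repo_path <- function() {
  getOption("myPackageRepositoryPath", default = "some/path")
}

# Hypothetical setter: options() returns the old values, which we pass back
set_repo_path <- function(path) {
  old <- options(myPackageRepositoryPath = path)
  invisible(old$myPackageRepositoryPath)
}

set_repo_path("another/path")
print(get_repo_path())  # "another/path"
```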

The different ways of declaring objects in R

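In some sense, = and <- are equivalent for assignment, but the latter is preferred. This is because = is also overloaded to specify default arguments and to match named arguments in function calls, where <- either does not work or means something else.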
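The sketch below shows the difference inside a function call, using a made-up describe() function. With =, the value is matched to the named argument. With <-, the assignment happens in the caller's environment and its value is passed positionally.

```r
describe <- function(first, second = "default") paste(first, second)

# = names an argument: `second` receives "y"
print(describe("x", second = "y"))  # "x y"

# <- is an ordinary assignment: it creates `second` in the calling
# environment, and its value "y" is matched positionally to `first`
print(describe(second <- "y", "x"))  # "y x"
print(exists("second"))              # TRUE
```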

As for <<-, it is trickier and not recommended. In R, code is always evaluated in some environment, and every environment has a parent. The parent of a function's environment is the environment in which the function was defined (not the one it was called from), and so on up to the global environment. The operator <<- starts in the parent of the current environment and assigns to the first existing variable of that name it finds along this chain. If none is found, it assigns the value within the global environment. For example, below is a rudimentary adder.

f <- (function() { x <- 0; function(y) { x <<- x + y; x } })()
f(10) # 10
f(5) # 15

The function f has an environment whose parent environment holds x. Using <<-, we can update that x. Had we used <-, the result would have been y every time instead of a running sum. The reason is that x <- x + y would read x from the parent environment (where it always stays 0) and bind the sum to a new local x, which is discarded when the call returns.
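The contrast can be run side by side. This sketch uses factory functions instead of the one-liner; make_adder(), make_broken_adder() and caller() are illustrative names. caller() shows that <<- follows where the adder was defined, not where it is called from.

```r
make_adder <- function() {
  x <- 0
  function(y) {
    x <<- x + y  # updates the x in make_adder's environment
    x
  }
}

make_broken_adder <- function() {
  x <- 0
  function(y) {
    x <- x + y   # reads the enclosing x (always 0), binds a new local x
    x
  }
}

add <- make_adder()
broken <- make_broken_adder()
add(10)     # 10
add(5)      # 15
broken(10)  # 10
broken(5)   # 5

# caller() has its own x, but add() never sees it
caller <- function() {
  x <- 100
  add(1)
  x
}
caller()    # 100; the adder's running total is now 16
```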

For further information about these intricacies, see the relevant R documentation, for example ?assignOps.

Understanding how to pass macro arguments to a program in Stata

Your args statement assigns only the first argument supplied to the program to a local macro; any other arguments are ignored.

The essence of the matter is whether double quotes are used to bind what is supplied into one argument.

Whether you supply an argument as a global or a local is immaterial: globals and locals mentioned on the command line are evaluated before the program even runs and are not seen as such; only their contents are passed to the program.

Define this simpler program and run through the possibilities:

program showfirstarg 
args first
di "`first'"
end

global G "A B C D E"
local L "A B C D E"

showfirstarg $G
showfirstarg "$G"
showfirstarg `L'
showfirstarg "`L'"

Results in turn:

. showfirstarg $G 
A

. showfirstarg "$G"
A B C D E

. showfirstarg `L'
A

. showfirstarg "`L'"
A B C D E

Can Functional Globals be used to share data between VIs running on different targets?

Functional Global Variables only work within a single context. Contexts only exist on one target (e.g. My Computer, a CompactRIO, an RT device, etc.).

FGVs are really just a neat way to create a shared value in a program using a feature of shift registers: if you don't initialize the shift register on the diagram, it keeps whatever value it had last. A VI that is running on two targets is not the same VI; it's two copies of it. So you have two copies of your FGV, which is why the data you want to share isn't being shared.

To communicate between two targets, I suggest you check out shared variables, TCP, or network streams. There are a lot of other options, but those are my favorites and the easiest to set up in different scenarios.



Related Topics



Leave a reply



Submit