Examples of the perils of globals in R and Stata
I also have the pleasure of teaching R to undergraduate students who have no experience with programming. The problem I found was that most examples of when globals are bad are rather simplistic and don't really get the point across.
Instead, I try to illustrate the principle of least astonishment. I use examples where it is tricky to figure out what is going on. Here are some examples:
I ask the class to write down what they think the final value of i will be:
i = 10
for(i in 1:5)
  i = i + 1
i
Some of the class guess correctly. Then I ask: should you ever write code like this?
In some sense i is a global variable that is being changed.
What does the following piece of code return:
x = 5:10
x[x=1]
The problem is what exactly do we mean by x.
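For reference, a small sketch of what R does with this expression. The `[` operator matches its index arguments by position and ignores their names, so `x=1` inside the brackets is neither an assignment nor a comparison. The name `first` is introduced here only for illustration.

```r
x <- 5:10

# Inside `[`, argument names are ignored and matching is positional,
# so x[x = 1] is treated the same as x[1].
first <- x[x = 1]
print(first)

# x itself is untouched: nothing was assigned to it.
print(x)
```

This is a useful talking point precisely because the code looks like it might assign, compare, or index, and only one of those is true.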
Does the following function return a global or local variable:
z = 0
f = function() {
if(runif(1) < 0.5)
z = 1
return(z)
}
Answer: both. Again, discuss why this is bad.
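To make the "both" answer reproducible in class, here is a sketch that replaces the coin flip with an explicit flag. The argument `use_local` is an assumption added for illustration and is not part of the original example.

```r
z <- 0

# `use_local` stands in for the random branch so that both outcomes
# can be triggered on demand.
f <- function(use_local) {
  if (use_local)
    z <- 1   # creates a local z that shadows the global one
  z          # the local z if it exists, otherwise the global z
}

from_local  <- f(TRUE)   # value comes from the local z
from_global <- f(FALSE)  # value comes from the global z
```

Note that the global `z` is never modified in either case; the function only differs in which `z` it reads.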
Stata and global variables
Something along those lines might work (using a reproducible example):
sysuse auto, clear
global outcome "rep78"
gen graduate=.
replace graduate=1 if mpg==22 & $outcome==3
(2 real changes made)
In your example, just using
replace graduate=1 if graduate_primary==1 & $outcome==1
would work.
Global variables in packages in R
In general, global variables are evil. The underlying reason is that you want to minimize the interconnections in your package. These interconnections often give functions hidden dependencies and side effects, i.e. the outcome depends not only on the input arguments but also on the value of some global variable. Especially when the number of functions grows, this can be hard to get right and hell to debug.
For global variables in R see this SO post.
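As a minimal sketch of that hidden dependency (the names `tax_rate`, `price_with_tax_global` and `price_with_tax` are made up for this illustration), compare a function that reads a global with one that takes everything as an argument:

```r
tax_rate <- 0.2   # global state

# Depends on a global: the same call gives a different answer
# once tax_rate changes somewhere else in the program.
price_with_tax_global <- function(price) price * (1 + tax_rate)

# Depends only on its arguments.
price_with_tax <- function(price, rate) price * (1 + rate)

before <- price_with_tax_global(100)
tax_rate <- 0.5                       # changed elsewhere
after <- price_with_tax_global(100)   # same call, different result

explicit <- price_with_tax(100, rate = 0.2)
```

Nothing in the call `price_with_tax_global(100)` tells the reader that its result depends on `tax_rate`, which is what makes this kind of code hard to debug.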
Edit in response to your comment:
An alternative could be to just pass around the needed information to the functions that need it. You could create a new object which contains this info:
token_information = list(token1 = "087091287129387",
token2 = "UA2329723")
and require all functions that need this information to have it as an argument:
do_stuff = function(arg1, arg2, token) {
  # function body goes here; it reads token$token1 and token$token2
}
do_stuff(arg1, arg2, token = token_information)
In this way it is clear from the code that token information is needed in the function, and you can debug the function on its own. Furthermore, the function has no side effects, as its behavior is fully determined by its input arguments. A typical user script would look something like:
token_info = create_token(token1, token2)
do_stuff(arg1, arg2, token_info)
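A runnable sketch of that user script might look as follows. The bodies of `create_token` and `do_stuff` are hypothetical, since the answer does not say what they do; the token values are the placeholders used above.

```r
# Hypothetical constructor: bundles and checks the tokens in one place.
create_token <- function(token1, token2) {
  stopifnot(is.character(token1), is.character(token2))
  list(token1 = token1, token2 = token2)
}

# Hypothetical worker: everything it needs arrives through its arguments.
do_stuff <- function(arg1, arg2, token) {
  paste0(arg1, "-", arg2, " (account ", token$token2, ")")
}

token_info <- create_token("087091287129387", "UA2329723")
result <- do_stuff("a", "b", token_info)
print(result)
```

Because `do_stuff` reads nothing from the global environment, it can be tested with any token object, including a fake one.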
I hope this makes things more clear.
Why is using `<<-` frowned upon and how can I avoid it?
First point
<<- is NOT the operator for assigning to a global variable. It tries to assign the variable in the nearest parent environment where it is defined. So, for example, this can cause confusion:
f <- function() {
  a <- 2
  g <- function() {
    a <<- 3
  }
  g() # call g so that <<- actually runs; it changes f's `a`, not the global one
}
then,
> a <- 1
> f()
> a # the global `a` is not affected
[1] 1
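The same behaviour can be checked in a script. This sketch also shows the fallback case, where no enclosing environment has the variable and `<<-` ends up creating it in the global environment. The names `inner_a`, `h` and `created_by_h` are introduced for illustration.

```r
a <- 1

f <- function() {
  a <- 2
  g <- function() a <<- 3   # finds the `a` in f's environment first
  g()
  a                         # f's own `a`, as modified by g
}
inner_a <- f()

# No `created_by_h` exists in any enclosing environment, so <<- falls
# through and creates the variable in the global environment.
h <- function() created_by_h <<- "made in globalenv"
h()
```

So `<<-` only behaves like "assign to global" when nothing in between happens to define a variable of the same name, which is a fragile thing to rely on.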
Second point
You can do that by using Reduce:
Reduce(function(a, b) {a[a==b] <- a[a==b]-1; a}, 2:6, df)
or apply:
apply(df, c(1, 2), function(i) if(i >= 2) {i-1} else {i})
But simply, this is sufficient:
ifelse(df >= 2, df-1, df)
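To try these out, here is a sketch on made-up data. It assumes the values are numeric and uses a small matrix `m` in place of `df`, whose contents are not shown in the answer. Both forms build a new object rather than modifying one from inside a function, which is why no `<<-` is needed.

```r
# Assumption: numeric data; a small matrix stands in for `df`.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# Vectorised: every element >= 2 is reduced by one.
by_ifelse <- ifelse(m >= 2, m - 1, m)

# Element by element over rows and columns, same rule.
by_apply <- apply(m, c(1, 2), function(i) if (i >= 2) i - 1 else i)

print(by_ifelse)
print(by_apply)
```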
Stata: expand by the number of variables
The difficulty you have in using locals, globals, scalars, or saved results is not obvious from your question. An example is:
clear
set more off
sysuse auto
keep rep78
summarize
return list
expand r(max)
Saved results may disappear when other commands are issued, but you can save them into a local, for example, and use them later:
local rmax = r(max)
display `rmax'
expand `rmax'
Altering internal data in R package
You can use options(). Create an option with
options(myPackageRepositoryPath = "some/path")
and retrieve it with
path <- getOption("myPackageRepositoryPath")
In the same way that you set an option, you can also overwrite it:
setpath <- function(path) {
  options(myPackageRepositoryPath = path)
}
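Putting the pieces together, a package could wrap the option in a pair of accessors. This is only a sketch: `set_repo_path` and `get_repo_path` are hypothetical names, and the `NA` default is an assumption about what should happen when the option was never set.

```r
# Hypothetical setter; the option name follows the example above.
set_repo_path <- function(path) {
  old <- options(myPackageRepositoryPath = path)
  invisible(old)   # options() returns the previous values
}

# Hypothetical getter; `default` is returned when the option is unset.
get_repo_path <- function() {
  getOption("myPackageRepositoryPath", default = NA_character_)
}

set_repo_path("some/path")
first_path <- get_repo_path()

set_repo_path("another/path")
second_path <- get_repo_path()
```

Keeping all reads and writes behind two functions means the option name appears in only one place in the package.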
The different ways of declaring objects in R
In some sense = and <- are equivalent, but the latter is preferred because = is also overloaded to specify default arguments and to name arguments in calls (where <- will not work as a substitute).
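A short sketch of the difference inside a function call. The names `square` and `val_arg` are made up for this illustration.

```r
square <- function(val_arg) val_arg^2

# `=` inside a call names the argument; no variable is created.
r1 <- square(val_arg = 3)
created_by_equals <- exists("val_arg")

# `<-` inside a call assigns in the calling environment,
# then passes the assigned value as the argument.
r2 <- square(val_arg <- 4)
created_by_arrow <- exists("val_arg")
```

After the second call there is a variable `val_arg` in the workspace, which is usually not what the author intended.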
As for <<-, it is trickier and not recommended. In R, every step of execution is associated with a chain of environments: the current environment, the environment in which the current function was defined, and so on up to the global environment. The operator <<- attempts to assign a value to the nearest object of that name found in the parent environments, and if none is found, assigns it in the global environment. For example, below is a rudimentary adder.
f <- (function() { x <- 0; function(y) { x <<- x + y; x } })()
f(10) # 10
f(5) # 15
Each call to f runs in its own environment, whose parent environment contains x. Using <<-, we can modify that x, whereas if we had used <-, the result would have been y every time instead of a running sum. The reason is that x <- x + y would have created a new x in the local scope of each call, computed from the x in the parent environment, which would always remain 0.
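The contrast can be seen side by side in the following sketch. The factory names `make_adder` and `make_local_adder` are introduced here for illustration; the first follows the adder above, the second swaps in <-.

```r
make_adder <- function() {
  x <- 0
  function(y) { x <<- x + y; x }   # updates the x in the enclosing environment
}

make_local_adder <- function() {
  x <- 0
  function(y) { x <- x + y; x }    # new local x on every call; outer x stays 0
}

add <- make_adder()
add(10)
total <- add(5)               # running sum of both calls

local_add <- make_local_adder()
local_add(10)
local_total <- local_add(5)   # only the last y, nothing is remembered
```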
For further information about these intricacies, you can also look at the relevant R documentation.
Understanding how to pass macro arguments to a program in Stata
Your args statement assigns only the first argument supplied to the program to a local macro; if there are other arguments, they are ignored.
The essence of the matter is whether double quotes are used to bind what is supplied into one argument.
Whether you supply an argument as a global or a local is immaterial: globals and locals mentioned on the command line are evaluated before the program even runs and are not seen as such; only their contents are passed to the program.
Define this simpler program and run through the possibilities:
program showfirstarg
args first
di "`first'"
end
global G "A B C D E"
local L "A B C D E"
showfirstarg $G
showfirstarg "$G"
showfirstarg `L'
showfirstarg "`L'"
Results in turn:
. showfirstarg $G
A
. showfirstarg "$G"
A B C D E
. showfirstarg `L'
A
. showfirstarg "`L'"
A B C D E
Can Functional Globals be used to share data between VIs running on different targets?
Functional Global Variables only work within a single context. Contexts only exist on one target (e.g. My Computer, a Compact RIO, RT device, etc).
FGVs are really just a neat way to create a shared value in a program using a feature of shift registers: if you don't initialize the shift register on the diagram, it will be initialized with whatever value it had last. A VI that is running on two targets is not the same VI; it's two copies of it. So you have two copies of your FGV, which is why the data you want to share isn't being shared.
To communicate between two targets, I suggest you check out shared variables, TCP, or network streams. There are a lot of other options, but those are my favorites and the easiest to set up in different scenarios.