R equivalent of Stata local or global macros
First, as a former Stata user, let me recommend R for Stata Users. There is also this article on Macros in R. I think @Nick Cox is right that you need to learn to do things differently. But like you (at least in this case), I often find myself starting a new task with my prior knowledge of how to do it in Stata and going from there. Sometimes I find the approaches are similar. Sometimes I can make R act like Stata when a different approach would be better (e.g., loops vs. vectorization).
I'm not sure if I will capture your question with the following, but let me try.
In Stata, it would be common to write:
global mydata "path to my data directory/"
To import the data, I would just type:
insheet using "${mydata}myfile.csv"
As a former Stata user, I want to do something similar in R. Here is what I do:
mydata <- "path to my data directory/"
To import a csv file located in this directory and create a data frame called myfile, I would use:
myfile <- read.csv(paste(mydata, "myfile.csv", sep=""))
or, more concisely (paste0 is paste with sep=""):
myfile <- read.csv(paste0(mydata, "myfile.csv"))
I'm not a very efficient R user yet, so maybe others will see some flaws in this approach.
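One possible refinement, offered as a sketch rather than the answerer's method: base R's file.path() joins path components with the separator, so the stored directory no longer needs a trailing slash. The directory and CSV below are hypothetical and are created in a temporary folder only so the example runs on its own.

```r
# Hypothetical data directory and file, created here only so the sketch runs.
mydata <- file.path(tempdir(), "my_data_dir")
dir.create(mydata, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, x = c(2.5, 3.5, 4.5)),
          file.path(mydata, "myfile.csv"), row.names = FALSE)

# file.path() inserts the separator itself, so no trailing slash is needed.
myfile <- read.csv(file.path(mydata, "myfile.csv"))
print(myfile)
```

In real use, mydata would simply hold the path to an existing directory, playing the role of the Stata global.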
R equivalent of Stata's for-loop over local macro list of stubnames
Well, here's one way. Columns in R data frames can be accessed using their character names, so this will work:
# create sample dataset
set.seed(1) # for reproducible example
df <- data.frame(year=as.factor(rep(6:8,each=100)), #categorical variable
varX06 = rnorm(300), varX07=rnorm(300), varX08=rnorm(100),
varY06 = rnorm(300), varY07=rnorm(300), varY08=rnorm(100))
# you start here...
years <- unique(df$year)
df$varX <- unlist(lapply(years,function(yr)df[df$year==yr,paste0("varX0",yr)]))
df$varY <- unlist(lapply(years,function(yr)df[df$year==yr,paste0("varY0",yr)]))
print(head(df),digits=4)
# year varX06 varX07 varX08 varY06 varY07 varY08 varX varY
# 1 6 -0.6265 0.8937 -0.3411 -0.70757 1.1350 0.3412 -0.6265 -0.70757
# 2 6 0.1836 -1.0473 1.5024 1.97157 1.1119 1.3162 0.1836 1.97157
# 3 6 -0.8356 1.9713 0.5283 -0.09000 -0.8708 -0.9598 -0.8356 -0.09000
# 4 6 1.5953 -0.3836 0.5422 -0.01402 0.2107 -1.2056 1.5953 -0.01402
# 5 6 0.3295 1.6541 -0.1367 -1.12346 0.0694 1.5676 0.3295 -1.12346
# 6 6 -0.8205 1.5122 -1.1367 -1.34413 -1.6626 0.2253 -0.8205 -1.34413
For a given yr, the anonymous function extracts the rows with that yr and the column named "varX0" + yr (the result of paste0(...)). Then lapply(...) "applies" this function for each year, and unlist(...) converts the returned list into a vector.
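The lapply/unlist approach relies on the rows already being grouped by year in the same order as years. As a hedged alternative sketch (not the answerer's code), the loop below iterates over a vector of stub names, much like a Stata foreach over a local macro list, and uses matrix indexing to pick, for each row, the column that matches that row's year. The small data frame and the name stubs are assumptions made for illustration; year is kept numeric here so that the whole data frame converts to a numeric matrix.

```r
set.seed(1)  # for a reproducible example
# Small hypothetical dataset: 4 rows per year, year stored as a number
df <- data.frame(year = rep(6:8, each = 4),
                 varX06 = rnorm(12), varX07 = rnorm(12), varX08 = rnorm(12),
                 varY06 = rnorm(12), varY07 = rnorm(12), varY08 = rnorm(12))

stubs <- c("varX", "varY")  # plays the role of the Stata local macro list

for (stub in stubs) {
  # Column name wanted for each row, e.g. "varX06" for a row with year 6
  cols <- paste0(stub, "0", df$year)
  # Two-column (row, column) index matrix selects one cell per row
  idx <- cbind(seq_len(nrow(df)), match(cols, names(df)))
  df[[stub]] <- as.matrix(df)[idx]
}

print(head(df), digits = 4)
```

Because each row is matched to its own column, this version does not depend on the row order.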
How do I create a macro for regressors in R?
Here are some alternatives. No packages are used in the first 3.
1) reformulate
fo <- reformulate(regressors, response = "income")
lm(fo, Duncan)
or you may wish to write the last line as this so that the formula that is shown in the output looks nicer:
do.call("lm", list(fo, quote(Duncan)))
in which case the Call: line of the output appears as expected, namely:
Call:
lm(formula = income ~ education + prestige, data = Duncan)
2) lm(dataframe)
lm( Duncan[c("income", regressors)] )
The Call: line of the output looks like this:
Call:
lm(formula = Duncan[c("income", regressors)])
but we can make it look exactly as in the do.call solution in (1) with this code:
fo <- formula(model.frame(income ~., Duncan[c("income", regressors)]))
do.call("lm", list(fo, quote(Duncan)))
3) dot
An alternative similar to that suggested by @jenesaisquoi in the comments is:
lm(income ~., Duncan[c("income", regressors)])
The approach discussed in (2) to the Call: output also works here.
4) fn$
Prefacing a function with fn$ enables string interpolation in its arguments. This solution is nearly identical to the desired syntax shown in the question, using $ in place of @ to perform substitution, and the flexible substitution could readily extend to more complex scenarios. The quote(Duncan) in the code could be written as just Duncan and it will still run, but the Call: shown in the lm output will look better if you use quote(Duncan).
library(gsubfn)
rhs <- paste(regressors, collapse = "+")
fn$lm("income ~ $rhs", quote(Duncan))
The Call: line looks almost identical to the do.call solutions above -- only spacing and quotes differ:
Call:
lm(formula = "income ~ education+prestige", data = Duncan)
If you wanted it absolutely the same then:
fo <- fn$formula("income ~ $rhs")
do.call("lm", list(fo, quote(Duncan)))
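The snippets above assume that regressors is a character vector of column names (the Call: output suggests c("education", "prestige")) and that the Duncan data frame is already loaded. As a self-contained sketch of alternative (1), the example below swaps in the built-in mtcars data and hypothetical regressor names so that it runs without any extra package.

```r
# Hypothetical "macro": a character vector of regressor names
regressors <- c("wt", "hp")

# Build the formula mpg ~ wt + hp from the vector
fo <- reformulate(regressors, response = "mpg")

# Direct call: the Call: line of the output shows formula = fo
fit <- lm(fo, data = mtcars)

# do.call variant: the Call: line shows the expanded formula and data = mtcars
fit2 <- do.call("lm", list(fo, quote(mtcars)))

print(fo)
print(coef(fit2))
```

Both calls fit the same model; they differ only in how the call is recorded and printed.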
File import and save with local macro like Stata in R
- read in all xlsx files in your root directory (or modify if other)
- save them all in a list df.list
- rename by assigning names
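library(readxl)
# pattern is a regular expression, not a glob: match names ending in ".xlsx"
file.list <- list.files(pattern = "\\.xlsx$")
df.list <- lapply(file.list, read_excel)
# assumes exactly four files, named in the order list.files() returns them
names(df.list) <- c("s2", "b2", "c2", "g2")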
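Hard-coding the names only works if the number and order of files are known in advance. A hedged sketch of one alternative is to derive the names from the file names themselves. The file names below are hypothetical stand-ins for what list.files() might return, so the example runs without any Excel files or the readxl package.

```r
# Hypothetical directory listing, standing in for list.files()
all.files <- c("b2.xlsx", "c2.xlsx", "notes.txt", "s2.xlsx", "xlsx_readme.md")

# Same regular expression as would be passed to list.files(pattern = ...)
file.list <- all.files[grepl("\\.xlsx$", all.files)]

# Strip directory and extension to get one name per file
df.names <- tools::file_path_sans_ext(basename(file.list))
print(df.names)

# With real files one could then write (not run here):
# df.list <- setNames(lapply(file.list, readxl::read_excel), df.names)
```

This keeps each data frame's name tied to the file it came from, whatever order the files are listed in.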
Using local macros in names of global macros
Section 18 (Programming Stata) of the Stata User's Guide explains:
"...You can mix global and local macros. Assume that local macro j contains 7. Then, ${x`j'} expands to the contents of $x7..."
So you just need to use curly brackets {} in your global macro:
. global test1 = 250
. local n = 1
. display $test1
250
. display ${test`n'}
250
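For readers working in R, a rough analogue of building one macro's name from another is to keep the values in a named list and construct the name as a string. This is only a sketch: the list globals and its contents are hypothetical, and R has no direct counterpart to Stata's macro expansion.

```r
# Hypothetical stand-in for a set of Stata globals
globals <- list(test1 = 250, test2 = 500)

n <- 1  # plays the role of the local macro n

# Build the name "test1" and look it up, similar in spirit to ${test`n'}
value <- globals[[paste0("test", n)]]
print(value)
```

Keeping such values in a list, rather than in separate top-level variables, also avoids reaching for get() on constructed variable names.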
Inheriting looping variable or local, global macros
Local macros are, as the name says, local: visible only within the same interactive session, program, do-file, or (chunk of) code in a do-file editor window.
Globals are a crude solution to making stuff visible everywhere, but you must refer to them as such using $. So in your run.do you would need
ctitle($mode)
Passing the contents as arguments is a much better solution.
See also the help for include.
All this is utterly basic Stata programming. To become competent as a Stata programmer, a minimal reference is https://www.stata.com/manuals/u18.pdf, which is also bundled with Stata on your system (unless your version is several years out of date).
Examples of the perils of globals in R and Stata
I also have the pleasure of teaching R to undergraduate students who have no experience with programming. The problem I found was that most examples of when globals are bad are rather simplistic and don't really get the point across.
Instead, I try to illustrate the principle of least astonishment. I use examples where it is tricky to figure out what is going on. Here are some examples:
I ask the class to write down what they think the final value of i will be:
i = 10
for(i in 1:5)
  i = i + 1
i
Some of the class guess correctly. Then I ask should you ever write code like this? In some sense i is a global variable that is being changed.
What does the following piece of code return:
x = 5:10
x[x=1]
The problem is what exactly do we mean by x.
Does the following function return a global or local variable:
z = 0
f = function() {
  if(runif(1) < 0.5)
    z = 1
  return(z)
}
Answer: both. Again discuss why this is bad.
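The first and third classroom examples can be checked directly. The sketch below runs them and then shows one possible repair of f, where the value is passed in as an argument instead of being read from the enclosing environment. The function g and the seed are additions made for illustration, not part of the original examples.

```r
# Example 1: the loop variable overwrites the outer i
i <- 10
for (i in 1:5)
  i <- i + 1
print(i)  # 6: the last iteration sets i to 5, then adds 1

# Example 3: f returns either the global z or a local z, depending on chance
z <- 0
f <- function() {
  if (runif(1) < 0.5)
    z <- 1  # creates a local z; the global z is untouched
  return(z)
}
set.seed(42)  # arbitrary seed, only for reproducibility
results <- replicate(200, f())
print(table(results))
print(z)  # still 0

# Hypothetical repair: make the dependency explicit as an argument
g <- function(z) {
  if (runif(1) < 0.5)
    z <- 1
  z
}
print(g(0))
```

With g, a reader can see from the call alone which value the function starts from.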