How to Detect Free Variable Names in R Functions

How to detect free variable names in R functions

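The codetools package has functions for this purpose, e.g. findGlobals:

library(codetools)
# f is assumed to be the question's function, which uses x without defining it
findGlobals(f, merge=FALSE)[['variables']]
# [1] "x"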

If we redefine the function to have a named argument x, then no variables are returned.

f2 <- function(x){
  x+1
}
findGlobals(f2, merge=FALSE)[['variables']]
# character(0)
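
As a rough illustration of what the unmerged result contains, the sketch below runs findGlobals on a hypothetical function g. The names g, scale_factor and shift are made up for this example and do not come from the question. With merge=FALSE the result is a list with separate functions and variables components, and the argument v is not reported because it is bound by the function itself.

```r
library(codetools)

# Hypothetical function: scale_factor and shift are free variables,
# while v is an argument and therefore not free.
g <- function(v) {
  v * scale_factor + shift
}

globals <- findGlobals(g, merge = FALSE)

globals$variables   # the free variable names
globals$functions   # the functions called in the body, such as "+" and "*"
```

Note that findGlobals only inspects the code, so g does not need to be callable and scale_factor and shift do not need to exist.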

How to read a list of variable names and substitute them in a function in R?

You can use as.formula to convert text to valid R formulas. (But, as you are discovering, you cannot use R formulas as macros.) I also looped over the varlist items themselves rather than over sequential integers.

Dataset <- iris
varlist <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
for (i in varlist) {
  # Build the formula "<response> ~ Petal.Width" from the variable name
  form <- as.formula(paste(i, "~ Petal.Width"))
  model <- lm(form, data = Dataset)
  # Name the column after the response; data.frame(i = ...) would call it "i"
  printing <- setNames(data.frame(coef(model)['Petal.Width']), i)
  write.csv(printing, paste0("~/Desktop/", i, "PetalWidthSlope.csv"))
}

I now have 3 more files on my Desktop.
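
If the files themselves are not needed, the same formula-building step can be sketched with the slopes collected in a named vector instead. This is only a variation under that assumption, and the names responses and slopes are made up for the example.

```r
responses <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

# For each response name, build "<response> ~ Petal.Width" as text,
# convert it with as.formula, fit the model and keep the slope.
slopes <- sapply(responses, function(resp) {
  form <- as.formula(paste(resp, "~ Petal.Width"))
  coef(lm(form, data = iris))[["Petal.Width"]]
})

slopes   # named numeric vector, one slope per response
```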

R - accessing variable name inside a function

Doing it the way you do, the variable names get lost. As a workaround, you could name the vector elements before you call the function:

names(filesVector) <- c("file1", "file2")

Now you should be able to access these inside the function simply with names(filesVector) or names(filesVector[1]).
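
A minimal sketch of that workaround, assuming made-up file paths and a hypothetical function describe_files: the names travel with the vector because they are stored as an attribute of it.

```r
# Hypothetical file paths; only the names matter for this sketch.
filesVector <- c("data/a.csv", "data/b.csv")
names(filesVector) <- c("file1", "file2")

describe_files <- function(files) {
  # names() still works here because the names are part of the vector
  paste(names(files), files, sep = " -> ")
}

describe_files(filesVector)
# [1] "file1 -> data/a.csv" "file2 -> data/b.csv"
```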

How do estimation commands find variable names in formulas in R?

The tricky thing is that R's lexical scoping searches in enclosing environments,
which gets hard to follow during calls because each caller environment can have its own chain of enclosing environments.

I'll be using the rlang package to debug this scenario.

First, if you defined expo in the global environment,
then that will be its enclosing environment:

expo <- function(x, theta) {
  x*wt^theta
}

rlang::get_env(expo)
# <environment: R_GlobalEnv>

So when you call it, R will first search for variables in the function's call (not caller!) environment,
and then in the enclosing environment (the global environment here).
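
That lookup order can be sketched in base R, assuming made-up values for wt and a hypothetical wrapper named caller_fun. The wt defined inside the wrapper is ignored, because expo looks in its own call environment and then in the environment where it was defined, never in its caller.

```r
wt <- 10                      # defined where expo is defined (its enclosing environment)

expo <- function(x, theta) {
  x * wt^theta                # wt is a free variable inside expo
}

caller_fun <- function() {
  wt <- 1000                  # local to caller_fun(); expo never sees it
  expo(2, 1)
}

caller_fun()
# [1] 20
```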

I don't know what nls does exactly,
but I would have assumed that it creates an environment from the data you provide and evaluates the formula there.
However, it seems the environment it creates only contains the variables it can explicitly see in the formula,
something I found with:

expo <- function(x, theta) {
  cat("caller: ")
  print(ls(rlang::caller_env()))
  cat("enclosing: ")
  print(ls(rlang::env_parent(rlang::current_env())))
}

nls(mpg ~ phi + expo(qsec, theta),
    data = mtcars,
    start = c('phi' = -2, 'theta' = 1))
# caller: [1] "mpg"   "phi"   "qsec"  "theta"
# enclosing: [1] "expo"    
# Error ...

As we can see, the caller environment of expo contains the variables we can identify in the formula,
and its enclosing environment only contains the definition of expo
(the global environment).
This unfortunately means that you can't even use something like eval.parent inside expo,
because that environment won't have all variables from data.

If you still want to work around it,
you could modify expo's enclosing environment with your data before calling nls,
something like:

expo <- function(x, theta) {
  x*wt^theta
}

environment(expo) <- list2env(as.list(mtcars))

nls(mpg ~ phi + expo(qsec, theta),
    data = mtcars,
    start = c('phi' = -2, 'theta' = 1))
# Error ... number of iterations exceeded maximum of 50
# (a convergence failure rather than a missing-variable error, so wt was found)

Providing data and variable names in a function in R

This seems like a very unusual way to write an R function, but you could do

my_func <- function(data, var_mileage, var_volume, var_weight){
  
  eval(substitute({
    var_mileage_km_l <- 0.43 * var_mileage
    var_volume_l <- 0.016 * var_volume
    var_weight_kg <- 0.45 * var_weight    
    
    m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)
    
    summary(m)
  }), envir = data)
}

The substitute() injects the symbols you pass as the column names into the expression. Then you can evaluate it in the context of the data.frame.
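
To see what substitute() produces before it is evaluated, here is a small sketch that uses the built-in mtcars data and a hypothetical helper show_expr (neither is from the question). It returns both the substituted expression as text and the result of evaluating it in the data frame.

```r
show_expr <- function(data, var) {
  # substitute() swaps `var` for whatever the caller typed, unevaluated
  expr <- substitute(mean(var))
  list(expression = deparse(expr),
       value = eval(expr, data))   # columns of data are looked up first
}

res <- show_expr(mtcars, mpg)
res$expression
# [1] "mean(mpg)"
res$value                          # same as mean(mtcars$mpg)
```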

Alternatively you could do something like

my_func <- function(data, var_mileage, var_volume, var_weight){
  
  var_mileage <- eval(substitute(var_mileage), data)
  var_volume <- eval(substitute(var_volume), data)
  var_weight <- eval(substitute(var_weight), data)
  
  var_mileage_km_l <- 0.43 * var_mileage
  var_volume_l <- 0.016 * var_volume
  var_weight_kg <- 0.45 * var_weight
    
  m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)
  
  summary(m)
}

Another common trick is to pass the column names as strings.

my_func <- function(data, var_mileage, var_volume, var_weight){
   
  var_mileage_km_l <- 0.43 * data[[var_mileage]]
  var_volume_l <- 0.016 * data[[var_volume]]
  var_weight_kg <- 0.45 * data[[var_weight]]    
    
  m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)
  
  summary(m)
}
my_func(dataset1, "mpg", "disp", "wt")

Passing a variable name to a function in R

I've recently discovered what I think is a better approach to passing variable names.

a <- data.frame(x = 1:10, y = 1:10)

b <- function(df, name){
    eval(substitute(name), df)
}

b(a, x)
  [1]  1  2  3  4  5  6  7  8  9 10

Update: The approach uses non-standard evaluation. I began explaining but quickly realized that Hadley Wickham does it much better than I could. Read this: http://adv-r.had.co.nz/Computing-on-the-language.html
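
One variation, offered as an assumption about how b might be used rather than as part of the original answer: passing parent.frame() as the third argument of eval lets the expression refer to variables that are local to the caller as well as to columns of the data frame. The wrapper run_example and the variable threshold are hypothetical.

```r
a <- data.frame(x = 1:10, y = 1:10)

b <- function(df, name) {
  # Columns of df are searched first, then the environment b was called from
  eval(substitute(name), df, parent.frame())
}

run_example <- function() {
  threshold <- 5              # local to run_example(), not a column of a
  b(a, x[x > threshold])
}

run_example()
# [1]  6  7  8  9 10
```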

How do I manipulate variable names in R functions?

get does not work in your case because no object named df_a$A1 exists.
df_a exists and has a column A1, but get looks up a whole object by name and does not evaluate the $ operator.

In your case you can use paste0 to build the name of the right column:

fun_a <- function(x) {
  sum(df_a[, paste0("A", x)])
}

> fun_a(1)
[1] 6
> fun_a(2)
[1] 8

In a more general setting you could use get to get the right dataframe and then use paste to select the right column of that dataframe.
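
A sketch of that more general setting, assuming a hypothetical function fun_general and made-up contents for df_a (chosen only so that the column sums are 6 and 8, as in the output above):

```r
# Hypothetical data; the real df_a from the question is not shown here.
df_a <- data.frame(A1 = c(1, 2, 3), A2 = c(2, 2, 4))

fun_general <- function(df_name, x) {
  df <- get(df_name)             # look the data frame up by its name
  sum(df[[paste0("A", x)]])      # then build the column name and sum it
}

fun_general("df_a", 1)
# [1] 6
fun_general("df_a", 2)
# [1] 8
```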

Alternatively, you can use summarise_at from the dplyr package to do just that (here the number x selects the column by position):

fun_a <- function(x) {
  as.numeric(dplyr::summarise_at(df_a, x, sum))
}

Passing variable name to a function in R

Use deparse/substitute to convert the unquoted argument to a string, then use [[ to pull the column as a vector, create the logical vector, and subset with [:

myf.subset <- function(data, xvar) {
  xvar <- deparse(substitute(xvar))
  data[data[[xvar]] == 0, , drop = FALSE]
}

-testing

> myf.subset(df, xvar = x)
   x
3  0
5  0
12 0
18 0
20 0
24 0
25 0
28 0
29 0
32 0
33 0
35 0
36 0
37 0
39 0
41 0
42 0
43 0
47 0
48 0
49 0
51 0
55 0
57 0
58 0
62 0
63 0
65 0
66 0
67 0
69 0
70 0
71 0
73 0
74 0
75 0
76 0
80 0
82 0
84 0
87 0
88 0
90 0
92 0
94 0
97 0
99 0

In the updated code, the formula can be created with reformulate or paste:

myf.subset <- function(data, xvar, yvar, zvar) {
  xvar <- deparse(substitute(xvar))
  yvar <- deparse(substitute(yvar))
  zvar <- deparse(substitute(zvar))
  # new.data <- subset(data, xvar == 0)
  new.data <- data[data[[xvar]] == 0, , drop = FALSE]
  fmla <- reformulate(zvar, response = yvar)
  # fmla <- as.formula(paste(yvar, zvar, sep = ' ~ '))
  OLS <- lm(fmla, data = new.data)
  return(OLS)
}

-testing

> myf.subset(df, xvar = x, yvar = y, zvar = z)

Call:
lm(formula = fmla, data = new.data)

Coefficients:
(Intercept)            z  
    0.48000     -0.01333
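
For reference, here is a short sketch of what reformulate builds on its own, using the column names y, z and a hypothetical extra predictor w as plain strings. No data is needed to construct the formula.

```r
# One predictor: equivalent in content to as.formula("y ~ z")
fmla <- reformulate("z", response = "y")
deparse(fmla)
# [1] "y ~ z"

# Several predictors are joined with "+" (w is a hypothetical extra column)
fmla2 <- reformulate(c("z", "w"), response = "y")
all.vars(fmla2)
# [1] "y" "z" "w"
```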

How to get name of variable in R (substitute)?

I suggest you consider passing an optional name argument to these functions. I say this because it seems like you really want to use the name as a label for something in the end result, so it's not really the variable itself that matters so much as its name. You could do:

fun1 <- function (some_variable, name=deparse(substitute(some_variable))) {
    name
}
fun2 <- function (var_pass, name=deparse(substitute(var_pass))) { 
    fun1 (var_pass, name) 
}
my_var <- c(1,2)

fun2(my_var)
# [1] "my_var"

fun1(my_var)
# [1] "my_var"

This way, if you end up with an odd variable name and want to give a better name to a result, you at least have the option. And by default it does what you want without requiring the name parameter.
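
For contrast, here is a sketch of what happens without the extra name argument, using hypothetical functions inner_label and outer_label. The inner function only sees the expression written at its own call site, so the original variable name is lost one level down.

```r
inner_label <- function(some_variable) {
  deparse(substitute(some_variable))
}

outer_label <- function(var_pass) {
  inner_label(var_pass)     # inner_label sees the symbol var_pass, not my_var
}

my_var <- c(1, 2)

inner_label(my_var)
# [1] "my_var"
outer_label(my_var)
# [1] "var_pass"
```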
