How to detect free variable names in R functions
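The codetools package has functions for this purpose, e.g. findGlobals:
library(codetools)
f <- function() x + 1  # assumed definition (not shown in the original): x is not an argument
findGlobals(f, merge=FALSE)[['variables']]
# [1] "x"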
If we redefine the function to have a named argument x, then no variables are returned.
f2 <- function(x){
x+1
}
findGlobals(f2, merge=FALSE)[['variables']]
# character(0)
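As a further sketch (the function g and the names scale_factor and offset are made up for illustration), calling findGlobals with merge=FALSE returns a list that keeps global functions and global variables apart:

```r
library(codetools)

# Hypothetical function: 'a' is an argument; scale_factor and offset are free
g <- function(a) {
  a * scale_factor + offset
}

globals <- findGlobals(g, merge = FALSE)
globals$variables  # free variable names
globals$functions  # functions used, including operators such as "*" and "+"
```

The analysis is static, so scale_factor and offset do not need to exist for this to run.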
How to read a list of variable names and substitute them in a function in R?
You can use as.formula to convert text to valid R formulas. (But, as you are discovering, you cannot use R formulas as macros.) I also used the varlist items themselves as the loop variables rather than sequential integers.
Dataset <- iris
varlist <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
for (i in varlist) {
  # Build the formula "<response> ~ Petal.Width" from text
  form <- as.formula(paste(i, "~ Petal.Width"))
  model <- lm(form, data = Dataset)
  printing <- data.frame(i = model$coefficients['Petal.Width'])
  write.csv(printing, paste0("~/Desktop/", i, "PetalWidthSlope.csv"))
}
I now have 3 more files on my Desktop.
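If the goal is only to collect the slopes rather than to write files, the same as.formula idea can be sketched with vapply. This is an illustrative variant, not the code from the answer above; the names responses and slopes are assumptions.

```r
# Response variables to loop over
responses <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

# Fit one model per response and keep only the Petal.Width slope
slopes <- vapply(responses, function(resp) {
  form <- as.formula(paste(resp, "~ Petal.Width"))
  coef(lm(form, data = iris))[["Petal.Width"]]
}, numeric(1))

slopes  # named numeric vector, one slope per response
```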
R - accessing variable name inside a function
Done the way you have it, the variable names get lost. As a workaround, you could name the vector elements before you call the function:
names(filesVector) <- c("file1", "file2")
Now you should be able to access these inside the function simply with names(filesVector) or names(filesVector[1]).
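A minimal sketch of that workaround, assuming two made-up file paths and a hypothetical helper label_files:

```r
# Hypothetical file paths, named before the call
filesVector <- c("data/first.csv", "data/second.csv")
names(filesVector) <- c("file1", "file2")

# Hypothetical helper: builds one label per element from its name and value
label_files <- function(files) {
  paste0(names(files), " = ", files)
}

label_files(filesVector)
# [1] "file1 = data/first.csv"  "file2 = data/second.csv"
```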
How do estimation commands find variable names in formulas in R?
The tricky thing is that R's lexical scoping searches in enclosing environments, which gets confusing quickly during calls because each caller environment can have its own enclosing environments.
I'll be using the rlang package to debug this scenario.
First, if you defined expo in the global environment, then that will be its enclosing environment:
expo <- function(x, theta) {
x*wt^theta
}
rlang::get_env(expo)
# <environment: R_GlobalEnv>
So when you call it, R will first search for variables in the function's call (not caller!) environment, and then in the enclosing environment (global environment here).
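The difference between the caller and the enclosing environment can be seen in a small sketch. The function caller and the values 10 and 2 are made up for illustration:

```r
wt <- 10  # defined in the same environment as expo

expo <- function(x, theta) {
  x * wt^theta  # wt is a free variable here
}

caller <- function() {
  wt <- 2  # local to caller, so expo never sees it
  expo(1, 1)
}

caller()
# [1] 10
```

The result uses wt = 10 from expo's enclosing environment, not wt = 2 from the function that called it.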
I don't know what nls does exactly, but I would have assumed that it creates an environment from the data you provide and evaluates the formula there. However, it seems the environment it creates only contains the variables it can explicitly see in the formula, something I found with:
expo <- function(x, theta) {
cat("caller: ")
print(ls(rlang::caller_env()))
cat("enclosing: ")
print(ls(rlang::env_parent(rlang::current_env())))
}
nls(mpg ~ phi + expo(qsec, theta),
data = mtcars,
start = c('phi' = -2, 'theta' = 1))
# caller: [1] "mpg" "phi" "qsec" "theta"
# enclosing: [1] "expo"
# Error ...
As we can see, the caller environment of expo contains the variables we can identify in the formula, and its enclosing environment (the global environment) only contains the definition of expo.
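The names listed for the caller environment are the same ones that base R's all.vars extracts from the formula. The sketch below only shows that correspondence; it is not a claim about how nls is implemented internally.

```r
form <- mpg ~ phi + expo(qsec, theta)

# Variable names that appear in the formula (function names are excluded)
all.vars(form)
# [1] "mpg"   "phi"   "qsec"  "theta"

# Columns of mtcars that the formula never mentions by name, wt among them
setdiff(names(mtcars), all.vars(form))
```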
This unfortunately means that you can't even use something like eval.parent inside expo, because that environment won't have all variables from data.
If you still want to work around it, you could modify expo's enclosing environment with your data before calling nls, something like:
expo <- function(x, theta) {
x*wt^theta
}
environment(expo) <- list2env(as.list(mtcars))
nls(mpg ~ phi + expo(qsec, theta),
data = mtcars,
start = c('phi' = -2, 'theta' = 1))
# Error ... number of iterations exceeded maximum of 50
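To see what the environment swap does on its own, independent of whether nls converges, here is a small sketch that calls expo directly after replacing its enclosing environment:

```r
expo <- function(x, theta) {
  x * wt^theta
}

# New enclosing environment holding one variable per mtcars column.
# list2env() uses the calling frame as the parent by default,
# so functions such as `*` and `^` are still found further up the chain.
environment(expo) <- list2env(as.list(mtcars))

# wt is now resolved from the data without being an argument
head(expo(mtcars$qsec, 1))
```

Note that the environment holds a copy of the columns taken at that moment, so later changes to mtcars would not be reflected inside expo.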
Providing data and variable names in a function in R
This seems like a very unusual way to write an R function, but you could do:
my_func <- function(data, var_mileage, var_volume, var_weight){
eval(substitute({
var_mileage_km_l <- 0.43 * var_mileage
var_volume_l <- 0.016 * var_volume
var_weight_kg <- 0.45 * var_weight
m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)
summary(m)
}), envir = data)
}
The substitute() injects the symbols you pass as the column names into the expression. Then you can evaluate it in the context of the data.frame.
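What substitute() produces can be inspected in isolation. In this sketch, show_expr is a hypothetical helper and mtcars stands in for the data:

```r
# Hypothetical helper that returns the substituted expression unevaluated
show_expr <- function(var) {
  substitute(0.43 * var)
}

e <- show_expr(mpg)
e                      # the unevaluated expression 0.43 * mpg
head(eval(e, mtcars))  # evaluated with the mtcars columns in scope
```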
Alternatively you could do something like
my_func <- function(data, var_mileage, var_volume, var_weight){
var_mileage <- eval(substitute(var_mileage), data)
var_volume <- eval(substitute(var_volume), data)
var_weight <- eval(substitute(var_weight), data)
var_mileage_km_l <- 0.43 * var_mileage
var_volume_l <- 0.016 * var_volume
var_weight_kg <- 0.45 * var_weight
m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)
summary(m)
}
Or one other common trick is to pass the column names as strings.
my_func <- function(data, var_mileage, var_volume, var_weight){
var_mileage_km_l <- 0.43 * data[[var_mileage]]
var_volume_l <- 0.016 * data[[var_volume]]
var_weight_kg <- 0.45 * data[[var_weight]]
m <- lm(var_mileage_km_l ~ var_volume_l + var_weight_kg)
summary(m)
}
my_func(dataset1, "mpg", "disp", "wt")
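When the names arrive as strings, the formula itself can also be built from them with reformulate, so the coefficients keep the original column names. This is a sketch of a different variant that skips the unit conversion; my_func_names is a hypothetical name and mtcars stands in for the data.

```r
# Hypothetical variant: builds the formula from the column names
my_func_names <- function(data, response, predictors) {
  form <- reformulate(predictors, response = response)
  lm(form, data = data)
}

fit <- my_func_names(mtcars, "mpg", c("disp", "wt"))
names(coef(fit))
# [1] "(Intercept)" "disp"        "wt"
```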
Passing a variable name to a function in R
I've recently discovered what I think is a better approach to passing variable names.
a <- data.frame(x = 1:10, y = 1:10)
b <- function(df, name){
eval(substitute(name), df)
}
b(a, x)
# [1]  1  2  3  4  5  6  7  8  9 10
Update: The approach uses non-standard evaluation. I began explaining but quickly realized that Hadley Wickham does it much better than I could. Read this: http://adv-r.had.co.nz/Computing-on-the-language.html
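Because the argument is captured as an expression, the same function also accepts whole expressions over the columns, not only a single name. A short sketch using the same a and b:

```r
a <- data.frame(x = 1:10, y = 1:10)
b <- function(df, name) {
  eval(substitute(name), df)
}

b(a, x + y)     # element-wise sum of the two columns
b(a, x[y > 8])  # columns can be mixed freely within one expression
# [1]  9 10
```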
How do I manipulate variable names in R functions?
get does not work in your case because the object df_a$A1 does not exist. df_a exists and has a column A1, but this is not how get works.
In your case you can use paste0 to select only the right column:
fun_a <- function(x) {
sum(df_a[, paste0("A", x)])
}
> fun_a(1)
[1] 6
> fun_a(2)
[1] 8
In a more general setting you could use get to get the right dataframe and then use paste to select the right column of that dataframe.
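A minimal sketch of that more general setting. The data frame contents are made up (chosen so the sums match the output above), and sum_col with its prefix and env arguments is a hypothetical helper:

```r
# Hypothetical data, chosen so the sums match the output above
df_a <- data.frame(A1 = c(1, 2, 3), A2 = c(2, 3, 3))

# Hypothetical helper: look up the data frame by name, then the column by name
sum_col <- function(df_name, x, prefix = "A", env = parent.frame()) {
  df <- get(df_name, envir = env)
  sum(df[[paste0(prefix, x)]])
}

sum_col("df_a", 1)
# [1] 6
sum_col("df_a", 2)
# [1] 8
```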
Alternatively, you can use summarise_at from the dplyr package to do just that:
fun_a <- function(x) {
as.numeric(dplyr::summarise_at(df_a, x, sum))
}
Passing variable name to a function in R
Use deparse/substitute to convert the unquoted argument to a string, then use [[ to pull the column as a vector, create the logical vector and subset with [.
myf.subset <- function(data, xvar) {
xvar <- deparse(substitute(xvar))
data[data[[xvar]] == 0, , drop = FALSE]
}
-testing
> myf.subset(df, xvar = x)
x
3 0
5 0
12 0
18 0
20 0
24 0
25 0
28 0
29 0
32 0
33 0
35 0
36 0
37 0
39 0
41 0
42 0
43 0
47 0
48 0
49 0
51 0
55 0
57 0
58 0
62 0
63 0
65 0
66 0
67 0
69 0
70 0
71 0
73 0
74 0
75 0
76 0
80 0
82 0
84 0
87 0
88 0
90 0
92 0
94 0
97 0
99 0
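One thing to watch with this pattern is missing values: a comparison with NA yields NA, and logical subsetting with NA returns an all-NA row. The sketch below is a variant of the function above (not the original) that wraps the condition in which(); df_small is made-up data.

```r
myf.subset <- function(data, xvar) {
  xvar <- deparse(substitute(xvar))
  # which() keeps only positions where the comparison is TRUE, dropping NAs
  data[which(data[[xvar]] == 0), , drop = FALSE]
}

# Hypothetical data with a missing value in x
df_small <- data.frame(x = c(0, 1, NA, 0), y = 1:4)
res <- myf.subset(df_small, xvar = x)
res  # rows 1 and 4 only
```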
In the updated code, the formula can be created with reformulate or paste:
myf.subset <- function(data, xvar, yvar, zvar) {
xvar <- deparse(substitute(xvar))
yvar <- deparse(substitute(yvar))
zvar <- deparse(substitute(zvar))
# new.data <- subset(data, xvar == 0)
new.data <- data[data[[xvar]] == 0, , drop = FALSE]
fmla <- reformulate(zvar, response = yvar)
# fmla <- as.formula(paste(yvar, zvar, sep = ' ~ '))
OLS <- lm(fmla, data = new.data)
return(OLS)
}
-testing
> myf.subset(df, xvar = x, yvar = y, zvar = z)
Call:
lm(formula = fmla, data = new.data)
Coefficients:
(Intercept) z
0.48000 -0.01333
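The two ways of building the formula give the same result, which can be checked directly. In this sketch the extra predictor name w is hypothetical:

```r
f1 <- reformulate("z", response = "y")
f2 <- as.formula(paste("y", "z", sep = " ~ "))

deparse(f1)
# [1] "y ~ z"
identical(deparse(f1), deparse(f2))
# [1] TRUE

# reformulate also joins several predictors with "+"
deparse(reformulate(c("z", "w"), response = "y"))
# [1] "y ~ z + w"
```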
How to get name of variable in R (substitute)?
I suggest you consider passing an optional name value to these functions. I say this because it seems like you really want to use the name as a label for something in the end result, so it's not really the variable itself that matters so much as its name. You could do:
fun1 <- function (some_variable, name=deparse(substitute(some_variable))) {
name
}
fun2 <- function (var_pass, name=deparse(substitute(var_pass))) {
fun1 (var_pass, name)
}
my_var <- c(1,2)
fun2(my_var)
# [1] "my_var"
fun1(my_var)
# [1] "my_var"
This way, if you end up with some odd variable name and want to give a better name to a result, you at least have the option. And by default it should do what you want without requiring the name parameter.
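For contrast, here is a sketch of what happens without the name argument, followed by an explicit override. The helpers label_of and wrapper are hypothetical names:

```r
# Without a name argument, a nested call only sees the wrapper's own argument
label_of <- function(v) deparse(substitute(v))
wrapper <- function(v) label_of(v)

my_var <- c(1, 2)
wrapper(my_var)
# [1] "v"

# With the optional name argument, the label can be overridden explicitly
fun1 <- function(some_variable, name = deparse(substitute(some_variable))) {
  name
}
fun1(my_var, name = "My variable")
# [1] "My variable"
```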