Showing String in Formula and Not as Variable in Lm Fit

Showing string in formula and not as variable in lm fit

How about eval(call("lm", sformula))?

lm(sformula)
#Call:
#lm(formula = sformula)

eval(call("lm", sformula))
#Call:
#lm(formula = "y~x")

Generally speaking there is a data argument for lm. Let's do:

mydata <- data.frame(y = y, x = x)
eval(call("lm", sformula, quote(mydata)))
#Call:
#lm(formula = "y~x", data = mydata)

The above call() + eval() combination can be replaced by do.call():

do.call("lm", list(formula = sformula))
#Call:
#lm(formula = "y~x")

do.call("lm", list(formula = sformula, data = quote(mydata)))
#Call:
#lm(formula = "y~x", data = mydata)
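To see why the recorded call matters, here is a minimal sketch. The definitions of x, y and sformula are assumptions, since the original ones are not shown above. It also converts the string with as.formula() before do.call(), so the stored call holds a real formula and update() can work with it.

```r
# Assumed example data; the original x, y and sformula are not shown
set.seed(1)
x <- 1:10
y <- x + rnorm(10)
mydata <- data.frame(y = y, x = x)
sformula <- "y~x"

# Plain call: the stored call only remembers the symbol `sformula`
fit_plain <- lm(sformula, data = mydata)
fit_plain$call

# do.call() evaluates its arguments first, so the call stores the formula itself
fit <- do.call("lm", list(formula = as.formula(sformula), data = quote(mydata)))
fit$call

# update() re-evaluates the stored call with a modified formula
fit_quad <- update(fit, . ~ . + I(x^2))
coef(fit_quad)
```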

Use string of independent variables within the lm function

The following will all produce the same results. I am providing multiple methods because there are simpler ways of doing what you are asking (see examples 2 and 3) than writing the expression as a string.

First, I will generate some example data:

n <- 100
p <- 11
dat <- array(rnorm(n*p),c(n,p))

dat <- as.data.frame(dat)
colnames(dat) <- paste0("X",1:p)

If you really want to specify the model as a string, this example code will help:

ExVar <- paste(names(dat)[2:11], collapse = " + ")
model1 <- paste("X1 ~", ExVar)
fit1 <- lm(as.formula(model1), data = dat)

Otherwise, note that the 'dot' notation will specify all other variables in the model as predictors.

fit2 <- lm(X1 ~ ., data = dat)

Or, you can select the predictors and outcome variables by column, if your data is structured as a matrix.

dat <- as.matrix(dat)
fit3 <- lm(dat[,1] ~ dat[,-1])

All three of these fit objects have the same estimates:

fit1
fit2
fit3
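Rather than comparing the printed output by eye, the estimates can be compared programmatically. This is a self-contained sketch that regenerates similar random data; the object names (fit_str, fit_dot, fit_mat, m) are chosen here for illustration. unname() is used because the matrix-based fit labels its coefficients differently.

```r
set.seed(42)
n <- 100
p <- 11
dat <- as.data.frame(array(rnorm(n * p), c(n, p)))
colnames(dat) <- paste0("X", 1:p)

# 1. formula built from a string
rhs <- paste(names(dat)[-1], collapse = " + ")
fit_str <- lm(as.formula(paste("X1 ~", rhs)), data = dat)

# 2. dot notation
fit_dot <- lm(X1 ~ ., data = dat)

# 3. matrix columns
m <- as.matrix(dat)
fit_mat <- lm(m[, 1] ~ m[, -1])

# Coefficient names differ between the fits, so compare the bare values
all.equal(unname(coef(fit_str)), unname(coef(fit_dot)))
all.equal(unname(coef(fit_str)), unname(coef(fit_mat)))
```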

Problem with character string input for lm () in a loop

You can specify string variable names using as.formula, and pass this to lm.

x1 <- "var1"
x2 <- "var2"
y <- "var3"

fm <- as.formula(paste(y, "~", x1, "+", x2, sep=""))

lm(fm, data = dat)
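Since the question is about a loop, here is a small sketch of the same idea applied once per predictor. The data frame and its column names (var1, var2, var3) are made up for illustration.

```r
# Hypothetical data with the column names used above
set.seed(1)
dat <- data.frame(var1 = rnorm(20), var2 = rnorm(20), var3 = rnorm(20))

y <- "var3"
predictors <- c("var1", "var2")

# Fit one simple regression per predictor, building each formula from strings
fits <- list()
for (p in predictors) {
  fm <- as.formula(paste(y, "~", p))
  fits[[p]] <- lm(fm, data = dat)
}

# Each model's slope is named after the predictor it was built from
sapply(fits, function(m) names(coef(m))[2])
```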

How to use reference variables by character string in a formula?

I see a couple issues going on here. First, and I don't think this is causing any trouble, but let's make your data frame in one step so you don't have v1 through v4 floating around both in the global environment as well as in the data frame. Second, let's just make v2 a factor here so that we won't have to deal with making it a factor later.

dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(sample(c(0,1), 10, replace=TRUE)),
                  v3 = rnorm(10),
                  v4 = rnorm(10) )

Part One

Now, for your first part, it looks like this is what you want:

lm(v1 ~ v2 + v3 + v4, data=dat)

Here's a simpler way to do that, though you still have to specify the response variable.

lm(v1 ~ ., data=dat)

Alternatively, you can certainly build up the formula as a string with paste and call lm on it.

f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse=" + "))
# "v1 ~ v2 + v3 + v4"
lm(f, data=dat)

However, my preference in these situations is to use do.call, which evaluates expressions before passing them to the function; this makes the resulting object more suitable for calling functions like update on. Compare the call part of the output.

do.call("lm", list(as.formula(f), data=as.name("dat")))

Part Two

About your second part, it looks like this is what you're going for:

lm(v1 ~ factor(v2) + v3 + v4 + v2*v3 + v2*v4, data=dat)

First, because v2 is already a factor in the data frame, the factor() call is unnecessary. Second, the formula can be simplified further with R's formula operators, where * creates both main effects and interactions, like this.

lm(v1 ~ v2*(v3 + v4), data=dat)
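To check how the compact form expands, one can inspect the term labels that R derives from the formula. This sketch only uses terms(), so no data is needed; the spelled-out formula is written here for comparison.

```r
compact  <- v1 ~ v2 * (v3 + v4)
expanded <- v1 ~ v2 + v3 + v4 + v2:v3 + v2:v4

# Both formulas should produce the same set of model terms
attr(terms(compact), "term.labels")
attr(terms(expanded), "term.labels")
```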

I'd then simply create the formula using paste; the loop with assign, even in the larger case, is probably not a good idea.

f <- paste(names(dat)[1], "~", names(dat)[2], "* (", 
           paste(names(dat)[-c(1:2)], collapse=" + "), ")")
# "v1 ~ v2 * ( v3 + v4 )"

It can then be called using either lm directly or with do.call.

lm(f, data=dat)
do.call("lm", list(as.formula(f), data=as.name("dat")))

About your code

The problem you had with trying to use r3 etc. was that you wanted the contents of the variable r3, not the string "r3". To get the contents from a name given as a string, you need get, like this, and then you'd collapse the values together with paste.

vars <- sapply(paste0("r", 3:6), get)
paste(vars, collapse=" + ")

However, a better way would be to avoid assign and just build a vector of the terms you want, like this.

vars <- NULL
for (v in 3:4) {
  vars <- c(vars, colnames(dat)[v], paste(colnames(dat)[2], 
                                          colnames(dat)[v], sep="*"))
}
paste(vars, collapse=" + ")

A more R-like solution would be to use lapply:

vars <- unlist(lapply(colnames(dat)[3:4], 
                      function(x) c(x, paste(colnames(dat)[2], x, sep="*"))))
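As a rough check, the terms collected this way can be pasted into a formula and compared with the compact v2 * (v3 + v4) form. This is a sketch on made-up data; v2 is built with rep() rather than sample() here only so that both factor levels are guaranteed to appear.

```r
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# Collect each variable plus its interaction with v2
vars <- unlist(lapply(colnames(dat)[3:4],
                      function(x) c(x, paste(colnames(dat)[2], x, sep = "*"))))

# Turn the collected terms into a formula and fit
f <- as.formula(paste(colnames(dat)[1], "~", paste(vars, collapse = " + ")))
fit_built <- lm(f, data = dat)
fit_short <- lm(v1 ~ v2 * (v3 + v4), data = dat)

# Same model, so the fitted values should agree
all.equal(fitted(fit_built), fitted(fit_short))
```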

How to dynamically name variables in formula in lm() function?

The core issue to understand here is that lm() takes an object of class formula as its first argument, which specifies the regression.

You've created a vector of strings (characters), but R won't dynamically generate the formula for you in the function call. Being able to just type variable names as a formula is a convenience, but it is not practical when you are attempting to be dynamic.

To simplify your example, start with:

y1 <- (rnorm(n = 10, mean = 0, sd = 1))
x1 <- (rnorm(n = 10, mean = 0, sd = 1))
x2 <- (rnorm(n = 10, mean = 0, sd = 1))
x3 <- (rnorm(n = 10, mean = 0, sd = 1))

df <- as.data.frame(cbind(y1,x1,x2,x3))

predictors = c("x1", "x2", "x3")

Now you can dynamically create a formula as a concatenated string (paste0) and convert it to a formula. Then pass this formula to your lm() call:

form1 = as.formula(paste0("y1~", predictors[1]))

lm(form1, data = df)

As akrun pointed out, you can then start doing things like create loops to dynamically generate these.

You can also do things like:

my_formula = as.formula(paste0("y1~", paste0(predictors, collapse="+")))

## generates y1 ~ x1 + x2 + x3
lm(my_formula, data = df)

See also: Formula with dynamic number of variables

One of the answers on that page also mentions akrun's alternative way of doing this, using the function reformulate. From ?reformulate:

reformulate creates a formula from a character vector. If length(termlabels) > 1, its elements are concatenated with +. Non-syntactic names (e.g. containing spaces or special characters; see make.names) must be protected with backticks (see examples). A non-parseable response still works for now, back compatibly, with a deprecation warning.
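A short sketch of reformulate on data shaped like the example above. The second data frame, d2, and its "my var" column are hypothetical and only there to illustrate the backtick rule from the help page.

```r
set.seed(1)
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
predictors <- c("x1", "x2", "x3")

# Term labels are joined with "+"; the response is given separately
f <- reformulate(predictors, response = "y1")
f
fit <- lm(f, data = df)

# A non-syntactic column name has to be wrapped in backticks
d2 <- data.frame(y1 = rnorm(10))
d2[["my var"]] <- rnorm(10)
f2 <- reformulate("`my var`", response = "y1")
fit2 <- lm(f2, data = d2)
```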

Pass dynamically variable names in lm formula inside a function

First off, it's always difficult to help without a reproducible code example. For future posts I recommend familiarising yourself with how to provide such a minimal reproducible example.

I'm not entirely clear on what you're asking, so I assume this is about how to create a function that fits a simple linear model based on data with a single user-chosen predictor var.

Here is an example based on mtcars

results_LM <- function(data, var) {
    lm(data[, 1] ~ data[, var])
}

results_LM(mtcars, "disp")
#Call:
#lm(formula = data[, 1] ~ data[, var])
#
#Coefficients:
#(Intercept)  data[, var]
#   29.59985     -0.04122

You can confirm that this gives the same result as

lm(mpg ~ disp, data = mtcars)

Or perhaps you're asking how to carry through the column names for the predictor? In that case we can use as.formula to construct a formula that we use together with the data argument in lm.

results_LM <- function(data, var) {
    fm <- as.formula(paste(colnames(data)[1], "~", var))
    lm(fm, data = data)
}

fit <- results_LM(mtcars, "disp")
fit
#Call:
#lm(formula = fm, data = data)
#
#Coefficients:
#(Intercept)         disp
#   29.59985     -0.04122

names(fit$model)
#[1] "mpg"  "disp"
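If you would also like the printed call to show the actual formula rather than fm, one possible variation is to splice the formula into the call with bquote() before evaluating it. This is only a sketch, and the function name results_LM_call is made up here; the data argument is still recorded as data.

```r
results_LM_call <- function(data, var) {
  fm <- as.formula(paste(colnames(data)[1], "~", var))
  # .(fm) inserts the formula object into the call before lm() records it
  eval(bquote(lm(.(fm), data = data)))
}

fit <- results_LM_call(mtcars, "disp")
fit$call
```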

Passing data-variables to R formulas

Wrap the formula in expr(), then pass it to lm(), which evaluates it.

library(dplyr)
lm_tidy <- function(df, x, y) {
  x <- sym(x)
  y <- sym(y)
  fm <- expr(!!y ~ !!x)
  lm(fm, data = df)
}

This function is equivalent:

lm_tidy <- function(df, x, y) {
  fm <- expr(!!sym(y) ~ !!sym(x))
  lm(fm, data = df)
}

Then

lm_tidy(mtcars, "cyl", "mpg")

gives

Call:
lm(formula = fm, data = .)

Coefficients:
(Intercept)          cyl  
     37.885       -2.876

EDIT per comment below:

library(rlang)
lm_tidy_quo <- function(df, x, y){
    y <- enquo(y)
    x <- enquo(x)
    fm <- paste(quo_text(y), "~", quo_text(x))
    lm(fm, data = df)
}

You can then pass symbols as arguments

lm_tidy_quo(mtcars, cyl, mpg)
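For comparison, a base R sketch of the same idea, without rlang: substitute() captures the unevaluated argument and deparse() turns it into a string. The function name lm_sym is made up for this example, and it assumes x and y are passed as single bare column names.

```r
lm_sym <- function(df, x, y) {
  # Capture the bare symbols as strings
  x <- deparse(substitute(x))
  y <- deparse(substitute(y))
  lm(reformulate(x, response = y), data = df)
}

fit <- lm_sym(mtcars, cyl, mpg)
coef(fit)
```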

How to succinctly write a formula with many variables from a data frame?

There is a special identifier that one can use in a formula to mean all the other variables in the data: the . identifier.

y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)

You can also do things like this, to use all variables but one (in this case x3 is excluded):

mod <- lm(y ~ . - x3, data = d)

Technically, . means all variables not already mentioned in the formula. For example

lm(y ~ x1 * x2 + ., data = d)

where . would only reference x3 as x1 and x2 are already in the formula.
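One way to see what . expands to is to ask terms() for the term labels, passing the data so the dot can be resolved. A small sketch using the same data frame d:

```r
d <- data.frame(y = c(1, 4, 6), x1 = c(4, -1, 3), x2 = c(3, 9, 8), x3 = c(4, -4, -2))

# All predictors
all_terms <- attr(terms(y ~ ., data = d), "term.labels")
all_terms

# Everything except x3
no_x3 <- attr(terms(y ~ . - x3, data = d), "term.labels")
no_x3

# Interaction of x1 and x2, plus whatever the dot contributes
with_int <- attr(terms(y ~ x1 * x2 + ., data = d), "term.labels")
with_int
```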

How to match a data frame of variable names and another with data for a regression?

x = data.frame(Var1= c("A", "B", "C", "D","E"),
               Var2=c("F","G","H","I","J"),
               Value= c(11, 12, 13, 14,18))

y = data.frame(A= c(11, 12, 13, 14,18),
               B= c(15, 16, 17, 14,18),
               C= c(17, 22, 23, 24,18),
               D= c(11, 12, 13, 34,18),
               E= c(11, 5, 13, 55,18),
               F= c(8, 12, 13, 14,18),
               G= c(7, 5, 13, 14,18),
               H= c(8, 12, 13, 14,18), 
               I= c(9, 5, 13, 14,18),
               J= c(11, 12, 13, 14,18))

We can use

fitmodel <- function (RHS, LHS) do.call("lm", list(formula = reformulate(RHS, LHS),
                                              data = quote(y)))

modList <- Map(fitmodel, as.character(x$Var2), as.character(x$Var1))

modList[[1]]  ## for example
#Call:
#lm(formula = A ~ F, data = y)
#
#Coefficients:
#(Intercept)            F  
#     4.3500       0.7115

Remarks:

The use of do.call is to ensure that reformulate is evaluated when passed to lm. This is desired as it allows functions like update to work correctly on the model object. See Showing string in formula and not as variable in lm fit. For a comparison:

oo <- Map(function (RHS, LHS) lm(reformulate(RHS, LHS), data = y),
          as.character(x$Var2), as.character(x$Var1))
oo[[1]]
#Call:
#lm(formula = reformulate(RHS, LHS), data = y)
#
#Coefficients:
#(Intercept)            F  
#     4.3500       0.7115

The as.character calls on x$Var1 and x$Var2 are necessary when these two variables are "factor" variables rather than strings, because reformulate can't use factors. That is the default in R versions before 4.0.0. If you put stringsAsFactors = FALSE in data.frame when you build your x (the default since R 4.0.0), there is no such issue.

Does it work for you? Isn't it supposed to have a "for" loop?

The Map function hides that "for" loop. It is a wrapper around the mapply function. The *apply family of functions in R is syntactic sugar.
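To make the hidden loop visible, here is a sketch of the same fits written with an explicit for loop, next to the Map version. It uses a trimmed-down copy of the data (only columns A, B, F and G) and the names lhs, rhs, mods and mods_map, all chosen for illustration.

```r
y <- data.frame(A = c(11, 12, 13, 14, 18),
                B = c(15, 16, 17, 14, 18),
                F = c(8, 12, 13, 14, 18),
                G = c(7, 5, 13, 14, 18))
lhs <- c("A", "B")
rhs <- c("F", "G")

fitmodel <- function(RHS, LHS) do.call("lm", list(formula = reformulate(RHS, LHS),
                                                  data = quote(y)))

# Explicit loop: one model per (response, predictor) pair
mods <- vector("list", length(lhs))
for (i in seq_along(lhs)) {
  mods[[i]] <- fitmodel(rhs[i], lhs[i])
}

# Map version: same pairs, no visible loop
mods_map <- Map(fitmodel, rhs, lhs)

all.equal(coef(mods[[1]]), coef(mods_map[[1]]))
```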

Update on your revised question

Your original question constructs a model formula as Var1 ~ Var2.

Your new question wants Var1 ~ Var2 + Var3.

x$Var3 <- rep("time", each=length(x$Var1))
y$time <- seq_len(nrow(y))

## collect multiple RHS variables (using concatenation function `c`)
RHS <- Map(base::c, as.character(x$Var2), as.character(x$Var3))
#str(RHS)
#List of 5  ## oh this list has names! annoying!!
# $ F: chr [1:2] "F" "time"
# $ G: chr [1:2] "G" "time"
# $ H: chr [1:2] "H" "time"
# $ I: chr [1:2] "I" "time"
# $ J: chr [1:2] "J" "time"
LHS <- as.character(x$Var1)
modList <- Map(fitmodel, RHS, LHS)  ## `fitmodel` function unchanged
modList[[1]]  ## for example
#Call:
#lm(formula = A ~ F + time, data = y)
#
#Coefficients:
#(Intercept)            F         time  
#        5.6          0.5          0.5
