Showing string in formula and not as variable in lm fit
How about eval(call("lm", sformula))?
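# hypothetical setup (not in the original question): y, x and the formula stored as a string
set.seed(1)
x <- rnorm(10)
y <- 2 * x + rnorm(10)
sformula <- "y~x"
lm(sformula)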
#Call:
#lm(formula = sformula)
eval(call("lm", sformula))
#Call:
#lm(formula = "y~x")
Generally speaking, lm also takes a data argument. Let's do:
mydata <- data.frame(y = y, x = x)
eval(call("lm", sformula, quote(mydata)))
#Call:
#lm(formula = "y~x", data = mydata)
The above call() + eval() combination can be replaced by do.call():
do.call("lm", list(formula = sformula))
#Call:
#lm(formula = "y~x")
do.call("lm", list(formula = sformula, data = quote(mydata)))
#Call:
#lm(formula = "y~x", data = mydata)
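To see the difference concretely, here is a small sketch with made-up data (set.seed() is only there for reproducibility). It inspects the stored call directly: with a plain lm() call the formula slot holds the name sformula, while do.call() has already put the string itself into the call.

```r
# Made-up data for illustration
set.seed(1)
mydata <- data.frame(x = rnorm(20))
mydata$y <- 1 + 2 * mydata$x + rnorm(20)
sformula <- "y~x"

fit_plain  <- lm(sformula, data = mydata)
fit_docall <- do.call("lm", list(formula = sformula, data = quote(mydata)))

# lm() records the *name* of the variable holding the formula ...
fit_plain$call$formula   # sformula
# ... whereas do.call() inserted the string itself into the call
fit_docall$call$formula  # "y~x"

# The fitted coefficients are the same either way
all.equal(coef(fit_plain), coef(fit_docall))
```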
Use string of independent variables within the lm function
The following will all produce the same results. I am providing multiple methods because there are simpler ways of doing what you are asking (see examples 2 and 3) than writing the expression as a string.
First, I will generate some example data:
n <- 100
p <- 11
dat <- array(rnorm(n*p),c(n,p))
dat <- as.data.frame(dat)
colnames(dat) <- paste0("X",1:p)
If you really want to specify the model as a string, this example code will help:
ExVar <- paste(names(dat)[2:11], collapse = " + ")  # "X2 + X3 + ... + X11"
model1 <- paste("X1 ~ ",ExVar)
fit1 <- lm(eval(parse(text = model1)),data = dat)
Otherwise, note that the 'dot' notation will specify all other variables in the data as predictors.
fit2 <- lm(X1 ~ ., data = dat)
Or, you can select the predictors and outcome variables by column, if your data is structured as a matrix.
dat <- as.matrix(dat)
fit3 <- lm(dat[,1] ~ dat[,-1])
All three of these fit objects have the same estimates:
fit1
fit2
fit3
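A quick way to check that claim is to compare the coefficient vectors directly. This sketch re-creates the data with an added set.seed() call and names the matrix m so that dat stays a data frame; unname() is needed because fit3's coefficients carry names like "m[, -1]X2".

```r
set.seed(42)  # added for reproducibility
n <- 100
p <- 11
dat <- as.data.frame(array(rnorm(n * p), c(n, p)))
colnames(dat) <- paste0("X", 1:p)

model1 <- paste("X1 ~", paste(names(dat)[2:11], collapse = " + "))
fit1 <- lm(eval(parse(text = model1)), data = dat)
fit2 <- lm(X1 ~ ., data = dat)
m <- as.matrix(dat)
fit3 <- lm(m[, 1] ~ m[, -1])

# Compare the values only, since fit3 names its coefficients differently
all.equal(unname(coef(fit1)), unname(coef(fit2)))  # TRUE
all.equal(unname(coef(fit1)), unname(coef(fit3)))  # TRUE
```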
Problem with character string input for lm () in a loop
You can specify string variable names using as.formula, and pass this to lm.
x1 <- "var1"
x2 <- "var2"
y <- "var3"
fm <- as.formula(paste(y, "~", x1, "+", x2, sep=""))
lm(fm, data = dat)
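Since the question is about a loop, here is a minimal sketch of the same idea inside one. The data frame and the choice of fitting one model per predictor are assumptions for illustration; the formula is rebuilt with as.formula() on each iteration and the fits are stored in a named list.

```r
# Hypothetical data with columns var1, var2, var3
set.seed(1)
dat <- data.frame(var1 = rnorm(20), var2 = rnorm(20), var3 = rnorm(20))

y <- "var3"
predictors <- c("var1", "var2")

fits <- list()
for (x in predictors) {
  # e.g. "var3 ~ var1" on the first pass
  fm <- as.formula(paste(y, "~", x))
  fits[[x]] <- lm(fm, data = dat)
}

names(fits)
coef(fits[["var1"]])
```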
How to use reference variables by character string in a formula?
I see a couple of issues here. First (and I don't think this is causing any trouble), let's make your data frame in one step so you don't have v1 through v4 floating around both in the global environment and in the data frame. Second, let's make v2 a factor right away so that we won't have to deal with converting it later.
dat <- data.frame(v1 = rnorm(10),
v2 = factor(sample(c(0,1), 10, replace=TRUE)),
v3 = rnorm(10),
v4 = rnorm(10) )
Part One
For your first part, it looks like this is what you want:
lm(v1 ~ v2 + v3 + v4, data=dat)
Here's a simpler way to do that, though you still have to specify the response variable.
lm(v1 ~ ., data=dat)
Alternatively, you can certainly build up the formula with paste and call lm on it.
f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse=" + "))
# "v1 ~ v2 + v3 + v4"
lm(f, data=dat)
However, my preference in these situations is to use do.call, which evaluates expressions before passing them to the function; this makes the resulting object more suitable for calling functions like update on. Compare the call part of the output.
do.call("lm", list(as.formula(f), data=as.name("dat")))
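One practical consequence, shown in this sketch with made-up data (v2 is built as a balanced factor deterministically, an assumption for reproducibility): update() simply re-evaluates the stored call, so a model fitted with lm(f, ...) silently picks up whatever f holds at that moment, while the do.call() version has the formula itself baked into its call.

```r
set.seed(1)
dat <- data.frame(v1 = rnorm(20),
                  v2 = factor(rep(c(0, 1), 10)),
                  v3 = rnorm(20),
                  v4 = rnorm(20))
f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse = " + "))

fit_str <- lm(f, data = dat)                                          # call stores the name f
fit_dc  <- do.call("lm", list(as.formula(f), data = as.name("dat")))  # call stores the formula

f <- "v1 ~ v2"  # suppose f is later reused for something else

# update() with no changes re-evaluates the stored call
length(coef(update(fit_str)))  # 2: now fits v1 ~ v2
length(coef(update(fit_dc)))   # 4: still v1 ~ v2 + v3 + v4
```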
Part Two
For your second part, it looks like this is what you're going for:
lm(v1 ~ factor(v2) + v3 + v4 + v2*v3 + v2*v4, data=dat)
First, because v2 is already a factor in the data frame, we don't need the factor() part. Second, this can be simplified further by making better use of R's formula operators to create the interactions, like this:
lm(v1 ~ v2*(v3 + v4), data=dat)
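You can check that the shorter formula expands to the same terms without fitting anything, since terms() works on the formula alone. A minimal sketch:

```r
f_long  <- v1 ~ v2 + v3 + v4 + v2*v3 + v2*v4
f_short <- v1 ~ v2*(v3 + v4)

attr(terms(f_short), "term.labels")
# "v2"    "v3"    "v4"    "v2:v3" "v2:v4"

setequal(attr(terms(f_long), "term.labels"),
         attr(terms(f_short), "term.labels"))  # TRUE
```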
I'd then simply create the formula using paste; the loop with assign, even in the larger case, is probably not a good idea.
f <- paste(names(dat)[1], "~", names(dat)[2], "* (",
paste(names(dat)[-c(1:2)], collapse=" + "), ")")
# "v1 ~ v2 * ( v3 + v4 )"
It can then be called using either lm directly or with do.call.
lm(f, data=dat)
do.call("lm", list(as.formula(f), data=as.name("dat")))
About your code
The problem you had with trying to use r3 etc. was that you wanted the contents of the variable r3, not the string "r3" itself. To get the contents, you need get, like this, and then you'd collapse the values together with paste.
vars <- sapply(paste0("r", 3:6), get)
paste(vars, collapse=" + ")
However, a better way would be to avoid assign and just build a vector of the terms you want, like this:
vars <- NULL
for (v in 3:4) {
vars <- c(vars, colnames(dat)[v], paste(colnames(dat)[2],
colnames(dat)[v], sep="*"))
}
paste(vars, collapse=" + ")
A more R-like solution would be to use lapply:
vars <- unlist(lapply(colnames(dat)[3:4],
function(x) c(x, paste(colnames(dat)[2], x, sep="*"))))
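From there, the character vector of terms can be turned into a formula directly. This sketch uses reformulate() (a base R function, not something from the question) and re-creates a data frame like the one above, with v2 built as a balanced factor deterministically as an assumption for reproducibility.

```r
set.seed(1)
dat <- data.frame(v1 = rnorm(20),
                  v2 = factor(rep(c(0, 1), 10)),
                  v3 = rnorm(20),
                  v4 = rnorm(20))

vars <- unlist(lapply(colnames(dat)[3:4],
                      function(x) c(x, paste(colnames(dat)[2], x, sep = "*"))))
vars
# "v3"    "v2*v3" "v4"    "v2*v4"

# reformulate() joins the terms with "+" and adds the response
fm <- reformulate(vars, response = colnames(dat)[1])
fit <- lm(fm, data = dat)
coef(fit)  # intercept, main effects and the two v2 interactions
```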
How to dynamically name variables in formula in lm() function?
The core issue to understand here is that lm() takes an object of class formula as its first argument, and that formula specifies the regression.
You've created a vector of strings (characters), but R won't dynamically generate the formula for you in the function call. Being able to type variable names directly as a formula is a convenience, but it isn't practical when you are trying to build the model dynamically.
To simplify your example, start with:
y1 <- (rnorm(n = 10, mean = 0, sd = 1))
x1 <- (rnorm(n = 10, mean = 0, sd = 1))
x2 <- (rnorm(n = 10, mean = 0, sd = 1))
x3 <- (rnorm(n = 10, mean = 0, sd = 1))
df <- as.data.frame(cbind(y1,x1,x2,x3))
predictors = c("x1", "x2", "x3")
Now you can dynamically create a formula as a concatenated string (paste0) and convert it to a formula. Then pass this formula to your lm() call:
form1 = as.formula(paste0("y1~", predictors[1]))
lm(form1, data = df)
As akrun pointed out, you can then do things like writing loops to generate these formulas dynamically.
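For example, here is a sketch of such a loop written with lapply() instead of for. It reuses df and predictors from above, recreated here with set.seed() added so the block runs on its own.

```r
set.seed(1)  # added so the example runs on its own
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
predictors <- c("x1", "x2", "x3")

# One simple regression per predictor; setNames() labels the list by predictor
fits <- lapply(setNames(predictors, predictors), function(p) {
  lm(as.formula(paste0("y1~", p)), data = df)
})

# Slope from each model
sapply(fits, function(fit) coef(fit)[[2]])
```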
You can also do things like:
my_formula = as.formula(paste0("y1~", paste0(predictors, collapse="+")))
## generates y1 ~ x1 + x2 + x3
lm(my_formula, data = df)
See also: Formula with dynamic number of variables
One of the answers on that page also mentions akrun's alternative way of doing this, using the function reformulate. From ?reformulate:
reformulate creates a formula from a character vector. If length(termlabels) > 1, its elements are concatenated with +. Non-syntactic names (e.g. containing spaces or special characters; see make.names) must be protected with backticks (see examples). A non-parseable response still works for now, back compatibly, with a deprecation warning.
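A minimal sketch of reformulate() with the predictors from above; the backticked name "x 1" is a made-up example of a non-syntactic name.

```r
predictors <- c("x1", "x2", "x3")
fm <- reformulate(predictors, response = "y1")
fm
# y1 ~ x1 + x2 + x3

# Non-syntactic names must be wrapped in backticks
fm2 <- reformulate(c("`x 1`", "x2"), response = "y1")
all.vars(fm2)
# "y1"  "x 1" "x2"
```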
Pass dynamically variable names in lm formula inside a function
First off, it's always difficult to help without a reproducible code example. For future posts I recommend familiarising yourself with how to provide such a minimal reproducible example.
I'm not entirely clear on what you're asking, so I assume this is about how to create a function that fits a simple linear model based on data with a single user-chosen predictor var.
Here is an example based on mtcars:
results_LM <- function(data, var) {
lm(data[, 1] ~ data[, var])
}
results_LM(mtcars, "disp")
#Call:
#lm(formula = data[, 1] ~ data[, var])
#
#Coefficients:
#(Intercept) data[, var]
# 29.59985 -0.04122
You can confirm that this gives the same result as
lm(mpg ~ disp, data = mtcars)
Or perhaps you're asking how to carry through the column names for the predictor? In that case we can use as.formula to construct a formula that we use together with the data argument in lm.
results_LM <- function(data, var) {
fm <- as.formula(paste(colnames(data)[1], "~", var))
lm(fm, data = data)
}
fit <- results_LM(mtcars, "disp")
fit
#Call:
#lm(formula = fm, data = data)
#
#Coefficients:
#(Intercept) disp
# 29.59985 -0.04122
names(fit$model)
#[1] "mpg" "disp"
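If you also want the printed call to show the actual formula rather than fm, the two ideas can be combined with do.call(), as in the do.call() answers elsewhere on this page. This is a sketch: results_LM2 is a hypothetical name, and capturing the data argument with substitute() assumes the caller passes a plain object name such as mtcars.

```r
results_LM2 <- function(data, var) {
  # reformulate() builds e.g. mpg ~ disp from character strings
  fm <- reformulate(var, response = colnames(data)[1])
  # substitute(data) captures the caller's expression (here the symbol mtcars);
  # envir = parent.frame() evaluates the call where the caller can see that object
  do.call("lm", list(formula = fm, data = substitute(data)), envir = parent.frame())
}

fit <- results_LM2(mtcars, "disp")
fit$call
# lm(formula = mpg ~ disp, data = mtcars)
```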
Passing data-variables to R formulas
Wrap the formula in expr(), then pass it to lm(), which evaluates it.
library(dplyr)
lm_tidy <- function(df, x, y) {
x <- sym(x)
y <- sym(y)
fm <- expr(!!y ~ !!x)
lm(fm, data = df)
}
This function is equivalent:
lm_tidy <- function(df, x, y) {
fm <- expr(!!sym(y) ~ !!sym(x))
lm(fm, data = df)
}
Then
lm_tidy(mtcars, "cyl", "mpg")
gives
Call:
lm(formula = fm, data = df)
Coefficients:
(Intercept) cyl
37.885 -2.876
EDIT per comment below:
library(rlang)
lm_tidy_quo <- function(df, x, y){
y <- enquo(y)
x <- enquo(x)
fm <- paste(quo_text(y), "~", quo_text(x))
lm(fm, data = df)
}
You can then pass symbols as arguments
lm_tidy_quo(mtcars, cyl, mpg)
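If you'd rather not depend on dplyr or rlang, a base R sketch of the same idea captures the unquoted arguments with substitute() and turns them into strings with deparse(). This is a swapped-in alternative, not part of the answer above; lm_base is a hypothetical name.

```r
lm_base <- function(df, x, y) {
  # substitute() grabs the unevaluated argument (e.g. the symbol cyl);
  # deparse() turns it into the string "cyl"
  fm <- reformulate(deparse(substitute(x)), response = deparse(substitute(y)))
  lm(fm, data = df)
}

fit <- lm_base(mtcars, cyl, mpg)
coef(fit)
```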
How to succinctly write a formula with many variables from a data frame?
There is a special identifier you can use in a formula to mean all the other variables in the data: the . identifier.
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
You can also do things like this, to use all variables but one (in this case x3 is excluded):
mod <- lm(y ~ . - x3, data = d)
Technically, . means all variables not already mentioned in the formula. For example, in
lm(y ~ x1 * x2 + ., data = d)
the . would only reference x3, as x1 and x2 are already in the formula.
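To see what . expands to for a given data frame, you can ask terms(), which needs the data argument to resolve the dot. A small sketch using d from above:

```r
y <- c(1, 4, 6)
d <- data.frame(y = y, x1 = c(4, -1, 3), x2 = c(3, 9, 8), x3 = c(4, -4, -2))

# All variables except the response, minus x3
attr(terms(y ~ . - x3, data = d), "term.labels")
# "x1" "x2"

# The dot contributes only x3 beyond the terms already written
attr(terms(y ~ x1 * x2 + ., data = d), "term.labels")
```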
How to match a data frame of variable names and another with data for a regression?
x = data.frame(Var1= c("A", "B", "C", "D","E"),
Var2=c("F","G","H","I","J"),
Value= c(11, 12, 13, 14,18))
y = data.frame(A= c(11, 12, 13, 14,18),
B= c(15, 16, 17, 14,18),
C= c(17, 22, 23, 24,18),
D= c(11, 12, 13, 34,18),
E= c(11, 5, 13, 55,18),
F= c(8, 12, 13, 14,18),
G= c(7, 5, 13, 14,18),
H= c(8, 12, 13, 14,18),
I= c(9, 5, 13, 14,18),
J= c(11, 12, 13, 14,18))
We can use
fitmodel <- function (RHS, LHS) do.call("lm", list(formula = reformulate(RHS, LHS),
data = quote(y)))
modList <- Map(fitmodel, as.character(x$Var2), as.character(x$Var1))
modList[[1]] ## for example
#Call:
#lm(formula = A ~ F, data = y)
#
#Coefficients:
#(Intercept) F
# 4.3500 0.7115
Remarks:
The use of do.call is to ensure that reformulate is evaluated when it is passed to lm. This is desirable because it allows functions like update to work correctly on the model object. See Showing string in formula and not as variable in lm fit. For a comparison:
oo <- Map(function (RHS, LHS) lm(reformulate(RHS, LHS), data = y),
as.character(x$Var2), as.character(x$Var1))
oo[[1]]
#Call:
#lm(formula = reformulate(RHS, LHS), data = y)
#
#Coefficients:
#(Intercept) F
# 4.3500 0.7115
The as.character on x$Var1 and x$Var2 is necessary because these two variables are currently "factor" variables, not strings, and reformulate can't use them. If you put stringsAsFactors = FALSE in data.frame when you build your x, there is no such issue. (Since R 4.0.0, stringsAsFactors = FALSE is the default, so the conversion is only needed in older versions; it is harmless either way.)
It works for you? It's not supposed to have a "for" loop?
The Map function hides that "for" loop. It is a wrapper around the mapply function. The *apply family functions in R are syntactic sugar.
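To make that concrete, here is a sketch with two made-up LHS/RHS pairs showing that Map() produces the same list of formulas as an explicit for loop. The formulas are compared via deparse() because their environments differ.

```r
LHS <- c("A", "B")
RHS <- c("F", "G")

# Map() version: pairs RHS[i] with LHS[i]
m1 <- Map(function(r, l) reformulate(r, response = l), RHS, LHS)

# Equivalent explicit for loop
m2 <- vector("list", length(RHS))
names(m2) <- RHS  # Map() names its result after the first argument when it is character
for (i in seq_along(RHS)) {
  m2[[i]] <- reformulate(RHS[i], response = LHS[i])
}

identical(lapply(m1, deparse), lapply(m2, deparse))  # TRUE
```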
Update on your revised question
Your original question constructs a model formula as Var1 ~ Var2. Your new question wants Var1 ~ Var2 + Var3.
x$Var3 <- rep("time", nrow(x))
y$time <- seq_len(nrow(y))
## collect multiple RHS variables (using concatenation function `c`)
RHS <- Map(base::c, as.character(x$Var2), as.character(x$Var3))
#str(RHS)
#List of 5 ## oh this list has names! annoying!!
# $ F: chr [1:2] "F" "time"
# $ G: chr [1:2] "G" "time"
# $ H: chr [1:2] "H" "time"
# $ I: chr [1:2] "I" "time"
# $ J: chr [1:2] "J" "time"
LHS <- as.character(x$Var1)
modList <- Map(fitmodel, RHS, LHS) ## `fitmodel` function unchanged
modList[[1]] ## for example
#Call:
#lm(formula = A ~ F + time, data = y)
#
#Coefficients:
#(Intercept) F time
# 5.6 0.5 0.5