Formula with dynamic number of variables
See ?as.formula, e.g.:
factors <- c("factor1", "factor2")
as.formula(paste("y~", paste(factors, collapse="+")))
# y ~ factor1 + factor2
where factors is a character vector containing the names of the factors you want to use in the model. You can pass the resulting formula to lm, e.g.:
set.seed(0)
y <- rnorm(100)
factor1 <- rep(1:2, each=50)
factor2 <- rep(3:4, 50)
lm(as.formula(paste("y~", paste(factors, collapse="+"))))
# Call:
# lm(formula = as.formula(paste("y~", paste(factors, collapse = "+"))))
#
# Coefficients:
# (Intercept)      factor1      factor2
#    0.542471    -0.002525    -0.147433
How to dynamically name variables in formula in lm() function?
The core issue to understand here is that lm() takes an object of class formula as its first argument, and that formula specifies the regression.
You've created a vector of strings (characters), but R won't dynamically generate the formula for you in the function call. Being able to type variable names directly as a formula is a convenience, but it is not practical when you are trying to be dynamic.
To simplify your example, start with:
y1 <- (rnorm(n = 10, mean = 0, sd = 1))
x1 <- (rnorm(n = 10, mean = 0, sd = 1))
x2 <- (rnorm(n = 10, mean = 0, sd = 1))
x3 <- (rnorm(n = 10, mean = 0, sd = 1))
df <- as.data.frame(cbind(y1,x1,x2,x3))
predictors = c("x1", "x2", "x3")
Now you can dynamically create the formula as a concatenated string (with paste0) and convert it to a formula with as.formula. Then pass this formula to your lm() call:
form1 = as.formula(paste0("y1~", predictors[1]))
lm(form1, data = df)
As akrun pointed out, you can then start doing things like create loops to dynamically generate these.
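As a rough illustration of such a loop, the sketch below fits one simple regression per predictor and stores the fits in a named list. The data are random and the names fits and slopes are made up for this example; it is one possible way to do it, not the only one.

```r
# Sketch: one simple regression per predictor (illustrative data and names)
set.seed(1)
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
predictors <- c("x1", "x2", "x3")

fits <- list()
for (p in predictors) {
  # Build "y1 ~ x1", "y1 ~ x2", ... and convert each string to a formula
  form <- as.formula(paste0("y1 ~ ", p))
  fits[[p]] <- lm(form, data = df)
}

# Pull the slope of each single-predictor model out of the list
slopes <- sapply(predictors, function(p) coef(fits[[p]])[[p]])
```

Storing the fits under the predictor name makes it easy to look a model up later, e.g. summary(fits[["x2"]]).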
You can also do things like:
my_formula = as.formula(paste0("y1~", paste0(predictors, collapse="+")))
## generates y1 ~ x1 + x2 + x3
lm(my_formula, data = df)
See also: Formula with dynamic number of variables
One of the answers on that page also mentions akrun's alternative way of doing this, using the function reformulate. From ?reformulate:
reformulate creates a formula from a character vector. If length(termlabels) > 1, its elements are concatenated with +. Non-syntactic names (e.g. containing spaces or special characters; see make.names) must be protected with backticks (see examples). A non-parseable response still works for now, back compatibly, with a deprecation warning.
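A minimal sketch of what the help text describes, using the same predictor names as above. The column name "my var" is a made-up example of a non-syntactic name.

```r
predictors <- c("x1", "x2", "x3")

# Term labels are joined with "+"; the response is given separately
f1 <- reformulate(predictors, response = "y1")
# y1 ~ x1 + x2 + x3

# A non-syntactic name (here a hypothetical column "my var") needs backticks
f2 <- reformulate(c("`my var`", "x2"), response = "y1")
vars_in_f2 <- all.vars(f2)
```

Compared with paste() plus as.formula(), this avoids assembling the "~" and "+" separators by hand.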
Formula with dynamic variables in excel
I am assuming you want to use subsequent rows to record each withdrawal of some boxes? You need to enter the appropriate formula in each cell of the total column. So in B3, put =B2-A3. Then copy and paste that into all the cells below in column B; Excel adjusts the cell references relative to the cell the formula is pasted into. Alternatively, dragging the fill handle down the column is an even faster way to fill it with the formula.
Dynamic formula creation in R?
Yes. In fact, the formula interface has performance issues that get worse as the number of columns grows, so the matrix interface is preferred for wide data.
Is there any way I can create the formula dynamically?
Sure: you look up the matrix columns either directly with a vector of column indices, or indirectly by converting a vector of names into column indices using grep(cols_you_want, colnames(mat)).
But in your case you don't need to bother with grep, since you already have a straightforward column-naming scheme: you know that ind1...ind5 corresponds to column indices 1..5.
lm(m1[,'dep'] ~ m1[,2:5])
# or in general
lm(m1[,'dep'] ~ m1[,colIndicesVector]) # e.g. c(1,3,4)
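For the name-based lookup, here is a small sketch. The matrix m1 is random data built for illustration, with an assumed layout of a dep column followed by ind1..ind5. It uses match() rather than grep(), because grep() expects a single pattern while match() accepts a whole vector of exact names.

```r
set.seed(42)
# Illustrative matrix: response "dep" plus predictors ind1..ind5 (assumed layout)
m1 <- matrix(rnorm(60), ncol = 6,
             dimnames = list(NULL, c("dep", paste0("ind", 1:5))))

# Convert a vector of column names into column indices
wanted <- c("ind1", "ind3", "ind4")
colIndicesVector <- match(wanted, colnames(m1))

# Matrix interface: the right-hand side is a matrix of the chosen columns
fit <- lm(m1[, "dep"] ~ m1[, colIndicesVector])
```

With this layout the indices come out as 2, 4, 5, since dep occupies the first column.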
Dynamic number of X values (dependent variables) in glm function in R isn't giving the right output
In glm, the formula argument is a symbolic description of the model to be fitted and the data argument is an optional data frame containing the variables in the model.
In your logistic_regression function's call to glm(), the model variables indicated in the formula y~k1+k2 are not contained within data=x (a data frame with two columns named X0 and X1), and thus are taken from the environment from which glm is called (your logistic_regression function). The 3 hardcoded vectors (m, k1, k2) in that environment are not associated with the inputs (i.e., the x=k1,k2 and y=m step done in your second scenario is not occurring within your function).
To call glm() using your logistic_regression() input, you could create a data frame consisting of the model variables to use as a single input and edit your function accordingly. For example, you could use:
x <- data.frame(y=c(1, 1, 1, 0, 0, 0), k1=c(4,3,5,1,2,3), k2= c(6,7,8,5,6,3))
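logistic_regression <- function(x){
  # The first column is the response; every other column is a predictor.
  # colnames(x)[-1] also works when there is only one predictor column.
  glm.out <- glm(as.formula(paste("y~", paste(colnames(x)[-1], collapse="+"))), family=binomial(logit), data=x)
  return(summary(glm.out))
}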
logistic_regression(x)
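A variation on the same idea, as a sketch: the response name is passed in as an argument and the formula is built with reformulate(). The function name logistic_regression2, its response argument, and the random data are assumptions made for this example.

```r
# Hypothetical variant: the response is named explicitly and every other
# column of the data frame is used as a predictor
logistic_regression2 <- function(x, response = "y") {
  predictors <- setdiff(names(x), response)
  form <- reformulate(predictors, response = response)
  glm(form, family = binomial(link = "logit"), data = x)
}

set.seed(123)
# Random illustrative data, not the data from the question
x <- data.frame(y = rbinom(50, 1, 0.5), k1 = rnorm(50), k2 = rnorm(50))
fit <- logistic_regression2(x)
coef_names <- names(coef(fit))
```

Because the formula is built from names(x), adding a k3 column to the data frame adds a k3 term without any change to the function.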
Using ~ call in R with dynamic variables
You can try this: pass the variable name to the function as a character string. It's much easier that way, and if you have 10 variables to model you can easily iterate through them:
getFormula <- function(variable){
  as.formula(paste(variable, "~ Sepal.Length + Sepal.Width + Species"))
}
petal.length.formula <- getFormula("Petal.Length")
petal.width.formula <- getFormula("Petal.Width")
lm(petal.length.formula,data=iris)
Call:
lm(formula = petal.length.formula, data = iris)

Coefficients:
      (Intercept)       Sepal.Length        Sepal.Width  Speciesversicolor
         -1.63430            0.64631           -0.04058            2.17023
 Speciesvirginica
          3.04911
You can also try reformulate, as suggested by @BenBolker and @MrFlick:
getFormula <- function(variable){
  reformulate(c("Sepal.Length", "Sepal.Width", "Species"),
              response = variable, intercept = TRUE)
}
lm(getFormula("Petal.Length"),data=iris)
Call:
lm(formula = getFormula("Petal.Length"), data = iris)

Coefficients:
      (Intercept)       Sepal.Length        Sepal.Width  Speciesversicolor
         -1.63430            0.64631           -0.04058            2.17023
 Speciesvirginica
          3.04911
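To iterate over several response variables, something along these lines should work. It reuses the getFormula idea with the built-in iris data; the names responses and fits are made up for this sketch.

```r
getFormula <- function(variable) {
  reformulate(c("Sepal.Length", "Sepal.Width", "Species"), response = variable)
}

# Fit one model per response and keep them in a list named by response
responses <- c("Petal.Length", "Petal.Width")
fits <- lapply(setNames(responses, responses),
               function(v) lm(getFormula(v), data = iris))

# Each fit has an intercept, two slopes and two Species contrasts
n_coefs <- sapply(fits, function(f) length(coef(f)))
```

An individual model can then be inspected with, for example, summary(fits[["Petal.Width"]]).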
R:fit dynamic number of explanatory variable into polynomial regression
One rather convoluted way is to create a formula for the lm regression call by pasting the terms together.
# some data
dat <- data.frame(replicate(10, rnorm(20)))
# Create formula - wrap every column name except the first in poly(..., 2)
form <- formula(paste(names(dat)[1], " ~ ",
paste0("poly(", names(dat)[-1], ", 2)", collapse="+")))
# run regression
lm(form , data=dat)
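The same construction can be wrapped in a small helper so that the polynomial degree becomes a parameter. The helper name make_poly_formula and its arguments are hypothetical; this is a sketch using reformulate() in place of the paste/formula combination above.

```r
# Hypothetical helper: response ~ poly(p1, degree) + poly(p2, degree) + ...
make_poly_formula <- function(response, predictors, degree = 2) {
  terms <- paste0("poly(", predictors, ", ", degree, ")")
  reformulate(terms, response = response)
}

set.seed(7)
dat <- data.frame(replicate(4, rnorm(20)))   # columns are named X1..X4
form <- make_poly_formula("X1", names(dat)[-1], degree = 2)
# X1 ~ poly(X2, 2) + poly(X3, 2) + poly(X4, 2)

fit <- lm(form, data = dat)
```

With three predictors and degree 2 the fit has seven coefficients: the intercept plus two per predictor.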