Formula With Dynamic Number of Variables

Formula with dynamic number of variables

See ?as.formula, e.g.:

factors <- c("factor1", "factor2")
as.formula(paste("y~", paste(factors, collapse="+")))
# y ~ factor1 + factor2

where factors is a character vector containing the names of the factors you want to use in the model. You can then pass the resulting formula to lm(), e.g.:

set.seed(0)
y <- rnorm(100)
factor1 <- rep(1:2, each=50)
factor2 <- rep(3:4, 50)
lm(as.formula(paste("y~", paste(factors, collapse="+"))))

# Call:
# lm(formula = as.formula(paste("y~", paste(factors, collapse = "+"))))

# Coefficients:
# (Intercept)      factor1      factor2
#    0.542471    -0.002525    -0.147433

How to dynamically name variables in formula in lm() function?

The core issue to understand here is that lm() takes an object of class formula as its first argument, and that formula specifies the regression.

You've created a vector of strings (characters), but R won't dynamically generate the formula for you in the function call. Being able to type variable names directly as a formula is a convenience, but it is not practical when you are trying to be dynamic.

To simplify your example, start with:

y1 <- rnorm(n = 10, mean = 0, sd = 1)
x1 <- rnorm(n = 10, mean = 0, sd = 1)
x2 <- rnorm(n = 10, mean = 0, sd = 1)
x3 <- rnorm(n = 10, mean = 0, sd = 1)

df <- data.frame(y1, x1, x2, x3)

predictors = c("x1", "x2", "x3")

Now you can dynamically build the formula as a concatenated string (paste0) and convert it to a formula with as.formula(). Then pass this formula to your lm() call:

form1 = as.formula(paste0("y1~", predictors[1]))

lm(form1, data = df)

As akrun pointed out, you can then use loops to generate these formulas dynamically.
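A loop of that kind might look like the following minimal sketch. It assumes simulated data shaped like the example above; the names fits and slopes are introduced here purely for illustration.

```r
# Simulated data shaped like the example above
set.seed(1)
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
predictors <- c("x1", "x2", "x3")

# Fit one single-predictor model per name in `predictors`
fits <- lapply(predictors, function(p) {
  lm(as.formula(paste0("y1 ~ ", p)), data = df)
})
names(fits) <- predictors

# Extract the slope of each model, indexed by predictor name
slopes <- sapply(predictors, function(p) coef(fits[[p]])[[p]])
```

Keeping the fitted models in a named list makes it easy to pull out coefficients or summaries afterwards, e.g. summary(fits$x2).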

You can also do things like:

my_formula = as.formula(paste0("y1~", paste0(predictors, collapse="+")))

## generates y1 ~ x1 + x2 + x3
lm(my_formula, data = df)

See also: Formula with dynamic number of variables

One of the answers on that page also mentions akrun's alternative way of doing this, using the function reformulate. From ?reformulate:

reformulate creates a formula from a character vector. If length(termlabels) > 1, its elements are concatenated with +. Non-syntactic names (e.g. containing spaces or special characters; see make.names) must be protected with backticks (see examples). A non-parseable response still works for now, back compatibly, with a deprecation warning.
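As a rough illustration of that description, the sketch below builds a formula with reformulate, including one non-syntactic name protected with backticks. The column name "my var" and the data are made up for the example.

```r
# Hypothetical predictor names; the second is non-syntactic (contains a space),
# so it is wrapped in backticks as ?reformulate requires
terms <- c("x1", "`my var`")
f <- reformulate(terms, response = "y")
# f is the formula y ~ x1 + `my var`

# Made-up data with a matching non-syntactic column name
set.seed(2)
dat <- data.frame(y = rnorm(20), x1 = rnorm(20))
dat[["my var"]] <- rnorm(20)

fit <- lm(f, data = dat)
```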

Formula with dynamic variables in excel

I am assuming you want to use subsequent rows to record each withdrawal of some boxes? You need to enter the appropriate formula in each cell of the total column. So in B3, put =B2-A3. Then copy and paste that into all the cells below it in column B; Excel pastes a formula whose cell references are relative to the cell it is pasted into. Alternatively, dragging the fill handle down the column with the mouse fills it with the formula even faster.

Dynamic formula creation in R?

Yes. In fact, the formula interface has performance issues that get worse as the number of columns grows, so the matrix interface is preferred for wide data.

Is there any way I can create the formula dynamically?

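Sure. You look up the matrix columns either directly with a vector of column indices, or indirectly by converting names into column indices, using grep(pattern, colnames(mat)) for a regular-expression pattern or match(cols_you_want, colnames(mat)) for a vector of exact names. (Note that names(mat) returns NULL for a matrix; use colnames(mat).)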

But in your case you don't need to bother with grep: you already have a straightforward column-naming scheme, so you know that ind1...ind5 correspond to column indices 1..5.

lm(m1[,'dep'] ~ m1[,2:5])

# or in general
lm(m1[,'dep'] ~ m1[,colIndicesVector]) # e.g. c(1,3,4)
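For completeness, here is a small sketch of the name-to-index lookup combined with the matrix call. The matrix m1 is simulated, and its layout (dep in the first column, followed by ind1 to ind5) is an assumption made for this example only.

```r
set.seed(42)
# Hypothetical matrix: response column `dep` followed by predictors ind1..ind5
predictors <- matrix(rnorm(150), ncol = 5,
                     dimnames = list(NULL, paste0("ind", 1:5)))
m1 <- cbind(dep = rnorm(30), predictors)

# Convert the wanted column names into column indices
wanted <- c("ind1", "ind3", "ind4")
idx <- match(wanted, colnames(m1))

# Fit using the matrix columns directly, without building a formula string
fit <- lm(m1[, "dep"] ~ m1[, idx])
```

With this layout idx is c(2, 4, 5), because dep occupies the first column.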

Dynamic number of X values (dependent variables) in glm function in R isn't giving the right output

In glm, the formula argument is a symbolic description of the model to be fitted and the data argument is an optional data frame containing the variables in the model.

In your logistic_regression function call of glm(), the model variables indicated in formula y~k1+k2 are not contained within data=x (a data frame with two columns named X0 and X1), and thus, are taken from the environment from which glm is called (your logistic_regression function). The 3 hardcoded vectors (m, k1, k2) in that environment are not associated with the inputs (i.e., the x=k1,k2 and y=m step done in your second scenario is not occurring within your function).

To call glm() using your logistic_regression() input, you could create a data frame consisting of the model variables to use as a single input and edit your function accordingly. For example, you could use:

x <- data.frame(y=c(1, 1, 1, 0, 0, 0), k1=c(4,3,5,1,2,3), k2= c(6,7,8,5,6,3))

logistic_regression <- function(x){
  # Use every column except the first (the response y) as a predictor
  form <- as.formula(paste("y ~", paste(colnames(x)[-1], collapse = " + ")))
  glm.out <- glm(form, family = binomial(logit), data = x)
  return(summary(glm.out))
}

logistic_regression(x)
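When every column other than the response should enter the model, the dot shorthand in a formula (y ~ .) is an alternative that avoids pasting strings altogether. The sketch below uses simulated data, not the data from the question, and checks that both approaches yield the same coefficients.

```r
set.seed(123)
# Simulated data: binary response y and two numeric predictors
x <- data.frame(y = rbinom(50, 1, 0.5), k1 = rnorm(50), k2 = rnorm(50))

# `.` expands to all columns of `data` other than the response
fit_dot <- glm(y ~ ., family = binomial(logit), data = x)

# Equivalent model built from the column names
form <- as.formula(paste("y ~", paste(colnames(x)[-1], collapse = " + ")))
fit_paste <- glm(form, family = binomial(logit), data = x)

same_coefs <- isTRUE(all.equal(coef(fit_dot), coef(fit_paste)))
```

The pasted version remains useful when only a subset of the columns should be used.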

Using ~ call in R with dynamic variables

You can try this: pass the response name to the function as a character string. It's much easier that way, and if you have 10 variables to model you can easily iterate through them:

getFormula <- function(variable){
  as.formula(paste(variable, "~ Sepal.Length + Sepal.Width + Species"))
}

petal.length.formula <- getFormula("Petal.Length")
petal.width.formula <- getFormula("Petal.Width")

lm(petal.length.formula, data = iris)

# Call:
# lm(formula = petal.length.formula, data = iris)
#
# Coefficients:
#       (Intercept)       Sepal.Length        Sepal.Width  Speciesversicolor
#          -1.63430            0.64631           -0.04058            2.17023
#  Speciesvirginica
#           3.04911

You can also try reformulate, as suggested by @BenBolker and @MrFlick:

getFormula <- function(variable){
  reformulate(c("Sepal.Length", "Sepal.Width", "Species"),
              response = variable, intercept = TRUE)
}

lm(getFormula("Petal.Length"), data = iris)

# Call:
# lm(formula = getFormula("Petal.Length"), data = iris)
#
# Coefficients:
#       (Intercept)       Sepal.Length        Sepal.Width  Speciesversicolor
#          -1.63430            0.64631           -0.04058            2.17023
#  Speciesvirginica
#           3.04911
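Iterating over several responses, as mentioned above, could be sketched like this. The vector names responses and rhs and the list fits are introduced here for illustration only.

```r
responses <- c("Petal.Length", "Petal.Width")
rhs <- c("Sepal.Length", "Sepal.Width", "Species")

# Fit one model per response, reusing the same right-hand side
fits <- lapply(responses, function(r) {
  lm(reformulate(rhs, response = r), data = iris)
})
names(fits) <- responses

# R-squared of each fitted model, indexed by response name
r2 <- sapply(fits, function(m) summary(m)$r.squared)
```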

R:fit dynamic number of explanatory variable into polynomial regression

One rather convoluted way is to create a formula for the lm regression call by pasting the terms together.

# some data
dat <- data.frame(replicate(10, rnorm(20)))

# Create formula - wrap every column name except the first in poly(..., 2)
form <- formula(paste(names(dat)[1], " ~ ",
                      paste0("poly(", names(dat)[-1], ", 2)", collapse="+")))
# run regression
lm(form, data=dat)
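The same idea can be wrapped in a small helper so that the polynomial degree becomes a parameter. This is only a sketch: poly_formula is a hypothetical name, not a function from any package, and it relies on reformulate accepting term labels such as "poly(X2, 2)".

```r
# Hypothetical helper: build response ~ poly(p1, degree) + poly(p2, degree) + ...
poly_formula <- function(response, predictors, degree = 2) {
  rhs <- paste0("poly(", predictors, ", ", degree, ")")
  reformulate(rhs, response = response)
}

set.seed(7)
dat <- data.frame(replicate(4, rnorm(20)))   # columns X1..X4

form <- poly_formula("X1", names(dat)[-1], degree = 2)
fit <- lm(form, data = dat)
```

With three predictors and degree 2, the fit has seven coefficients: the intercept plus two per predictor.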

