Shortcut Using Lm() in R for Formula

Try lm(y ~ ., data), where . means "every other column in data besides y".

m <- matrix(rnorm(100), ncol = 5)
m <- as.data.frame(m)
names(m) <- paste("m", 1:5, sep = "")
lm(m1 ~ ., data = m)

You can reassign m so that it contains only the response and the columns you want as predictors:

m <- m[, 1:4]  # keep m1 (the response) plus the predictors m2, m3 and m4
lm(m1 ~ ., data = m)
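To check which predictors the dot actually picked up, you can look at the coefficient names. This is only a small sketch: the seed and the name-based subsetting are my own additions, not part of the answer above.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
m <- as.data.frame(matrix(rnorm(100), ncol = 5))
names(m) <- paste("m", 1:5, sep = "")

# With the full data frame, every column except m1 becomes a predictor
fit_all <- lm(m1 ~ ., data = m)
names(coef(fit_all))

# Subset by name; the response column m1 has to stay in the data
fit_sub <- lm(m1 ~ ., data = m[, c("m1", "m2", "m3")])
names(coef(fit_sub))
```

Subsetting by name rather than by position tends to be safer if the column order of the data frame ever changes.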

Is there a short cut to typing multiple explanatory variables in lm() in R?

You can use the dot to select all variables and the minus sign to drop those that should not be used as predictors:

lm(Sepal.Length ~ .-Species -Petal.Length, iris)

Call:
lm(formula = Sepal.Length ~ . - Species - Petal.Length, data = iris)

Coefficients:
(Intercept)  Sepal.Width  Petal.Width
     3.4573       0.3991       0.9721
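If you want to convince yourself that the dot-minus form is the same model as spelling the predictors out, a quick check along these lines should do (the object names fit_dot and fit_explicit are just illustrative):

```r
# Dot notation with two columns removed
fit_dot <- lm(Sepal.Length ~ . - Species - Petal.Length, data = iris)

# The same model with the remaining predictors written out
fit_explicit <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)

# Compare the estimated coefficients
all.equal(coef(fit_dot), coef(fit_explicit))
```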

Is there a shortcut to typing each reserved_list[[i]] into an lm function in R?

If you want to use all of the elements of res_list (other than y, if res_list has an element named y), then @RitchieSacramento's suggestion

lm(y ~ ., data = res_list)

should work. The semantics of . are documented in ?formula.

Otherwise, you can always build your formula programmatically:

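f <- function(formula, index) {
    n <- length(formula)
    rhs <- formula[[n]]  # right-hand side, e.g. the symbol res_list
    # Build one call rhs[[i]] per requested index
    l <- lapply(index, function(i) bquote(.(rhs)[[.(i)]]))
    # Chain the calls together with `+`
    plus <- function(x, y) call("+", x, y)
    formula[[n]] <- Reduce(plus, l)
    formula
}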

f(y ~ res_list, 1:10)
y ~ res_list[[1L]] + res_list[[2L]] + res_list[[3L]] + res_list[[4L]] +
    res_list[[5L]] + res_list[[6L]] + res_list[[7L]] + res_list[[8L]] +
    res_list[[9L]] + res_list[[10L]]
f(hello ~ world, c(1L, 2L, 3L, 5L, 8L))
hello ~ world[[1L]] + world[[2L]] + world[[3L]] + world[[5L]] +
    world[[8L]]
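For completeness, here is a sketch of how the generated formula could be handed to lm(). The simulated res_list and y are assumptions made up for the illustration; f is the helper from the answer, repeated so the snippet runs on its own.

```r
# Helper from the answer above, repeated so this snippet is self-contained
f <- function(formula, index) {
    n <- length(formula)
    rhs <- formula[[n]]
    l <- lapply(index, function(i) bquote(.(rhs)[[.(i)]]))
    plus <- function(x, y) call("+", x, y)
    formula[[n]] <- Reduce(plus, l)
    formula
}

set.seed(42)  # assumed seed, only for reproducibility
res_list <- replicate(4, rnorm(20), simplify = FALSE)  # made-up predictors
y <- rnorm(20)                                         # made-up response

# Use only the first and third elements of res_list as predictors
fml <- f(y ~ res_list, c(1L, 3L))
fit <- lm(fml)  # variables are looked up in the formula's environment
coef(fit)
```

Because the formula keeps the environment it was created in, lm() can find y and res_list without a data argument here.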

lm() Regression with interactions for an entire dataframe

For both you could use the ^ operator.

See the example:

In your first case you just need the pair-wise interactions (2-way interactions). So you could do:

# Example df
df <- data.frame(a = runif(100), b = runif(100), c = runif(100), d = runif(100))

> lm(a ~ (b+c+d)^2, data=df)

Call:
lm(formula = a ~ (b + c + d)^2, data = df)

Coefficients:
(Intercept)            b            c            d          b:c          b:d          c:d
    0.53873      0.23531      0.07813     -0.14763     -0.43130      0.11084      0.13181

As you can see, the above produced the pairwise interactions.

Now in order to include all the interactions you can do:

> lm(a ~ (b+c+d)^5 , data=df)

Call:
lm(formula = a ~ (b + c + d)^5, data = df)

Coefficients:
(Intercept)            b            c            d          b:c          b:d          c:d        b:c:d
    0.54059      0.23123      0.07455     -0.15150     -0.42340      0.11926      0.14017     -0.01803

In this case you just need a number at least as large as the number of predictors (here I use 5, but 3 or anything larger gives the same result). As you can see, all the interactions are produced.
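You can inspect what ^ expands to without fitting anything by asking terms() for the term labels. A rough sketch follows; labels_of is a hypothetical helper I made up for this, and the .^2 line assumes you want every other column of the data frame as a predictor.

```r
set.seed(1)  # assumed seed, only for reproducibility
df <- data.frame(a = runif(100), b = runif(100), c = runif(100), d = runif(100))

# Hypothetical helper: return the term labels a formula expands to
labels_of <- function(f, ...) attr(terms(f, ...), "term.labels")

labels_of(a ~ (b + c + d)^2)   # main effects plus pairwise interactions
labels_of(a ~ (b + c + d)^3)   # adds the three-way interaction
labels_of(a ~ b * c * d)       # same set of terms as ^3
labels_of(a ~ .^2, data = df)  # dot version; needs data to resolve the dot
```

The last form is handy for the "entire data frame" case, since you never have to type the column names.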

How to succinctly write a formula with many variables from a data frame?

There is a special identifier that you can use in a formula to mean all the variables: the . identifier.

y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)

You can also do things like this, to use all variables but one (in this case x3 is excluded):

mod <- lm(y ~ . - x3, data = d)

Technically, . means all variables not already mentioned in the formula. For example, in

lm(y ~ x1 * x2 + ., data = d)

the . would only reference x3, as x1 and x2 are already in the formula.
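To see the resulting terms for each of these formulas, you can expand them with terms() and look at the term labels. This is just a sketch using the same d as above; the variable names all_terms, minus_x3 and with_inter are mine.

```r
d <- data.frame(y = c(1, 4, 6), x1 = c(4, -1, 3), x2 = c(3, 9, 8), x3 = c(4, -4, -2))

# terms() needs the data argument to know what the dot stands for
all_terms  <- attr(terms(y ~ ., data = d), "term.labels")
minus_x3   <- attr(terms(y ~ . - x3, data = d), "term.labels")
with_inter <- attr(terms(y ~ x1 * x2 + ., data = d), "term.labels")

all_terms   # every column except y
minus_x3    # x3 dropped
with_inter  # main effects, x3 via the dot, plus the x1:x2 interaction
```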

short formula call for many variables when building a model

You can use . as described in the help page for formula. The . stands for "all columns not otherwise in the formula".

lm(output ~ ., data = myData)

Alternatively, construct the formula manually with paste. This example is from the as.formula() help page:

xnam <- paste("x", 1:25, sep="")
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))

You can then pass this object to a regression function: lm(fmla, data = myData).
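As an alternative to pasting strings together, base R also has reformulate() in the stats package, which builds the formula from a character vector of term labels. A minimal sketch, reusing the xnam names from above (fmla_paste and fmla_ref are illustrative names):

```r
xnam <- paste("x", 1:25, sep = "")

# String-based construction, as in the as.formula() example
fmla_paste <- as.formula(paste("y ~ ", paste(xnam, collapse = "+")))

# Same formula built from the term labels directly
fmla_ref <- reformulate(xnam, response = "y")

# Both contain the same variables
all.vars(fmla_ref)
identical(all.vars(fmla_ref), all.vars(fmla_paste))
```

reformulate() saves you from getting the separators in the pasted string wrong, but either route gives a formula you can pass to lm().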

Using R's lm on a dataframe with a list of predictors

Using the formula notation y ~ . specifies that you want to regress y on all of the other variables in the dataset.

df <- data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
# Fits a model using x1 and x2
fit <- lm(y ~ ., data = df)
# Removes the column containing x1, so the regression is on x2 only
fit <- lm(y ~ ., data = df[, -2])

What does / mean in R when writing a regression formula in lm()

lm(y ~ x/z, data) is just a shortcut for lm(y ~ x + x:z, data).

These two give the same results:

lm(mpg ~ disp/hp, data = mtcars)

Call:
lm(formula = mpg ~ disp/hp, data = mtcars)

Coefficients:
(Intercept)         disp      disp:hp
  2.932e+01   -3.751e-02   -1.433e-05

lm(mpg ~ disp + disp:hp, data = mtcars)

Call:
lm(formula = mpg ~ disp + disp:hp, data = mtcars)

Coefficients:
(Intercept)         disp      disp:hp
  2.932e+01   -3.751e-02   -1.433e-05

So, what you're doing is modelling mpg based on disp alone and on an interaction between disp and hp.
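You can also confirm the expansion of / directly, without comparing printed output by eye. A short sketch (the object names are only illustrative):

```r
# Term labels that disp/hp expands to
labs <- attr(terms(mpg ~ disp/hp), "term.labels")
labs

# Fit both forms and compare the coefficients numerically
fit_nested   <- lm(mpg ~ disp/hp, data = mtcars)
fit_expanded <- lm(mpg ~ disp + disp:hp, data = mtcars)
all.equal(coef(fit_nested), coef(fit_expanded))
```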


