Shortcut using lm() in R for formula
Try lm(y ~ ., data), where . means "every other column in data besides y".
m <- matrix(rnorm(100), ncol =5)
m <- as.data.frame(m)
names(m) <- paste("m", 1:5, sep="")
lm(m1 ~., data=m)
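You can reassign m to include only the response and the columns you want as predictors:
m <- m[, 1:4]  # keep m1 (the response) plus m2 to m4; m[, 2:4] would drop m1 and the fit would fail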
lm(m1 ~ ., data=m)
Is there a short cut to typing multiple explanatory variables in lm() in R?
You could use the dot sign to select all variables, and the minus sign to exclude those that should not be used as predictors.
lm(Sepal.Length ~ .-Species -Petal.Length, iris)
Call:
lm(formula = Sepal.Length ~ . - Species - Petal.Length, data = iris)
Coefficients:
(Intercept) Sepal.Width Petal.Width
3.4573 0.3991 0.9721
Is there a shortcut to typing each reserved_list[[i]] into an lm function in R?
If you want to use all of the elements of res_list (other than y, if res_list has an element named y), then @RitchieSacramento's suggestion
lm(y ~ ., data = res_list)
should work. The semantics of . are documented in ?formula.
Otherwise, you can always build your formula programmatically:
f <- function(formula, index) {
  n <- length(formula)
  rhs <- formula[[n]]  # right-hand side of the formula
  # build rhs[[i]] for each requested index
  l <- lapply(index, function(i) bquote(.(rhs)[[.(i)]]))
  plus <- function(x, y) call("+", x, y)
  formula[[n]] <- Reduce(plus, l)  # join the pieces with +
  formula
}
f(y ~ res_list, 1:10)
y ~ res_list[[1L]] + res_list[[2L]] + res_list[[3L]] + res_list[[4L]] +
res_list[[5L]] + res_list[[6L]] + res_list[[7L]] + res_list[[8L]] +
res_list[[9L]] + res_list[[10L]]
f(hello ~ world, c(1L, 2L, 3L, 5L, 8L))
hello ~ world[[1L]] + world[[2L]] + world[[3L]] + world[[5L]] +
world[[8L]]
lm() Regression with interactions for an entire dataframe
For both cases you could use the ^ operator.
See the example:
In your first case you just need the pair-wise interactions (2-way interactions). So you could do:
#Example df
df <- data.frame(a=runif(100), b=runif(100), c=runif(100), d=runif(100))
> lm(a ~ (b+c+d)^2, data=df)
Call:
lm(formula = a ~ (b + c + d)^2, data = df)
Coefficients:
(Intercept) b c d b:c b:d c:d
0.53873 0.23531 0.07813 -0.14763 -0.43130 0.11084 0.13181
As you can see, the above produced the pair-wise interactions.
Now, in order to include all the interactions, you can do:
> lm(a ~ (b+c+d)^5 , data=df)
Call:
lm(formula = a ~ (b + c + d)^5, data = df)
Coefficients:
(Intercept) b c d b:c b:d c:d b:c:d
0.54059 0.23123 0.07455 -0.15150 -0.42340 0.11926 0.14017 -0.01803
In this case you just need an exponent at least as large as the number of variables you will use (here I use 5, but anything from 3 upwards gives the same model). As you can see, all the interactions are produced.
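One way to see which terms a given exponent produces, without fitting a model, is to inspect the term labels of the formula. This is a minimal sketch; the helper name term_labels is made up for the illustration and is not part of the original answer.

```r
# Hypothetical helper: list the terms a formula expands to
term_labels <- function(f) attr(terms(f), "term.labels")

two_way   <- term_labels(a ~ (b + c + d)^2)  # main effects + pair-wise interactions
three_way <- term_labels(a ~ (b + c + d)^3)  # adds the three-way interaction
five      <- term_labels(a ~ (b + c + d)^5)  # exponent larger than the number of variables

print(two_way)
print(three_way)
print(identical(three_way, five))  # the larger exponent adds nothing new
```

Because terms() here works on the formula alone, no data frame is needed for the check.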
How to succinctly write a formula with many variables from a data frame?
There is a special identifier that one can use in a formula to mean all the variables: the . identifier.
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
You can also do things like this, to use all variables but one (in this case x3 is excluded):
mod <- lm(y ~ . - x3, data = d)
Technically, . means all variables not already mentioned in the formula. For example
lm(y ~ x1 * x2 + ., data = d)
where . would only reference x3, as x1 and x2 are already in the formula.
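To check what . contributes in a particular formula, you can ask terms() for the resulting term labels. The sketch below reuses the small data frame d from above; the variable names labels_all, labels_minus and labels_mixed are only for illustration.

```r
d <- data.frame(y = c(1, 4, 6), x1 = c(4, -1, 3), x2 = c(3, 9, 8), x3 = c(4, -4, -2))

# terms() needs the data frame to know what . stands for
labels_all   <- attr(terms(y ~ ., data = d), "term.labels")
labels_minus <- attr(terms(y ~ . - x3, data = d), "term.labels")
labels_mixed <- attr(terms(y ~ x1 * x2 + ., data = d), "term.labels")

print(labels_all)    # every column except the response
print(labels_minus)  # x3 removed
print(labels_mixed)  # x1, x2 and x1:x2 from x1 * x2, plus x3 from the dot
```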
short formula call for many variables when building a model
You can use . as described in the help page for formula. The . stands for "all columns not otherwise in the formula".
lm(output ~ ., data = myData)
Alternatively, construct the formula manually with paste. This example is from the as.formula() help page:
xnam <- paste("x", 1:25, sep="")
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))
You can then pass this object to a regression function: lm(fmla, data = myData).
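As a possible alternative to pasting strings together, base R also provides reformulate() in the stats package, which builds a formula from a character vector of term labels. The sketch below compares it with the paste approach; the object names fmla_paste and fmla_reform are chosen only for this example.

```r
xnam <- paste0("x", 1:25)

# Formula built by pasting strings, as in the as.formula() help page
fmla_paste <- as.formula(paste("y ~", paste(xnam, collapse = " + ")))

# Same idea with reformulate(): term labels in, formula out
fmla_reform <- reformulate(xnam, response = "y")

# Both formulas expand to the same set of terms
same_terms <- identical(attr(terms(fmla_paste), "term.labels"),
                        attr(terms(fmla_reform), "term.labels"))
print(same_terms)
```

Either object can be passed to lm() in the same way.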
Using R's lm on a dataframe with a list of predictors
Using the formula notation y ~ . specifies that you want to regress y on all of the other variables in the dataset.
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
# fits a model using x1 and x2
fit <- lm(y ~ ., data = df)
# Removes the column containing x1 so regression on x2 only
fit <- lm(y ~ ., data = df[, -2])
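If the predictors are held in a character vector, one option is to subset the data frame by name before using the dot. A minimal sketch, assuming an extra column x3 and a hypothetical vector preds, neither of which appears in the original example:

```r
set.seed(1)  # only to make the illustration reproducible
df <- data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10), x3 = rnorm(10))

preds <- c("x1", "x3")  # hypothetical choice of predictors

# Keep the response and the chosen predictors, then let . pick them up
fit <- lm(y ~ ., data = df[, c("y", preds)])
print(names(coef(fit)))
```

Selecting by name rather than by position keeps the call correct if the column order of df changes.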
What does / mean in R when writing a regression formula in lm()
lm(y ~ x/z, data) is just a shortcut for lm(y ~ x + x:z, data).
These two give the same results:
lm(mpg ~ disp/hp,data = mtcars)
Call:
lm(formula = mpg ~ disp/hp, data = mtcars)
Coefficients:
(Intercept) disp disp:hp
2.932e+01 -3.751e-02 -1.433e-05
lm(mpg ~ disp + disp:hp, data = mtcars)
Call:
lm(formula = mpg ~ disp + disp:hp, data = mtcars)
Coefficients:
(Intercept) disp disp:hp
2.932e+01 -3.751e-02 -1.433e-05
So, what you're doing is modelling mpg based on disp alone and on an interaction between disp and hp.
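The equivalence can also be checked programmatically rather than by eye. A short sketch using the built-in mtcars data; the object names fit_slash and fit_long are just for illustration:

```r
fit_slash <- lm(mpg ~ disp/hp, data = mtcars)
fit_long  <- lm(mpg ~ disp + disp:hp, data = mtcars)

# Terms that disp/hp expands to
print(attr(terms(mpg ~ disp/hp), "term.labels"))

# Coefficients agree up to numerical tolerance
print(all.equal(coef(fit_slash), coef(fit_long)))
```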