Using R's lm on a dataframe with a list of predictors
The formula notation y ~ . specifies that you want to regress y on all of the other variables in the dataset.
df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
# fits a model using x1 and x2
fit <- lm(y ~ ., data = df)
# drops column 2 (x1), so y is regressed on x2 only
fit <- lm(y ~ ., data = df[, -2])
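When the predictors are held as a character vector of column names, one way to keep using y ~ . is to subset the data frame by name rather than by position. A minimal sketch, assuming a hypothetical vector keep of predictor names (the extra column x3 is added here only for illustration):

```r
set.seed(1)  # make the random columns reproducible
df <- data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10), x3 = rnorm(10))

# hypothetical list of predictors to keep, given as column names
keep <- c("x1", "x3")

# keep the response plus the chosen predictors, then regress y on the rest
fit_keep <- lm(y ~ ., data = df[, c("y", keep)])
names(coef(fit_keep))
```

Selecting by name avoids the off-by-one mistakes that negative positional indices such as df[, -2] invite when columns are reordered.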
Incorporate all columns of a dataframe into one regression
Instead of attach, pass the data frame through the data argument. Here . signifies all the other columns:
lm(y ~ ., data = data)
For example, a reproducible version with mtcars:
lm(mpg ~ ., data = mtcars)
Another option is to construct the formula with reformulate:
lm(reformulate('.', response = 'mpg'), data = mtcars)
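reformulate also accepts a character vector of term labels, which helps when only some columns should enter the model. A small sketch, with wt and hp picked arbitrarily from mtcars and the names preds, f and fit_sub introduced only for illustration:

```r
# predictors chosen only for illustration
preds <- c("wt", "hp")

# builds the formula mpg ~ wt + hp from the character vector
f <- reformulate(preds, response = "mpg")
fit_sub <- lm(f, data = mtcars)
coef(fit_sub)
```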
Prediction using a linear model and the importance of data.frame
When you call predict on an lm object, the function actually called is predict.lm. When you run it like this:
predict(model_1, Sepal.Width=c(1,3,4,5))
What you are doing is passing c(1,3,4,5) as an argument named Sepal.Width, which predict.lm silently ignores because it has no parameter of that name.
When no new input data is supplied, you are effectively running predict.lm(model_1) and getting back the fitted values:
table(predict(model_1) == predict(model_1, Sepal.Width=c(1,3,4,5)))
TRUE
150
Because you fitted the model with a formula, predict.lm needs a data frame of new values to reconstruct the matrix of independent (exogenous) variables, multiply it by the coefficients, and return the predicted values.
Briefly, this is what predict.lm is doing:
newdata = data.frame(Sepal.Width=c(1,3,4,5))
Terms = delete.response(terms(model_1))
X = model.matrix(Terms,newdata)
X
  (Intercept) Sepal.Width
1           1           1
2           1           3
3           1           4
4           1           5
X %*% coefficients(model_1)
      [,1]
1 6.302861
2 5.856139
3 5.632778
4 5.409417
predict(model_1,newdata)
       1        2        3        4
6.302861 5.856139 5.632778 5.409417
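To reproduce the comparison end to end, the sketch below assumes model_1 was fitted as Sepal.Length ~ Sepal.Width on the built-in iris data. That fit is not shown above; it is an assumption that is consistent with the column name, the 150 fitted values, and the printed predictions.

```r
# assumption: this is how model_1 was fitted
model_1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)

newdata <- data.frame(Sepal.Width = c(1, 3, 4, 5))

# the unmatched named argument is absorbed by `...`, so fitted values come back
ignored <- predict(model_1, Sepal.Width = c(1, 3, 4, 5))
length(ignored)  # one value per row of iris

# new values passed correctly through newdata
pred <- predict(model_1, newdata = newdata)

# manual route: design matrix times coefficients
X <- model.matrix(delete.response(terms(model_1)), newdata)
manual <- drop(X %*% coef(model_1))

all.equal(unname(pred), unname(manual))
```

The practical rule is that new predictor values always go in a data frame passed as newdata, with column names matching the variables in the formula.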
How to succinctly write a formula with many variables from a data frame?
There is a special identifier that one can use in a formula to mean all the variables: the . identifier.
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
You can also do things like this, to use all variables but one (in this case x3 is excluded):
mod <- lm(y ~ . - x3, data = d)
Technically, . means all variables not already mentioned in the formula. For example:
lm(y ~ x1 * x2 + ., data = d)
where . would only reference x3, as x1 and x2 are already in the formula.
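One way to check what . expands to, without fitting anything, is to inspect the term labels that terms produces when it is given the data frame. A sketch reusing d from above (labels_all and labels_drop are names introduced here for illustration):

```r
d <- data.frame(y = c(1, 4, 6), x1 = c(4, -1, 3), x2 = c(3, 9, 8), x3 = c(4, -4, -2))

# expand `.` against the columns of d and list the resulting terms
labels_all <- attr(terms(y ~ x1 * x2 + ., data = d), "term.labels")
labels_all

# `. - x3` removes x3 from the expansion
labels_drop <- attr(terms(y ~ . - x3, data = d), "term.labels")
labels_drop
```

The first result should contain x1, x2, x3 and the x1:x2 interaction; the second should contain only x1 and x2.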
How to run lm regression for every column in R
Your code looks fine except that when you use i within lm, R reads i as a character string (the column name), which you can't regress against. Using get lets you pull the column that i names.
df=data.frame(x=rnorm(100),y1=rnorm(100),y2=rnorm(100),y3=rnorm(100))
storage <- list()
for(i in names(df)[-1]){
storage[[i]] <- lm(get(i) ~ x, df)
}
I create an empty list, storage, which is filled on each iteration of the loop. It's just a personal preference, but I'd also advise against how you've written your current loop:
for(i in names(df[,-1])){
model = lm(i~x, data=df)
}
You will overwrite model on every iteration, so only the last iteration's results are kept. I suggest you change it to a list, or a matrix, where you can store results iteratively.
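The same idea can also be written with lapply. This is a sketch rather than the original approach: it builds each formula with reformulate instead of get, so each model's formula names the actual response (for example y1 ~ x) rather than get(i) ~ x. The names responses and models are introduced here for illustration.

```r
set.seed(42)  # reproducible random data
df <- data.frame(x = rnorm(100), y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))

# every column except the predictor x is treated as a response
responses <- setdiff(names(df), "x")

# fit one model per response and keep them all in a named list
models <- lapply(responses, function(resp) {
  lm(reformulate("x", response = resp), data = df)
})
names(models) <- responses

# one row of coefficients per response
t(sapply(models, coef))
```

Individual fits are then available by name, for example summary(models$y2).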
Hope that helps