Automatically create formulas for all possible linear models
Say we work with this ridiculous example :
DF <- data.frame(Class=1:10,A=1:10,B=1:10,C=1:10)
Then you get the names of the columns
Cols <- names(DF)
Cols <- Cols[! Cols %in% "Class"]
n <- length(Cols)
You construct all possible combinations
id <- unlist(
lapply(1:n,
function(i)combn(1:n,i,simplify=FALSE)
)
,recursive=FALSE)
You paste them to formulas
Formulas <- sapply(id,function(i)
paste("Class~",paste(Cols[i],collapse="+"))
)
And you loop over them to apply the models.
lapply(Formulas,function(i)
lm(as.formula(i),data=DF))
Be warned though: if you have more than a handful columns, this will quickly become very heavy on the memory and result in literally thousands of models. You have 2^n - 1 different models with n being the number of columns.
Make very sure that is what you want, in general this kind of model comparison is strongly advised against. Forget about any kind of inference as well when you do this.
automatically create a dynamic string with variable names in the form of lm() function to fit linear models in R
names(st) <- make.names(names(st))
y <- "Life.Exp"
x <- names(st)[!names(st) %in% y]
x
y
mymodel <- as.formula(paste(y, paste(x, collapse="+"), sep="~"))
lm(mymodel, data=st)
How to create many Linear Regression models via a For Loop in R?
Your approach wasn't so bad. This is how I reproduced your work as you described it:
library(rje) # provides the powerSet function
library(olsrr) # provides the ols_mallows_cp function to calculate the Mallow's Cp values
x <- powerSet(colnames(mtcars[,-1]))
full_model <- lm( mpg ~ ., data=mtcars )
your_models <- lapply( x, function(n) {
d_i <- mtcars[,c( "mpg", n), drop=FALSE] # use drop=FALSE to make sure it stays a 2d structure
return( lm( mpg ~ ., data = d_i ) )
})
Cp_vec <- sapply( your_models, function(m) {
ols_mallows_cp( m, full_model )
})
TenSmallestIndeces <- head( order( Cp_vec ), n=10 )
TenSmallestCp <- head( sort( Cp_vec ), n=10 )
TenSmallestSets <- x[ TenSmallestIndeces ]
## inspect one of your models:
your_models[[ TenSmallestIndeces[1] ]]
It's always preferable to use some sort of apply when collecting from a loop. I frequently use foreach from the foreach package also when building data frames or other 2d structures from a loop.
I create the subset just like you did, and fit the model pretty much the same way, just do it in one go.
Then you just need to understand sort() and order() proberly to look back up in the set you started out with I think.
How to run lm models using all possible combinations of several variables and a factor
I suspect the dredge
function in the MuMIn package would help you. You specify a "full" model with all parameters you want to include and then run dredge(fullmodel)
to get all combinations nested within the full model.
You should then be able to get the coefficients and AIC values from the results of this.
Something like:
require(MuMIn)
data(iris)
globalmodel <- lm(Sepal.Length ~ Petal.Length + Petal.Width + Species, data = iris)
combinations <- dredge(globalmodel)
print(combinations)
to get the parameter estimates for all models (a bit messy) you can then use
coefTable(combinations)
or to get the coefficients for a particular model you can index that using the row number in the dredge object, e.g.
coefTable(combinations)[1]
to get the coefficients in the model at row 1. This should also print coefficients for factor levels.
See the MuMIn helpfile for more details and ways to extract information.
Hope that helps.
How to succinctly write a formula with many variables from a data frame?
There is a special identifier that one can use in a formula to mean all the variables, it is the .
identifier.
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
You can also do things like this, to use all variables but one (in this case x3 is excluded):
mod <- lm(y ~ . - x3, data = d)
Technically, .
means all variables not already mentioned in the formula. For example
lm(y ~ x1 * x2 + ., data = d)
where .
would only reference x3
as x1
and x2
are already in the formula.
different possible combinations of variables for a generalized linear model
This is called dredging:
library(MuMIn)
data(Cement)
fm1 <- lm(y ~ ., data = Cement)
dd <- dredge(fm1)
Global model call: lm(formula = y ~ ., data = Cement)
---
Model selection table
(Intrc) X1 X2 X3 X4 df logLik AICc delta weight
4 52.58 1.468 0.6623 4 -28.156 69.3 0.00 0.566
12 71.65 1.452 0.4161 -0.2365 5 -26.933 72.4 3.13 0.119
8 48.19 1.696 0.6569 0.2500 5 -26.952 72.5 3.16 0.116
10 103.10 1.440 -0.6140 4 -29.817 72.6 3.32 0.107
14 111.70 1.052 -0.4100 -0.6428 5 -27.310 73.2 3.88 0.081
15 203.60 -0.9234 -1.4480 -1.5570 5 -29.734 78.0 8.73 0.007
16 62.41 1.551 0.5102 0.1019 -0.1441 6 -26.918 79.8 10.52 0.003
13 131.30 -1.2000 -0.7246 4 -35.372 83.7 14.43 0.000
7 72.07 0.7313 -1.0080 4 -40.965 94.9 25.62 0.000
9 117.60 -0.7382 3 -45.872 100.4 31.10 0.000
3 57.42 0.7891 3 -46.035 100.7 31.42 0.000
11 94.16 0.3109 -0.4569 4 -45.761 104.5 35.21 0.000
2 81.48 1.869 3 -48.206 105.1 35.77 0.000
6 72.35 2.312 0.4945 4 -48.005 109.0 39.70 0.000
5 110.20 -1.2560 3 -50.980 110.6 41.31 0.000
1 95.42 2 -53.168 111.5 42.22 0.000
Is there a way to loop through column names (not numbers) in r for linear models?
If you want the statistics in a table (which might come in handy) you can use the purrr
and broom
packages. Here's an example using the dataset mtcars
:
Code
library(tidyr)
library(purrr)
library(broom)
formula <- lapply(colnames(mtcars)[3:ncol(mtcars)], function(x) as.formula(paste0(x, " ~ cyl")))
names(formula) <- format(formula)
table <- formula %>% map(~aov(.x, mtcars)) %>% map_dfr(tidy, .id="model")
Output
> head(table)
# A tibble: 6 x 7
model term df sumsq meansq statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 disp ~ cyl cyl 1 387454. 387454. 131. 1.80e-12
2 disp ~ cyl Residuals 30 88731. 2958. NA NA
3 hp ~ cyl cyl 1 100984. 100984. 67.7 3.48e- 9
4 hp ~ cyl Residuals 30 44743. 1491. NA NA
5 drat ~ cyl cyl 1 4.34 4.34 28.8 8.24e- 6
6 drat ~ cyl Residuals 30 4.52 0.151 NA NA
Try
formula <- lapply(colnames(df)[10:ncol(df)], function(x) as.formula(paste0(x, " ~ block + tillage * residue + Error(subblock)")))
names(formula) <- format(formula)
table <- formula %>% map(~aov(.x, df)) %>% map_dfr(tidy, .id="model")
Related Topics
Create Columns from Factors and Count
Set Margin Size When Converting from Markdown to PDF with Pandoc
Save Plot with a Given Aspect Ratio
Plotting a 3D Surface Plot with Contour Map Overlay, Using R
How to Arrange an Arbitrary Number of Ggplots Using Grid.Arrange
Detach All Packages While Working in R
Differencebetween Parent.Frame() and Parent.Env() in R; How Do They Differ in Call by Reference
Promise Already Under Evaluation: Recursive Default Argument Reference or Earlier Problems
Techniques for Finding Near Duplicate Records
How to Delete Columns That Contain Only Nas
How to Generate Distributions Given, Mean, Sd, Skew and Kurtosis in R
Merge Data Frames Based on Rownames in R
Rstudio Not Picking the Encoding I'm Telling It to Use When Reading a File
Setting Function Defaults R on a Project Specific Basis
Is There a Built-In Way to Do a Logarithmic Color Scale in Ggplot2