Fit a No-Intercept Model in Caret

As discussed in a linked SO question (https://stackoverflow.com/a/41731117/7613376), this works in caret v6.0.76 (an earlier answer based on trace() no longer seems to work after code refactoring in caret):

caret_lmFit <- train(Sepal.Length ~ 0 + Petal.Length + Petal.Width, data = iris, "lm",
                     tuneGrid = expand.grid(intercept = FALSE))

> caret_lmFit$finalModel

Call:
lm(formula = .outcome ~ 0 + ., data = dat)

Coefficients:
Petal.Length   Petal.Width
       2.856        -4.479
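As a quick sanity check (this is not part of the original answer), you can fit the same no-intercept model with base R's lm() and confirm that caret's finalModel reports the same coefficients and no "(Intercept)" term:

```r
# Base-R cross-check: same formula, no caret. The "0 +" in the formula
# drops the intercept, so coef() should contain only the two slopes.
base_fit <- lm(Sepal.Length ~ 0 + Petal.Length + Petal.Width, data = iris)
coef(base_fit)
# No "(Intercept)" entry appears; only Petal.Length and Petal.Width.
```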

Logistic Regression in Caret - No Intercept?

There's a vignette on how to set up a custom model for caret. In the solution below, you can also see why the intercept persists:

library(caret)
glm_wo_intercept <- getModelInfo("glm", regex = FALSE)[[1]]

If you print the fit function:

glm_wo_intercept$fit

you'll see a line that builds the model formula with the default implicit intercept:

...
modelArgs <- c(list(formula = as.formula(".outcome ~ ."), data = dat), theDots)
...

So the intercept is there by default. You can change this line and run caret on this modified model:

glm_wo_intercept$fit <- function(x, y, wts, param, lev, last, classProbs, ...) {
  dat <- if (is.data.frame(x)) x else as.data.frame(x)
  dat$.outcome <- y
  if (length(levels(y)) > 2) stop("glm models can only use 2-class outcomes")

  theDots <- list(...)
  if (!any(names(theDots) == "family")) {
    theDots$family <- if (is.factor(y)) binomial() else gaussian()
  }
  if (!is.null(wts)) theDots$weights <- wts
  # change the model here: ".outcome ~ 0 + ." drops the intercept
  modelArgs <- c(list(formula = as.formula(".outcome ~ 0 + ."), data = dat), theDots)

  out <- do.call("glm", modelArgs)
  out$call <- NULL
  out
}

We fit the model:

data <- data.frame(y = factor(runif(100) > 0.5), x = rnorm(100))
model <- train(y ~ 0 + x, data = data, method = glm_wo_intercept,
               family = binomial(), trControl = trainControl(method = "cv", number = 3))

predict(model, data.frame(x = 0), type = "prob")
  FALSE TRUE
1   0.5  0.5
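The 0.5/0.5 prediction at x = 0 is not a coincidence: with the intercept removed, the linear predictor at x = 0 is exactly 0, and plogis(0) = 0.5. A base-R sketch (no caret needed; the data here is simulated, not the original poster's) demonstrates the same thing:

```r
# No-intercept logistic regression in base R. At x = 0 the linear
# predictor is 0 * beta = 0, so the predicted probability is exactly 0.5.
set.seed(1)
d <- data.frame(y = factor(runif(100) > 0.5), x = rnorm(100))
base_glm <- glm(y ~ 0 + x, data = d, family = binomial())
predict(base_glm, data.frame(x = 0), type = "response")
# -> 0.5 exactly, regardless of the fitted slope
```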

How to fit a model without an intercept using R tidymodels workflow?

You can use the formula argument to add_model() to override the terms of the model. This mechanism is typically used for survival and Bayesian models, so be extra careful that you know what you are doing here: it circumvents some of the guardrails of tidymodels.

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip

mod <- linear_reg()
rec <- recipe(mpg ~ cyl + wt, data = mtcars)

workflow() %>%
add_recipe(rec) %>%
add_model(mod, formula = mpg ~ 0 + cyl + wt) %>%
fit(mtcars)
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#>
#> Call:
#> stats::lm(formula = mpg ~ 0 + cyl + wt, data = data)
#>
#> Coefficients:
#> cyl wt
#> 2.187 1.174

Created on 2021-09-01 by the reprex package (v2.0.1)

Using linear regression (lm) in R caret, how do I force the intercept through 0?

You can take advantage of the tuneGrid parameter in caret::train.

regressControl <- trainControl(method = "repeatedcv",
                               number = 4,
                               repeats = 5)

regress <- train(mpg ~ hp,
                 data = mtcars,
                 method = "lm",
                 trControl = regressControl,
                 tuneGrid = expand.grid(intercept = FALSE))

Use getModelInfo("lm", regex = FALSE)[[1]]$parameters to see everything you could have tweaked in tuneGrid (for lm, the only tuning parameter is the intercept; regex = FALSE avoids also matching methods like "glm"). It's silly that you can't simply rely on formula syntax, but alas.
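For reference, a short sketch of that inspection (assuming caret is installed; the parameters element is a small data frame with one row per tunable parameter):

```r
library(caret)

# List the tunable parameters for the "lm" method. For lm this should be
# a single row: parameter = "intercept", class = "logical".
info <- getModelInfo("lm", regex = FALSE)[[1]]
info$parameters
```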


