How to Exclude Specific Variables from a Glm in R

How do I exclude specific variables from a glm in R?

In addition to using - in the formula, as suggested in the comments:

glm(Stuff ~ . - var1 - var2, data= mydata, family=binomial)

you can also subset the data frame passed in

glm(Stuff ~ ., data=mydata[ , !(names(mydata) %in% c('var1','var2'))], family=binomial)

or

glm(Stuff ~ ., data=subset(mydata, select=c( -var1, -var2 ) ), family=binomial )

(Be careful with that last one: subset() relies on non-standard evaluation, so it sometimes does not work well inside other functions.)
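As a quick check, the three approaches above can be compared on a built-in data set. This is only a sketch: the data frame dat, the response am, and the excluded columns mpg and cyl are stand-ins chosen for illustration, not names from the question.

```r
# Illustrative data: a few columns of the built-in mtcars data set
dat <- mtcars[, c("am", "hp", "wt", "mpg", "cyl")]

# 1. Drop terms in the formula
fit1 <- glm(am ~ . - mpg - cyl, data = dat, family = binomial)

# 2. Drop columns by name before fitting
fit2 <- glm(am ~ ., data = dat[, !(names(dat) %in% c("mpg", "cyl"))],
            family = binomial)

# 3. Drop columns with subset()
fit3 <- glm(am ~ ., data = subset(dat, select = c(-mpg, -cyl)),
            family = binomial)

# All three fits should contain the same terms and estimates
names(coef(fit1))
all.equal(coef(fit1), coef(fit2))
all.equal(coef(fit2), coef(fit3))
```

With complete data such as this, the three fits should agree; which form to use is mostly a matter of readability.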

You could also use the paste function to create a string representing the formula with the terms of interest (subsetting to the group of predictors that you want), then use as.formula to convert it to a formula.
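A minimal sketch of the paste/as.formula approach, assuming a data frame dat with response am and two columns to exclude (all of these names are illustrative, taken from the built-in mtcars data rather than from the question):

```r
# Illustrative data: a few columns of the built-in mtcars data set
dat <- mtcars[, c("am", "hp", "wt", "mpg", "cyl")]

response <- "am"
exclude  <- c("mpg", "cyl")

# Keep every column that is neither the response nor excluded
predictors <- setdiff(names(dat), c(response, exclude))

# Build the formula as a string, then convert it
f <- as.formula(paste(response, "~", paste(predictors, collapse = " + ")))
fit <- glm(f, data = dat, family = binomial)

# stats::reformulate() builds the same kind of formula without manual pasting
f2 <- reformulate(predictors, response = response)
```

This form is convenient when the list of excluded variables is computed at run time, for example from the output of a variable-selection step.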

Exclude Specific Records from GLM?

You can use the subset argument that many of the modelling functions in R have. For example:

glm(conversion ~ action, data = data, family = binomial(),
subset = action != "Did not use")

will fit the model to the data set after removing rows where action == "Did not use". If you have additional levels in action to drop, you might use

glm(conversion ~ action, data = data, family = binomial(),
subset = !action %in% c("Did not use", "Other"))

which will exclude any rows where action is equal to either of the supplied options.

You might also want to look at the drop.unused.levels argument to model.frame, which is the function that will act on any subset argument you supply to glm().
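To see the effect of subset on factor levels, here is a small sketch using the built-in infert data set in place of the question's data (case and education are columns of infert, not of the original data). As far as I can tell, glm() already asks model.frame to drop unused levels, so the excluded level should not appear among the coefficients:

```r
# education is a factor with three levels in the built-in infert data
dropped <- levels(infert$education)[1]

fit <- glm(case ~ education, data = infert, family = binomial(),
           subset = !education %in% dropped)

# The dropped level is absent; the first remaining level is the baseline
names(coef(fit))

# Only the rows that passed the subset condition were used
nobs(fit)
```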

PS: Note how I have specified the family; you don't need the odd combination of quoting. Any of binomial, binomial() or "binomial" should be fine, as the logit link is the canonical link for the binomial family and hence is the default in R's binomial() family function. If you want to specify the link explicitly, use this form: binomial(link = "logit").

Exclude more than one column when building a logistic regression model using glm

?formula mentions that you can use - to drop terms. Here is how:

glm(Vote ~. -ID, data = train, family = binomial)

g = glm(Vote ~. - ID - YOB - ABC, data = train, family = binomial)

Here is an example:

> head(trees) ## this is R's built-in dataset
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
4  10.5     72   16.4
5  10.7     81   18.8
6  10.8     83   19.7

Now we build a model, dropping Girth and Height:

> lm(Volume ~. -Girth - Height, trees)

Call:
lm(formula = Volume ~ . - Girth - Height, data = trees)

Coefficients:
(Intercept)
30.17

Only the intercept is estimated, because Girth and Height are the only predictors in trees and both were dropped.
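To check which terms survive such a formula before fitting, one option is to expand it with terms(). A small sketch on the same trees data:

```r
# Expand the dot against the data and list the terms that remain
one_dropped  <- attr(terms(Volume ~ . - Girth, data = trees), "term.labels")
both_dropped <- attr(terms(Volume ~ . - Girth - Height, data = trees), "term.labels")

one_dropped   # "Height"
both_dropped  # character(0), so only the intercept is left

# Dropping just Girth keeps Height in the fitted model
fit <- lm(Volume ~ . - Girth, data = trees)
names(coef(fit))  # "(Intercept)" "Height"
```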

R - glm() formula exclude variable with conditions

If you want to exclude factor levels by their position in levels(), you can do:

df$category <- factor(df$category)

glm(resolution_time ~ division + category, data = df,
subset = !category %in% levels(category)[4:5])
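The same pattern can be tried on built-in data. The sketch below uses chickwts (a weight column and a six-level feed factor) as a stand-in for df, and the default gaussian family; none of these names come from the question.

```r
# feed has six levels; exclude the 4th and 5th by position
excluded <- levels(chickwts$feed)[4:5]

fit <- glm(weight ~ feed, data = chickwts,
           subset = !feed %in% levels(feed)[4:5])

# Four levels remain: one baseline plus three coefficients
names(coef(fit))

# Rows belonging to the excluded levels were not used
nobs(fit)
```

Selecting levels by position depends on the level order, so it is worth printing levels() first to confirm which levels are being removed.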

How to quickly and efficiently exclude variables from a glm in R using Boruta output

The documentation for Boruta::Boruta includes quite a few examples. One of them shows how to fit a randomForest after selecting attributes with the Boruta algorithm. The example is shown below:

library(Boruta)
library(mlbench); data(Ozone)
library(randomForest)
# Remove rows with missing values before running Boruta
na.omit(Ozone)->ozo
Boruta(V4~.,data=ozo,doTrace=2)->Bor.ozo
cat('Random forest run on all attributes:\n')
print(randomForest(V4~.,data=ozo))
cat('Random forest run only on confirmed attributes:\n')
# getSelectedAttributes() returns the names of the confirmed attributes
print(randomForest(ozo[,getSelectedAttributes(Bor.ozo)],ozo$V4))

Rather than extracting the variables that should be excluded, I would continue on this example and extract variables that should be included according to the algorithm:

glm(class ~ ., data = df[, c('class', getSelectedAttributes(boruto_output))])

Note that I could not test the above method, as no data were included in the question.


