How do I exclude specific variables from a glm in R?
In addition to using - in the formula, as suggested in the comments:
glm(Stuff ~ . - var1 - var2, data= mydata, family=binomial)
you can also subset the data frame passed in
glm(Stuff ~ ., data=mydata[ , !(names(mydata) %in% c('var1','var2'))], family=binomial)
or
glm(Stuff ~ ., data=subset(mydata, select=c( -var1, -var2 ) ), family=binomial )
(Be careful with that last one: subset() uses non-standard evaluation, so it sometimes does not work well inside other functions.)
You could also use the paste function to create a string representing the formula with the terms of interest (subsetting to the group of predictors that you want), then use as.formula to convert it to a formula.
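The paste / as.formula approach described above could be sketched as follows. This is only an illustration under assumptions: the data frame is simulated, and the column names x1 and x2 are hypothetical (only Stuff, var1 and var2 come from the answer).

```r
# Simulated stand-in for mydata (hypothetical values and extra columns x1, x2).
set.seed(1)
mydata <- data.frame(
  Stuff = rbinom(50, 1, 0.5),
  var1  = rnorm(50),
  var2  = rnorm(50),
  x1    = rnorm(50),
  x2    = rnorm(50)
)

# Keep every column except the response and the predictors to exclude.
keep <- setdiff(names(mydata), c("Stuff", "var1", "var2"))

# Build the formula as a string, then convert it.
f <- as.formula(paste("Stuff ~", paste(keep, collapse = " + ")))

fit <- glm(f, data = mydata, family = binomial)
names(coef(fit))
```

The base function reformulate(keep, response = "Stuff") should build an equivalent formula without manual string pasting.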
Exclude Specific Records from GLM?
You can use the subset argument that many of the modelling functions in R have. For example:
glm(conversion ~ action, data = data, family = binomial(),
subset = action != "Did not use")
will fit the model to the data set after removing rows where action == "Did not use". If you have additional levels in action to drop, you might use
glm(conversion ~ action, data = data, family = binomial(),
subset = !action %in% c("Did not use", "Other"))
which will exclude any rows where action is equal to either of the supplied options.
You might also want to look at the drop.unused.levels argument to model.frame, which is the function that will act on any subset argument you supply to glm().
PS: note how I have specified the family; you don't need the weird combination of quoting. Any one of binomial, binomial() or "binomial" should be fine, as the logit link is the canonical link for the binomial family and hence the default in R's binomial() family function. If you want to specify the link explicitly, use this form: binomial(link = "logit").
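As a rough illustration of the subset argument discussed in this answer, the sketch below fits the model on simulated data. The level names "Clicked" and "Viewed" and all values are hypothetical; only the column names and "Did not use" come from the answer.

```r
# Simulated data (hypothetical levels and values).
set.seed(42)
data <- data.frame(
  action = factor(sample(c("Clicked", "Viewed", "Did not use"),
                         200, replace = TRUE)),
  conversion = rbinom(200, 1, 0.4)
)

# Rows with action == "Did not use" are removed before fitting.
fit <- glm(conversion ~ action, data = data, family = binomial,
           subset = action != "Did not use")

nobs(fit)         # number of rows actually used in the fit
names(coef(fit))  # the excluded level does not appear among the coefficients
```

Because glm() builds its model frame with unused factor levels dropped, the excluded level should not show up as an NA coefficient.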
Exclude more than one column when building a logistic regression model using glm
?formula mentions that you can use - to drop terms. Here is how:
glm(Vote ~ . - ID, data = train, family = binomial)
g <- glm(Vote ~ . - ID - YOB - ABC, data = train, family = binomial)
Here is an example:
> head(trees) ## this is R's built-in dataset
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
4  10.5     72   16.4
5  10.7     81   18.8
6  10.8     83   19.7
Now we build a model, dropping Girth and Height:
> lm(Volume ~. -Girth - Height, trees)

Call:
lm(formula = Volume ~ . - Girth - Height, data = trees)

Coefficients:
(Intercept)
      30.17
Now you see that only the intercept is estimated, because both predictors were dropped.
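As a quick sanity check on the same built-in dataset, dropping a single term with - should give the same fit as writing out the remaining predictor explicitly. The object names fit_minus and fit_explicit are just illustrative.

```r
# Drop only Girth from the "all columns" formula ...
fit_minus <- lm(Volume ~ . - Girth, data = trees)

# ... and compare with naming the remaining predictor directly.
fit_explicit <- lm(Volume ~ Height, data = trees)

all.equal(coef(fit_minus), coef(fit_explicit))
```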
R - glm() formula exclude variable with conditions
If you want to remove factor levels by their position in levels(), you can do:
df$category <- factor(df$category)
glm(resolution_time ~ division + category, data = df,
subset = !category %in% levels(category)[4:5])
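To see what this does, here is a small sketch on made-up data. The level names, the values, and the helper variable drop_levels are hypothetical; only the column names come from the question.

```r
# Made-up data: 5 category levels (a-e) and 2 divisions, 120 rows.
set.seed(7)
df <- data.frame(
  resolution_time = rexp(120, rate = 0.1),
  division = factor(rep(c("north", "south"), times = 60)),
  category = rep(letters[1:5], times = 24)
)
df$category <- factor(df$category)

# The 4th and 5th levels in sorted order, here "d" and "e".
drop_levels <- levels(df$category)[4:5]

fit <- glm(resolution_time ~ division + category, data = df,
           subset = !category %in% drop_levels)

nobs(fit)         # rows remaining after the two levels are excluded
names(coef(fit))  # no coefficients for the dropped levels
```

Note that positions refer to the order returned by levels(), which is alphabetical by default, so it is worth printing levels(df$category) before indexing into it.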
How to quickly and efficiently exclude variables from a glm in R using Boruta output
There are quite a few examples in the documentation for Boruta::Boruta. One of them shows how to fit a randomForest after selecting attributes with the Boruta algorithm. The example is shown below:
library(Boruta)
library(mlbench); data(Ozone)
library(randomForest)
na.omit(Ozone)->ozo
Boruta(V4~.,data=ozo,doTrace=2)->Bor.ozo
cat('Random forest run on all attributes:\n')
print(randomForest(V4~.,data=ozo))
cat('Random forest run only on confirmed attributes:\n')
print(randomForest(ozo[,getSelectedAttributes(Bor.ozo)],ozo$V4))
Rather than extracting the variables that should be excluded, I would continue on this example and extract variables that should be included according to the algorithm:
glm(class ~ ., data = df[, c('class', getSelectedAttributes(boruta_output))])
Note that I could not test the above method, as no data was included in the question.