Logistic regression - defining reference level in R
Assuming you have class saved as a factor, use the relevel() function:
auth$class <- relevel(auth$class, ref = "YES")
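As a minimal sketch of what that call does, the snippet below builds a small stand-in for auth (the real data frame is not shown in the question, so the values here are made up) and checks the level order before and after.

```r
# Hypothetical stand-in for the `auth` data frame from the answer above.
auth <- data.frame(class = factor(c("NO", "YES", "NO", "YES", "YES")))

levels(auth$class)   # alphabetical by default, so "NO" comes first

# Move "YES" to the front; the remaining levels keep their relative order.
auth$class <- relevel(auth$class, ref = "YES")

levels(auth$class)   # "YES" is now the first (reference) level
```

Note that relevel() only accepts a factor, so a character or numeric column has to be converted with factor() first.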
Confused with the reference level in logistic regression in R
If P(0) is the probability of 0 and P(1) is the probability of 1, then P(0) = 1 - P(1). Thus, you can always calculate the probability of the reference level, regardless of which level you set as the reference.
For example, predict(model1, type="response") gives you the probability of the non-reference level, and 1 - predict(model1, type="response") gives you the probability of the reference level.
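The relationship can be checked on a small example. This sketch assumes the built-in mtcars data and a made-up factor column gearbox derived from am; model1 here is only an illustration, not the model from the question.

```r
# Illustrative data: mtcars ships with R; `am` is 0 = automatic, 1 = manual.
cars <- transform(mtcars,
                  gearbox = factor(am, levels = c(0, 1),
                                   labels = c("automatic", "manual")))

levels(cars$gearbox)[1]   # "automatic" is the reference level

model1 <- glm(gearbox ~ wt, family = binomial, data = cars)

p_manual    <- predict(model1, type = "response")  # P(non-reference level)
p_automatic <- 1 - p_manual                        # P(reference level)

# Same fit with the raw 0/1 outcome, to confirm which level is being modelled.
model01 <- glm(am ~ wt, family = binomial, data = cars)
all.equal(p_manual, predict(model01, type = "response"))
```

The comparison with the 0/1 fit is there only to show that the factor version models the second level ("manual"), which corresponds to 1 in the numeric coding.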
You also asked, "what is glm() to predict in default if we use response other than '0' and '1'." For (binomial) logistic regression to be appropriate, your outcome needs to be a categorical variable with two categories. You can call them whatever you want: 0/1, black/white, because/otherwise, Mal/Serenity, etc. One will be the reference level (whichever you prefer), and the model will give you the probability of the other level. The probability of the reference level is just 1 minus the probability of the other level.
If your outcome has more than two categories, you can use a multinomial logistic regression model, but the principle is similar.
Logistic regression outcome variable predictions in r
By default, R uses alphabetical order for the levels of a factor. You can set your own order with:
df$Group <- factor(df$Group, levels=c('CON','CI'))
Then CON would be used as the reference level in the logistic regression, and you should get the same results as with 0/1 coding.
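To see the equivalence with 0/1 coding, here is a rough sketch on invented data. The column names x and y01 and all the values are assumptions for illustration, with CON coded as 0 and CI as 1.

```r
# Hypothetical data: `Group` is the outcome, `x` a numeric predictor.
df <- data.frame(
  x     = 1:10,
  Group = c("CON", "CON", "CI", "CON", "CON", "CI", "CON", "CI", "CI", "CI")
)
df$y01 <- as.integer(df$Group == "CI")   # 0/1 coding: CON = 0, CI = 1

levels(factor(df$Group))                 # alphabetical: "CI" "CON"

df$Group <- factor(df$Group, levels = c("CON", "CI"))  # CON first

fit_factor <- glm(Group ~ x, family = binomial, data = df)
fit_01     <- glm(y01 ~ x,   family = binomial, data = df)

all.equal(coef(fit_factor), coef(fit_01))   # the two fits agree
```

Without the explicit levels argument, CI would sort first and become the reference, which flips the sign of every coefficient.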
Changing reference group for categorical predictor variable in logistic regression
Use the C function to define your contrasts in the data frame. If your data frame is DF and the factor variable is fct, then
DF$fct <- C(DF$fct, contr.treatment, base=3)
(untested).
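One way to check a call like this is to fit a model with only that factor and compare the intercept with the baseline group's mean. The sketch below assumes a DF built from the iris data and uses lm() rather than glm() purely so the check is exact; the contrast setup itself is the same for either.

```r
# Hypothetical DF/fct, built from the iris data that ships with R.
DF <- data.frame(y = iris$Sepal.Length, fct = iris$Species)
levels(DF$fct)                      # "setosa" "versicolor" "virginica"

# Treatment contrasts with the third level ("virginica") as the baseline.
DF$fct <- C(DF$fct, contr.treatment, base = 3)
contrasts(DF$fct)                   # the row for level 3 is all zeros

fit <- lm(y ~ fct, data = DF)

# With only this factor in the model, the intercept is the baseline mean.
all.equal(unname(coef(fit)[1]), mean(DF$y[DF$fct == "virginica"]))
```

Unlike relevel(), this leaves the order of levels(DF$fct) unchanged and only alters the contrast matrix attached to the factor.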
Is there a way to display the reference category in a regression output in R?
The reference level is the one that is missing in the summary, because the coefficients of the other levels are contrasts to the reference level. The intercept represents the expected value in the reference category (with all other predictors at zero).
iris <- transform(iris, Species_=factor(Species)) ## create factor
summary(lm(Sepal.Length ~ Petal.Length + Species_, iris))$coefficients
#                      Estimate Std. Error   t value     Pr(>|t|)
# (Intercept)         3.6835266 0.10609608 34.718780 1.968671e-72
# Petal.Length        0.9045646 0.06478559 13.962436 1.121002e-28
# Species_versicolor -1.6009717 0.19346616 -8.275203 7.371529e-14
# Species_virginica  -2.1176692 0.27346121 -7.743947 1.480296e-12
You could remove the intercept to get the missing level displayed, but that does not make much sense. You then just get a separate intercept for each level without a reference, whereas what you are usually interested in is the contrast between the reference level and the other levels.
summary(lm(Sepal.Length ~ 0 + Petal.Length + Species_, iris))$coefficients
#                     Estimate Std. Error   t value     Pr(>|t|)
# Petal.Length       0.9045646 0.06478559 13.962436 1.121002e-28
# Species_setosa     3.6835266 0.10609608 34.718780 1.968671e-72
# Species_versicolor 2.0825548 0.28009598  7.435147 8.171219e-12
# Species_virginica  1.5658574 0.36285224  4.315413 2.921850e-05
If you're not sure, the reference level is always the first level of the factor (under R's default treatment contrasts).
levels(iris$Species_)[1]
# [1] "setosa"
To prove that, specify a different reference level and see if it's first.
iris$Species_ <- relevel(iris$Species_, ref='versicolor')
levels(iris$Species_)[1]
# [1] "versicolor"
It is common to refer to the reference level in a note under the table in the report, and I recommend that you do the same.
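If you want to pull the reference levels out of a fitted model for such a note, one possible sketch uses the xlevels component that lm and glm fits store. It assumes the default treatment contrasts, where the first level is the reference; the name ref_levels is just an illustration.

```r
fit <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

# `xlevels` stores the levels of each factor used in the fit; under R's
# default treatment contrasts the first one is the reference level.
ref_levels <- vapply(fit$xlevels, function(lv) lv[1], character(1))
ref_levels
```

The result is a named character vector with one entry per factor in the model, which can be pasted into a table footnote.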