Using glmer for logistic regression, how to verify response reference
You could simulate some data where you know the true effects ... ?simulate.merMod makes this relatively easy. In any case,
- the effects are interpreted in terms of their effect on the log-odds of a response of 1
- e.g., a slope of 0.5 implies that a 1-unit increase in the predictor variable increases the log-odds of observing a 1 rather than a 0 by 0.5.
- for questions of this sort, glmer inherits its framework from glm. In particular, ?family states:
For the ‘binomial’ and ‘quasibinomial’ families the response can
be specified in one of three ways:
1. As a factor: ‘success’ is interpreted as the factor not
having the first level (and hence usually of having the
second level).
2. As a numerical vector with values between ‘0’ and ‘1’,
interpreted as the proportion of successful cases (with the
total number of cases given by the ‘weights’).
3. As a two-column integer matrix: the first column gives the
number of successes and the second the number of failures.
Your data are a (common) special case of #2: the "proportion of successes" is either 0% or 100% for each case, because there is only one trial per observation (the weights vector defaults to all ones).
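As a quick sanity check, here is a minimal simulation sketch (hypothetical variable names and effect sizes) showing that the slope describes the log-odds of the factor's *second* level:

```r
set.seed(1)
n <- 500
x <- rnorm(n)
# true model: log-odds of "yes" = -0.5 + 0.8 * x
p <- plogis(-0.5 + 0.8 * x)
y <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))

# "no" is the first (reference) level, so glm models P(yes);
# the fitted coefficients should be near -0.5 and 0.8
coef(glm(y ~ x, family = binomial))
```

If you reverse the factor levels, both coefficients flip sign, because you are then modeling the log-odds of "no".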
glmer reference outcome string
You need to change your factor's level order. This post demonstrates how to do so. In your case, write it this way:
data$Outcome <- factor(data$Outcome, levels = c("g", "c"))
Edit based on the OP's comment: To answer your question, yes, factor levels are ordered alphabetically by default. This R-Bloggers post discusses it in more detail.
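A quick sketch of the default alphabetical ordering and the fix, assuming outcome codes "g" and "c" as in the question:

```r
Outcome <- factor(c("g", "c", "c", "g"))
levels(Outcome)
# [1] "c" "g"    (alphabetical: "c" is the reference level)

Outcome <- factor(Outcome, levels = c("g", "c"))
levels(Outcome)
# [1] "g" "c"    ("g" is now the reference level)
```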
Logistic regression - defining reference level in R
Assuming you have class saved as a factor, use the relevel()
function:
auth$class <- relevel(auth$class, ref = "YES")
r specific glm logistic regression - what am I modeling?
If you have a factor with just two levels and are using a logistic regression, then R will treat the first level as "no event" (0) and the second level as "success" (1). You can view the order of the levels with levels(dataframe$columnname).
If you want to change the reference level, then relevel will do the trick:
dd$gender <- relevel(dd$gender, "male")
For example, consider data
dd<-data.frame(x=runif(50))
dd<-transform(dd,outcome=ifelse(runif(50)<x,"event","noevent"))
levels(dd$outcome)
# [1] "event" "noevent"
with(dd, table(lessthanhalf=x<.5, outcome))
# outcome
# lessthanhalf event noevent
# FALSE 15 8
# TRUE 6 21
Here we can see that increasing x values are associated with more "events". We can model this with
glm(outcome~x, dd, family=binomial)
# Call: glm(formula = outcome ~ x, family = binomial, data = dd)
#
# Coefficients:
# (Intercept) x
# 2.773 -4.990
By default, we are modeling the probability of "noevent" (the second factor level), so the probability of "noevent" decreases as x increases. We can model the probability of "event" instead by making "noevent" the reference category:
glm(relevel(outcome,"noevent")~x, dd, family=binomial)
# Call: glm(formula = relevel(outcome, "noevent") ~ x, family = binomial,
# data = dd)
# Coefficients:
# (Intercept) x
# -2.773 4.990
How to predict ln(odds) with rcs term in mixed effects logistic model?
- It looks reasonable.
- Yes. Just predict(m) should give you the predicted log-odds for each observation (since predict uses the original data by default, and type = "link" is also the default). But being more explicit doesn't hurt.
- You could construct a data frame that has the baseline value for the non-focal predictors and Anger spanning an appropriate range of values, then use that as newdata in a predict call. Or you could use the emmeans or effects packages, both of which try to automate this process for you.
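A minimal sketch of the newdata approach, using a simplified model (plain linear Anger term rather than a spline) and assumed baseline levels for the non-focal predictors:

```r
library(lme4)
data(VerbAgg, package = "lme4")

# simplified model for illustration (no spline term)
m <- glmer(r2 ~ Anger + Gender + situ + btype + (1 | id),
           data = VerbAgg, family = binomial)

# Anger spans its observed range; the other predictors are held at
# assumed baseline levels -- choose whatever is meaningful for your data
nd <- expand.grid(
  Anger  = seq(min(VerbAgg$Anger), max(VerbAgg$Anger), length.out = 50),
  Gender = factor("M",     levels = levels(VerbAgg$Gender)),
  situ   = factor("other", levels = levels(VerbAgg$situ)),
  btype  = factor("curse", levels = levels(VerbAgg$btype))
)

# re.form = NA gives population-level predictions (random effects set to 0);
# type = "link" returns predictions on the log-odds scale
log_odds <- predict(m, newdata = nd, type = "link", re.form = NA)
```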
This worked for me (natural splines with ns seem to integrate better with lme4 than rcs):
library(lme4)
library(effects)
m <- glmer(r2 ~ splines::ns(Anger, df = 5) + Gender + situ + btype + (1 | id),
data = VerbAgg, family = binomial("logit"),
control=glmerControl(optimizer="bobyqa"))
plot(Effect("Anger", m))
You can do a significance test of the spline term by using car::Anova()
or by model comparison (likelihood ratio test):
m0 <- update(m, r2 ~ Gender + situ + btype + (1 | id))
anova(m, m0)
Hausman-Test with glmer
I guess pid is a patient or person identifier, so you have repeated measures from the same persons. If so, I would include time as well in your model, e.g.
glmer(populist ~ wkb + married + age + I(age^2) + time + (1 + time | pid), ...)
Regarding your questions:
1. Yes, (1 | pid) describes the random part of your model. You have a varying intercept (1) for each pid, i.e. you assume that your outcome has a different average value for each pid. In a repeated-measures / longitudinal design, you may also assume that your outcome changes at a different rate for each pid, i.e. you have random slopes as well. That's why I would add time as well.
2. A "Fixed Effects" model would require some data preparation, e.g. demeaning and removing the intercept. So from a first quick glance, I would say your model does not describe an FE model.
3. & 4. I personally would not use the Hausman test, nor a fixed effects regression at all. There are some good publications showing that the mixed model is in general better than any fixed effects model. You can find examples that summarize this discussion a bit here, along with some information about the issue of correlated group factors and fixed effects, and how to address these issues using mixed models.
Summary: mixed models can model both between- and within-effects (while FE can only model within-effects); FE models lack information about variation in the group effects or between-subject effects; and FE regression cannot include random slopes, thus neglecting "cross-cluster differences in the effects of lower-level controls (which) reduces the precision of estimated context effects, resulting in unnecessarily wide confidence intervals and low statistical power" (Heisig et al. 2017).
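As an aside, the "demeaning" mentioned above (the within transformation that FE models rely on) can be sketched like this, with hypothetical data:

```r
# within transformation: subtract each pid's mean from its own observations,
# so only within-person variation remains
dd <- data.frame(pid = rep(1:3, each = 4),
                 age = c(20:23, 30:33, 40:43))
dd$age_dm <- dd$age - ave(dd$age, dd$pid)  # demeaned predictor

# each pid's demeaned values now average to exactly zero
tapply(dd$age_dm, dd$pid, mean)
# 1 2 3
# 0 0 0
```

All between-person variation in age (the 20 vs. 30 vs. 40 baseline differences) is gone after demeaning, which is exactly the information a mixed model retains and an FE model discards.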
Here are some of the references that I suggest reading:
Bafumi J, Gelman A. 2006. Fitting Multilevel Models When Predictors and Group Effects Correlate. In. Philadelphia, PA: Annual meeting of the American Political Science Association.
Bell A, Fairbrother M, Jones K. 2018. Fixed and Random Effects Models: Making an Informed Choice. Quality & Quantity.
Bell A, Jones K. 2015. Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data. Political Science Research and Methods, 3(1), 133–153.
Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge ; New York: Cambridge University Press (in particular chapter 12.6)
Heisig JP, Schaeffer M, Giesecke J. 2017. The Costs of Simplicity: Why Multilevel Models May Benefit from Accounting for Cross-Cluster Differences in the Effects of Controls. American Sociological Review 82 (4): 796–827.