Using glmer for logistic regression, how to verify response reference
You could simulate some data where you know the true effects ... ?simulate.merMod makes this relatively easy. In any case,
- the effects are interpreted in terms of their effect on the log-odds of a response of 1
- e.g., a slope of 0.5 implies that a 1-unit increase in the predictor variable increases the log-odds of observing a 1 rather than a 0 by 0.5.
- for questions of this sort, glmer inherits its framework from glm. In particular, ?family states:
For the ‘binomial’ and ‘quasibinomial’ families the response can
be specified in one of three ways:
1. As a factor: ‘success’ is interpreted as the factor not
having the first level (and hence usually of having the
second level).
2. As a numerical vector with values between ‘0’ and ‘1’,
interpreted as the proportion of successful cases (with the
total number of cases given by the ‘weights’).
3. As a two-column integer matrix: the first column gives the
number of successes and the second the number of failures.
Your data are a (common) special case of #2: the "proportion of successes" is either 0% or 100% for each case, because there is only one trial per observation (the weights vector defaults to all ones).
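As a quick sanity check, here is a minimal simulation sketch (hypothetical variable names and effect sizes) showing that the slope describes the log-odds of the factor's *second* level:

```r
set.seed(1)
n <- 500
x <- rnorm(n)
# true model: log-odds of "yes" = -0.5 + 0.8 * x
p <- plogis(-0.5 + 0.8 * x)
y <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))

# "no" is the first (reference) level, so glm models P(yes);
# the fitted coefficients should be near -0.5 and 0.8
coef(glm(y ~ x, family = binomial))
```

If you reverse the factor levels, both coefficients flip sign, because you are then modeling the log-odds of "no".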
glmer reference outcome string
You need to change your factor's level order. This post demonstrates how to do so. In your case, write it this way:
data$Outcome <- factor(data$Outcome, levels = c("g", "c"))
Edit based on the OP's comment: To answer your question, yes, factor levels are ordered alphabetically by default. This R-Bloggers post discusses it in more detail.
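A quick sketch of the default alphabetical ordering and the fix, assuming outcome codes "g" and "c" as in the question:

```r
Outcome <- factor(c("g", "c", "c", "g"))
levels(Outcome)
# [1] "c" "g"    (alphabetical: "c" is the reference level)

Outcome <- factor(Outcome, levels = c("g", "c"))
levels(Outcome)
# [1] "g" "c"    ("g" is now the reference level)
```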
Logistic regression - defining reference level in R
Assuming you have class saved as a factor, use the relevel()
function:
auth$class <- relevel(auth$class, ref = "YES")
r specific glm logistic regression - what am I modeling?
If you have a factor with just two levels and are using a logistic regression, then R will treat the first level as "no event" (0) and the second level as "success" (1). You can view the order of the levels with levels(dataframe$columnname).
If you want to change the reference level, then relevel will do the trick:
dd$gender <- relevel(dd$gender, "male")
For example, consider data
dd<-data.frame(x=runif(50))
dd<-transform(dd,outcome=ifelse(runif(50)<x,"event","noevent"))
levels(dd$outcome)
# [1] "event" "noevent"
with(dd, table(lessthanhalf=x<.5, outcome))
# outcome
# lessthanhalf event noevent
# FALSE 15 8
# TRUE 6 21
Here we can see that increasing x values are associated with more "events". We can model this with
glm(outcome~x, dd, family=binomial)
# Call: glm(formula = outcome ~ x, family = binomial, data = dd)
#
# Coefficients:
# (Intercept) x
# 2.773 -4.990
By default, we are modeling the probability of "noevent" (the second factor level), so the probability of "noevent" decreases as x increases. We can model the probability of "event" instead by making "noevent" the reference category:
glm(relevel(outcome,"noevent")~x, dd, family=binomial)
# Call: glm(formula = relevel(outcome, "noevent") ~ x, family = binomial,
# data = dd)
# Coefficients:
# (Intercept) x
# -2.773 4.990
How to predict ln(odds) with rcs term in mixed effects logistic model?
- It looks reasonable.
- Yes. Just predict(m) should give you the predicted log-odds for each observation (since predict uses the original data by default, and type = "link" is also the default). But being more explicit doesn't hurt.
- You could construct a data frame that has the baseline value for the non-focal predictors and Anger spanning an appropriate range of values, then use that as newdata in a predict call. Or you could use the emmeans or effects packages, both of which try to automate this process for you.
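A minimal sketch of the newdata approach, using a simplified model (plain linear Anger term rather than a spline) and assumed baseline levels for the non-focal predictors:

```r
library(lme4)
data(VerbAgg, package = "lme4")

# simplified model for illustration (no spline term)
m <- glmer(r2 ~ Anger + Gender + situ + btype + (1 | id),
           data = VerbAgg, family = binomial)

# Anger spans its observed range; the other predictors are held at
# assumed baseline levels -- choose whatever is meaningful for your data
nd <- expand.grid(
  Anger  = seq(min(VerbAgg$Anger), max(VerbAgg$Anger), length.out = 50),
  Gender = factor("M",     levels = levels(VerbAgg$Gender)),
  situ   = factor("other", levels = levels(VerbAgg$situ)),
  btype  = factor("curse", levels = levels(VerbAgg$btype))
)

# re.form = NA gives population-level predictions (random effects set to 0);
# type = "link" returns predictions on the log-odds scale
log_odds <- predict(m, newdata = nd, type = "link", re.form = NA)
```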
This worked for me (natural splines with ns seem to integrate better with lme4 than rcs):
library(lme4)
library(effects)
m <- glmer(r2 ~ splines::ns(Anger, df = 5) + Gender + situ + btype + (1 | id),
data = VerbAgg, family = binomial("logit"),
control=glmerControl(optimizer="bobyqa"))
plot(Effect("Anger", m))
You can do a significance test of the spline term by using car::Anova()
or by model comparison (likelihood ratio test):
m0 <- update(m, r2 ~ Gender + situ + btype + (1 | id))
anova(m, m0)
Hausman-Test with glmer
I guess pid is a patient or person identifier, so you have repeated measures from the same persons. If so, I would include time as well in your model, e.g.
glmer(populist ~ wkb + married + age + I(age^2) + time + (1 + time | pid), ...)
Regarding your questions:
1. Yes, (1 | pid) describes the random part of your model. You have a varying intercept (1) for each pid, i.e. you assume that your outcome has a different average value for each pid. In a repeated-measures / longitudinal design, you may also assume that your outcome changes at a different rate for each pid, i.e. you have random slopes as well. That's why I would add time as well.
2. A "Fixed Effects" model would require some data preparation, e.g. demeaning and removing the intercept. So from a first quick glance, I would say your model does not describe an FE model.
3. & 4. I personally would not use the Hausman test, nor a fixed effects regression at all. There are some good publications showing that the mixed model is in general better than any fixed effects model. You can find examples that summarize this discussion a bit here, along with some information about the issue of correlated group factors and fixed effects, and how to address these issues using mixed models.
Summary: mixed models can model both between- and within-effects (while FE can only model within-effects); FE models lack information about variation in the group effects or between-subject effects; and FE regression cannot include random slopes, thus neglecting "cross-cluster differences in the effects of lower-level controls (which) reduces the precision of estimated context effects, resulting in unnecessarily wide confidence intervals and low statistical power" (Heisig et al. 2017).
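As an aside, the "demeaning" mentioned above (the within transformation that FE models rely on) can be sketched like this, with hypothetical data:

```r
# within transformation: subtract each pid's mean from its own observations,
# so only within-person variation remains
dd <- data.frame(pid = rep(1:3, each = 4),
                 age = c(20:23, 30:33, 40:43))
dd$age_dm <- dd$age - ave(dd$age, dd$pid)  # demeaned predictor

# each pid's demeaned values now average to exactly zero
tapply(dd$age_dm, dd$pid, mean)
# 1 2 3
# 0 0 0
```

All between-person variation in age (the 20 vs. 30 vs. 40 baseline differences) is gone after demeaning, which is exactly the information a mixed model retains and an FE model discards.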
Here are some of the references that I suggest reading:
Bafumi J, Gelman A. 2006. Fitting Multilevel Models When Predictors and Group Effects Correlate. In. Philadelphia, PA: Annual meeting of the American Political Science Association.
Bell A, Fairbrother M, Jones K. 2018. Fixed and Random Effects Models: Making an Informed Choice. Quality & Quantity.
Bell A, Jones K. 2015. Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data. Political Science Research and Methods, 3(1), 133–153.
Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Analytical Methods for Social Research. Cambridge ; New York: Cambridge University Press (in particular chapter 12.6)
Heisig JP, Schaeffer M, Giesecke J. 2017. The Costs of Simplicity: Why Multilevel Models May Benefit from Accounting for Cross-Cluster Differences in the Effects of Controls. American Sociological Review 82 (4): 796–827.