Warning: non-integer #successes in a binomial glm! (survey packages)
There's nothing wrong, glm
is just picky when it comes to specifying binomial (and Poisson) models. It warns if it detects that the no. of trials or successes is non-integral, but it goes ahead and fits the model anyway. If you want to suppress the warning (and you're sure it's not a problem), use family=quasibinomial
instead.
How do i Interpret the coefficients of glm with binomial error distribution?
The "zero line" in this case is x=1 and not x=0.
Question 2:
the question is. Is there a effect that is different from zero?
But odds of 1 basicaly means zero.
Question 1:
When the estimate is exp the result can not be negative.But odds below 1 express a negative effect.
Here are some sources to calculate the confidence intervall for anyone stumbling over this post.
https://fromthebottomoftheheap.net/2018/12/10/confidence-intervals-for-glms/
https://stats.stackexchange.com/questions/304833/how-to-calculate-odds-ratio-and-95-confidence-interval-for-logistic-regression
using svyglm from a glm perspective
The mtcars dataset is small: it has only 32 observations. Let's generate a larger dataset. The simulation is simple and doesn't take the inverse weight sampling into account.
And I've added a third model, geepack::geeglm
, that can fit a GLM with inverse probability weighting.
With the larger dataset, all three models have the same estimates for the coefficients and their standard errors.
Part 1. Generate data
library("survey")
library("geepack")
library("tidyverse")
set.seed(1234)
inv_logit <- function(x) {
plogis(x)
}
n <- 100
# Gnerate dataset
mydata <-
tibble(
id = seq(n),
x1 = rnorm(n),
x2 = rnorm(n),
x3 = rnorm(n),
y = rbinom(n,
size = 1,
prob = inv_logit(0.1 * x1 + 0.2 * x2 + 0.3 * x3) + rnorm(n, sd = 0.1)
),
pweight = runif(n)
) %>%
# geeglm requires the data is ordered by time (or observation) within each cluster
arrange(id)
Part 2. Fit models
fit.glm <- glm(
y ~ x1 + x2 + x3,
weights = pweight,
family = "binomial",
data = mydata
)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
tidy(fit.glm)
#> # A tibble: 4 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -0.319 0.340 -0.938 0.348
#> 2 x1 0.638 0.362 1.77 0.0776
#> 3 x2 0.591 0.372 1.59 0.112
#> 4 x3 0.972 0.446 2.18 0.0291
design <- svydesign(
id = ~1,
weights = ~pweight,
data = mydata
)
fit.svyglm <- svyglm(
y ~ x1 + x2 + x3,
design = design,
family = "binomial",
data = mydata
)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
tidy(fit.svyglm)
#> # A tibble: 4 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -0.319 0.269 -1.18 0.239
#> 2 x1 0.638 0.293 2.18 0.0317
#> 3 x2 0.591 0.297 1.99 0.0497
#> 4 x3 0.972 0.329 2.95 0.00397
fit.geeglm <- geeglm(
y ~ x1 + x2 + x3,
id = mydata$id,
weights = pweight,
family = "binomial",
corstr = "independence",
data = mydata
)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
tidy(fit.geeglm)
#> # A tibble: 4 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -0.319 0.268 1.42 0.234
#> 2 x1 0.638 0.291 4.80 0.0284
#> 3 x2 0.591 0.296 3.99 0.0457
#> 4 x3 0.972 0.328 8.80 0.00301
Created on 2022-03-16 by the reprex package (v2.0.1)
plotting estimates for binomial glm using sjplot in r
You need to include the argument transform = NULL
to prevent tab_model
from automatically exponentiating the estimates into odds ratios:
tab_model(model0, transform = NULL, show.est = TRUE, show.se = TRUE)
How do I fit a GLM using Binomial Distribution for this data in R?
One suitable way would be:
vaccine <- c(rep(c(0,1,2),c(12,4,8)),rep(c(0,1,2),c(175,61,340)))
cough <- c(rep(1,12+4+8),rep(0,175+61+340))
Then you could do something like:
linfit <- glm(cough~vaccine,family=binomial)
summary(linfit)
or
factorfit <- glm(cough~as.factor(vaccine),family=binomial)
summary(factorfit)
or
ordfactorfit <- glm(cough~ordered(vaccine),family=binomial)
summary(ordfactorfit)
or perhaps some other possibilities, depending on what your particular hypotheses were.
This isn't the only way to do it (and you may not want to do it with really large data sets), but "untabulating" in this fashion makes some things easy. You can retabulate easily enough (table(data.frame(cough=cough,vaccine=vaccine))
).
You may also find the signed-root-contributions-to-chi-square interesting:
t=table(data.frame(cough=cough,vaccine=vaccine))
r=rowSums(t)
c=colSums(t)
ex=outer(r,c)/sum(t)
print((t-ex)/sqrt(ex),d=3)
vaccine
cough 0 1 2
0 -0.337 -0.177 0.324
1 1.653 0.868 -1.587
These have an interpretation somewhat analogous to standardized residuals.
A plot of the proportion of No
s against vaccine (with say $\pm$1 standard errors marked in) would be similarly useful.
Reproduce binomial glm using only the model object
You can reproduce the model in the following way:
model_r <- glm(formula(model), model$data, family = family(model))
Compare:
cbind(coef(model), coef(model_r))
# [,1] [,2]
# (Intercept) -3.693688931 -3.693688931
# gre 0.001855502 0.001855502
# gpa 0.584915067 0.584915067
# rank -0.450051862 -0.450051862
Alternatively (similar to your approach):
model_r2 <- glm(model.frame(model)[[1]] ~ 0 + model.matrix(model),
family = family(model))
cbind(coef(model), coef(model_r2))
# [,1] [,2]
# (Intercept) -3.693688931 -3.693688931
# gre 0.001855502 0.001855502
# gpa 0.584915067 0.584915067
# rank -0.450051862 -0.450051862
Why do I get multiple predictions lines for my binomial GLM?
Sort your predictor values before plotting.
using the iris dataset with some variation:
set.seed(111)
dat = iris
dat$Species = as.numeric(dat$Species=="setosa")
dat$Petal.Width = dat$Petal.Width + rnorm(nrow(dat))
Order the predictor, in this case it is petal.width:
dat = dat[order(dat$Petal.Width),]
fit = glm(Species ~ Petal.Width,data=dat,family="binomial")
plot(dat$Species ~ dat$Petal.Width)
lines(dat$Petal.Width,fit$fitted.values,col="blue")
Related Topics
Round Vector of Numerics to Integer While Preserving Their Sum
Ordering Permutation in Rcpp I.E. Base::Order()
Pie Plot Getting Its Text on Top of Each Other
Trying to Find Row Associated with Max Value in Dataframe R
Difference Between Mean(C(1,2,21)) and Mean(1,2,21)
Adding Simple Legend to Plot in R
Generate a Sequence of Characters from 'A'-'Z'
Azure Put Blob Authentication Fails in R
Writing R Function with If Enviornment
Suppressing Messages in Knitr/Rmarkdown
How to Extract Everything Until First Occurrence of Pattern
How to Get Around Error "Factor Has New Levels" in Cross-Validation Glm
R Xml - Combining Parent and Child Nodes(W Same Name) into Data Frame