Warning: Non-Integer #Successes in a Binomial Glm! (Survey Packages)

Warning: non-integer #successes in a binomial glm! (survey packages)

There's nothing wrong, glm is just picky when it comes to specifying binomial (and Poisson) models. It warns if it detects that the no. of trials or successes is non-integral, but it goes ahead and fits the model anyway. If you want to suppress the warning (and you're sure it's not a problem), use family=quasibinomial instead.

How do i Interpret the coefficients of glm with binomial error distribution?

The "zero line" in this case is x=1 and not x=0.

Question 2:
the question is. Is there a effect that is different from zero?
But odds of 1 basicaly means zero.

Question 1:
When the estimate is exp the result can not be negative.But odds below 1 express a negative effect.

Here are some sources to calculate the confidence intervall for anyone stumbling over this post.

https://fromthebottomoftheheap.net/2018/12/10/confidence-intervals-for-glms/

https://stats.stackexchange.com/questions/304833/how-to-calculate-odds-ratio-and-95-confidence-interval-for-logistic-regression

using svyglm from a glm perspective

The mtcars dataset is small: it has only 32 observations. Let's generate a larger dataset. The simulation is simple and doesn't take the inverse weight sampling into account.

And I've added a third model, geepack::geeglm, that can fit a GLM with inverse probability weighting.

With the larger dataset, all three models have the same estimates for the coefficients and their standard errors.

Part 1. Generate data

library("survey")
library("geepack")
library("tidyverse")

set.seed(1234)

inv_logit <- function(x) {
  plogis(x)
}

n <- 100

# Gnerate dataset
mydata <-
  tibble(
    id = seq(n),
    x1 = rnorm(n),
    x2 = rnorm(n),
    x3 = rnorm(n),
    y = rbinom(n,
      size = 1,
      prob = inv_logit(0.1 * x1 + 0.2 * x2 + 0.3 * x3) + rnorm(n, sd = 0.1)
    ),
    pweight = runif(n)
  ) %>%
  # geeglm requires the data is ordered by time (or observation) within each cluster
  arrange(id)

Part 2. Fit models

fit.glm <- glm(
  y ~ x1 + x2 + x3,
  weights = pweight,
  family = "binomial",
  data = mydata
)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
tidy(fit.glm)
#> # A tibble: 4 × 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)   -0.319     0.340    -0.938  0.348 
#> 2 x1             0.638     0.362     1.77   0.0776
#> 3 x2             0.591     0.372     1.59   0.112 
#> 4 x3             0.972     0.446     2.18   0.0291

design <- svydesign(
  id = ~1,
  weights = ~pweight,
  data = mydata
)

fit.svyglm <- svyglm(
  y ~ x1 + x2 + x3,
  design = design,
  family = "binomial",
  data = mydata
)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
tidy(fit.svyglm)
#> # A tibble: 4 × 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)   -0.319     0.269     -1.18 0.239  
#> 2 x1             0.638     0.293      2.18 0.0317 
#> 3 x2             0.591     0.297      1.99 0.0497 
#> 4 x3             0.972     0.329      2.95 0.00397

fit.geeglm <- geeglm(
  y ~ x1 + x2 + x3,
  id = mydata$id,
  weights = pweight,
  family = "binomial",
  corstr = "independence",
  data = mydata
)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
tidy(fit.geeglm)
#> # A tibble: 4 × 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)   -0.319     0.268      1.42 0.234  
#> 2 x1             0.638     0.291      4.80 0.0284 
#> 3 x2             0.591     0.296      3.99 0.0457 
#> 4 x3             0.972     0.328      8.80 0.00301

^{Created on 2022-03-16 by the reprex package (v2.0.1)}

plotting estimates for binomial glm using sjplot in r

You need to include the argument transform = NULL to prevent tab_model from automatically exponentiating the estimates into odds ratios:

tab_model(model0, transform = NULL, show.est = TRUE, show.se = TRUE)

Sample Image

How do I fit a GLM using Binomial Distribution for this data in R?

One suitable way would be:

vaccine <- c(rep(c(0,1,2),c(12,4,8)),rep(c(0,1,2),c(175,61,340)))
cough <- c(rep(1,12+4+8),rep(0,175+61+340))

Then you could do something like:

linfit <- glm(cough~vaccine,family=binomial)
summary(linfit)

factorfit <- glm(cough~as.factor(vaccine),family=binomial)
summary(factorfit)

ordfactorfit <- glm(cough~ordered(vaccine),family=binomial)
summary(ordfactorfit)

or perhaps some other possibilities, depending on what your particular hypotheses were.

This isn't the only way to do it (and you may not want to do it with really large data sets), but "untabulating" in this fashion makes some things easy. You can retabulate easily enough (table(data.frame(cough=cough,vaccine=vaccine))).

You may also find the signed-root-contributions-to-chi-square interesting:

 t=table(data.frame(cough=cough,vaccine=vaccine))
 r=rowSums(t)
 c=colSums(t)
 ex=outer(r,c)/sum(t)
 print((t-ex)/sqrt(ex),d=3)
     vaccine
cough      0      1      2
    0 -0.337 -0.177  0.324
    1  1.653  0.868 -1.587

These have an interpretation somewhat analogous to standardized residuals.

A plot of the proportion of Nos against vaccine (with say $\pm$1 standard errors marked in) would be similarly useful.

Reproduce binomial glm using only the model object

You can reproduce the model in the following way:

model_r <- glm(formula(model), model$data, family = family(model))

Compare:

cbind(coef(model), coef(model_r))
#                     [,1]         [,2]
# (Intercept) -3.693688931 -3.693688931
# gre          0.001855502  0.001855502
# gpa          0.584915067  0.584915067
# rank        -0.450051862 -0.450051862

Alternatively (similar to your approach):

model_r2 <- glm(model.frame(model)[[1]] ~ 0 + model.matrix(model), 
                family = family(model))

cbind(coef(model), coef(model_r2))
#                     [,1]         [,2]
# (Intercept) -3.693688931 -3.693688931
# gre          0.001855502  0.001855502
# gpa          0.584915067  0.584915067
# rank        -0.450051862 -0.450051862

Why do I get multiple predictions lines for my binomial GLM?

Sort your predictor values before plotting.

using the iris dataset with some variation:

set.seed(111)
dat = iris
dat$Species = as.numeric(dat$Species=="setosa")
dat$Petal.Width = dat$Petal.Width + rnorm(nrow(dat))

Order the predictor, in this case it is petal.width:

dat = dat[order(dat$Petal.Width),]

fit = glm(Species ~ Petal.Width,data=dat,family="binomial")

plot(dat$Species ~ dat$Petal.Width)
lines(dat$Petal.Width,fit$fitted.values,col="blue")

Sample Image

Warning: Non-Integer #Successes in a Binomial Glm! (Survey Packages)