How to Plot Data with Confidence Intervals

Line graph with 95% confidence interval band

Is this what you're looking for, using the output from your t.test?

You could also look at the broom package, which has tidiers for various test outputs (for example, tidy() on a t.test result returns conf.low and conf.high columns).

library(tidyverse)

tribble(
  ~treatment, ~measure, ~last_3_month, ~last_2_month, ~last_1_month,
  0, "mean",  40,     26.67,  30,
  0, "lower", -91.45, -11.28, 5.16,
  0, "upper", 171.45, 64.61,  54.84,
  1, "mean",  333.33, 500,    500,
  1, "lower", 189.91, 251.59, 69.73,
  1, "upper", 476.76, 748.41, 930.27
) |>
  # one row per treatment / measure / period ...
  pivot_longer(-c(treatment, measure)) |>
  # ... then mean, lower and upper side by side, as geom_ribbon() needs
  pivot_wider(names_from = measure, values_from = value) |>
  mutate(
    # keep the periods in column order (factor() would sort them alphabetically)
    name = fct_inorder(name),
    treatment = str_c("Treatment ", treatment)
  ) |>
  ggplot(aes(name, mean, colour = treatment, group = treatment)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "grey90") +
  geom_line()


Created on 2022-04-27 by the reprex package (v2.0.1)
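The pivot_longer()/pivot_wider() step is the part that usually needs explaining: it turns one row per (treatment, measure) into one row per (treatment, period) with mean, lower and upper side by side, which is the shape geom_ribbon() expects. As an illustration outside R, here is a minimal stdlib Python sketch of the same reshape on the table above; reshape_for_ribbon is a hypothetical helper name, not part of any library.

```python
from collections import defaultdict

# The same wide table as the tribble() above:
# (treatment, measure, last_3_month, last_2_month, last_1_month)
ROWS = [
    (0, "mean", 40, 26.67, 30),
    (0, "lower", -91.45, -11.28, 5.16),
    (0, "upper", 171.45, 64.61, 54.84),
    (1, "mean", 333.33, 500, 500),
    (1, "lower", 189.91, 251.59, 69.73),
    (1, "upper", 476.76, 748.41, 930.27),
]
PERIODS = ["last_3_month", "last_2_month", "last_1_month"]


def reshape_for_ribbon(rows, periods):
    """Hypothetical helper: one (period, mean, lower, upper) record per
    treatment and period, keeping periods in their original column order."""
    # Step 1 (like pivot_longer): one (treatment, period, measure, value) per cell.
    long_cells = [
        (treatment, period, measure, value)
        for treatment, measure, *values in rows
        for period, value in zip(periods, values)
    ]
    # Step 2 (like pivot_wider): spread the measures back into named fields.
    wide = defaultdict(dict)
    for treatment, period, measure, value in long_cells:
        wide[(treatment, period)][measure] = value
    return {
        treatment: [
            (p, wide[(treatment, p)]["mean"],
             wide[(treatment, p)]["lower"],
             wide[(treatment, p)]["upper"])
            for p in periods
        ]
        for treatment in sorted({t for t, *_ in rows})
    }


if __name__ == "__main__":
    for treatment, records in reshape_for_ribbon(ROWS, PERIODS).items():
        print(f"Treatment {treatment}")
        for period, mean, lower, upper in records:
            print(f"  {period}: mean={mean}, band=[{lower}, {upper}]")
```

Keeping the periods in column order matters for the x axis: sorting the names alphabetically would put last_1_month first and reverse the time order.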

How to plot confidence interval of a time series data in Python?

I'm not qualified to answer question 1; however, the answers to the Stack Overflow question on computing a confidence interval from sample data (linked in the code comment below) produce different results from your code.

As for question 2, you can use matplotlib's fill_between to fill the area between two curves (the upper and lower bounds in your example).

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

# https://stackoverflow.com/questions/15033511/compute-a-confidence-interval-from-sample-data
def mean_confidence_interval(data, confidence=0.95):
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    # half-width: standard error times the two-sided Student t quantile
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, m - h, m + h

mean, lower, upper = [], [], []
ci = 0.8  # 80% confidence level
for i in range(20):
    a = np.random.rand(100)  # random data standing in for your output
    m, ml, mu = mean_confidence_interval(a, ci)
    mean.append(m)
    lower.append(ml)
    upper.append(mu)

plt.figure()
plt.plot(mean, '-b', label='mean')
plt.plot(upper, '-r', label='upper')
plt.plot(lower, '-g', label='lower')
# fill the area between the bounds in black, opacity 0.15
plt.fill_between(list(range(len(mean))), upper, lower, color="k", alpha=0.15)

plt.xlabel("Value")
plt.ylabel("Loss")
plt.legend()
plt.show()

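mean_confidence_interval() returns the mean plus and minus a half-width h = se * t, where se is the standard error of the mean and t is the Student t quantile for the chosen confidence level. If you want to avoid the SciPy dependency, a rough stdlib-only sketch follows. It deliberately swaps the t quantile for the standard normal quantile from statistics.NormalDist, which is an approximation: it is close for reasonably large samples such as the n = 100 used above, but too narrow for small n. The function name is illustrative.

```python
import math
import random
import statistics


def mean_confidence_interval_normal(data, confidence=0.95):
    """Approximate CI for the mean using a normal (z) critical value.

    Swapped-in technique: NormalDist().inv_cdf replaces the Student t
    quantile, so the interval is slightly too narrow for small samples.
    """
    n = len(data)
    m = statistics.fmean(data)
    # sample SD (n - 1 denominator), matching scipy.stats.sem's default
    se = statistics.stdev(data) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)
    h = se * z
    return m, m - h, m + h


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the run is reproducible
    mean, lower, upper = [], [], []
    for _ in range(20):
        sample = [rng.random() for _ in range(100)]  # simulated data, as above
        m, lo, hi = mean_confidence_interval_normal(sample, confidence=0.8)
        mean.append(m)
        lower.append(lo)
        upper.append(hi)
    # These three lists are what plt.plot / plt.fill_between consume.
    print(lower[0], mean[0], upper[0])
```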

How to plot two `ggscatter` correlation plots with confidence intervals on the same graph in R?

After checking the docs and trying several options with the color and ggp arguments of ggscatter, IMHO the easiest and least time-consuming way to get the result you want is to build the plot from scratch with ggplot2, using ggpubr only to add the correlation coefficients (stat_cor) and the theme:

set.seed(1)

spentWithTool <- sample(1:7, 20, replace = TRUE)
understoodWithTool <- sample(1:5, 20, replace = TRUE)
spentWithoutTool <- sample(1:4, 10, replace = TRUE)
understoodWithoutTool <- sample(1:5, 10, replace = TRUE)

library(ggplot2)
library(ggpubr)

df <- rbind.data.frame(
  data.frame(x = spentWithTool, y = understoodWithTool, id = "with"),
  data.frame(x = spentWithoutTool, y = understoodWithoutTool, id = "without")
)

ggplot(df, aes(x, y, color = id, fill = id)) +
  geom_point() +
  geom_smooth(method = "lm") +
  stat_cor(method = "spearman") +
  scale_color_manual(values = c(with = "red", without = "blue"), aesthetics = c("color", "fill")) +
  theme_pubr() +
  labs(x = "timeSpent", y = "understood")
#> `geom_smooth()` using formula = 'y ~ x'

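stat_cor(method = "spearman") reports Spearman's rank correlation, which is the Pearson correlation computed on the ranks of x and y, with tied values given the average of their ranks. Ties matter here because the sampled values are small integers. A minimal stdlib Python sketch of that computation, useful for checking the coefficient on the plot by hand; the function names are illustrative, and it does not reproduce the p-value that stat_cor also prints.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation (assumes neither input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    print(spearman([1, 2, 2, 3], [1, 2, 3, 4]))
```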

Violin plot with confidence interval in R

First calculate the confidence interval for each condition, then add it to the plot with geom_errorbar, like this:

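library(tidyverse)

# df is your data: one row per observation, with columns Condition and Need
stats <- df %>%
  group_by(Condition) %>%
  summarise(Mean = mean(Need), SD = sd(Need), N = n(),
            # normal-approximation 95% CI: mean +/- 1.96 * SD / sqrt(n)
            CI_L = Mean - (SD * 1.96) / sqrt(N),
            CI_U = Mean + (SD * 1.96) / sqrt(N))

ggplot() +
  geom_violin(df, mapping = aes(x = Condition, y = Need, fill = Condition)) +
  # red point-range: bootstrap CI of the mean (mean_cl_boot requires Hmisc)
  stat_summary(data = df, mapping = aes(x = Condition, y = Need),
               fun.data = "mean_cl_boot", geom = "pointrange", colour = "red") +
  geom_point(stats, mapping = aes(Condition, Mean)) +
  geom_errorbar(stats, mapping = aes(x = Condition, ymin = CI_L, ymax = CI_U), width = 0.2) +
  ggtitle("Needs by condition violin plot")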

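The summarise() step computes, for each Condition, the mean plus or minus 1.96 × SD / √n, where n is the number of observations in that condition: a normal-approximation 95% interval for the mean. A minimal stdlib Python sketch of the same per-group arithmetic; the (condition, need) records and the group_ci name are hypothetical stand-ins for your df, and 1.96 is kept to match the R code, although a t quantile would give a wider, more appropriate interval for small groups.

```python
import math
import statistics
from collections import defaultdict


def group_ci(records, z=1.96):
    """Per-group (mean, lower, upper) using mean +/- z * SD / sqrt(n).

    records: iterable of (condition, need) pairs, a stand-in for df.
    SD is the sample standard deviation, matching R's sd().
    """
    groups = defaultdict(list)
    for condition, need in records:
        groups[condition].append(need)
    out = {}
    for condition, values in groups.items():
        n = len(values)  # per-group sample size, like n() in summarise()
        mean = statistics.fmean(values)
        half = z * statistics.stdev(values) / math.sqrt(n)
        out[condition] = (mean, mean - half, mean + half)
    return out


if __name__ == "__main__":
    # Hypothetical toy data: two conditions, six observations each.
    data = [("A", v) for v in (1, 2, 3, 4, 5, 6)] + \
           [("B", v) for v in (2, 4, 4, 5, 7, 8)]
    for condition, (m, lo, hi) in sorted(group_ci(data).items()):
        print(f"{condition}: mean={m:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```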

How to plot a 95% confidence interval graph for one sample proportion

One way to show the 95% confidence interval for the estimated probability is as follows.

First, start with a data frame of 1s and 0s representing the "successes" and "failures" in your sample. Your numbers suggest approximately 105 successes out of 1500, so we do:

df <- data.frame(x = c(rep(1, 105), rep(0, 1395)))

Now we fit a logistic regression with the intercept being the only parameter we are estimating:

mod <- coef(summary(glm(x ~ 1, family = binomial, data = df)))

mod
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -2.586689 0.1011959 -25.5612 4.122466e-144

The estimate here should be approximately normally distributed (on the log-odds scale) with the given estimate and standard error, so we can get the density values over a range of ±3 standard errors around the estimate by doing:

xvals <- seq(mod[1] - 3 * mod[2], mod[1] + 3 * mod[2], 0.01)
yvals <- dnorm(xvals, mod[1], mod[2])

Now we convert the x values from log odds to probabilities:

pxvals <- exp(xvals)/(1 + exp(xvals))

We will also create a vector that labels whether each value lies within 1.96 standard errors of the estimate (i.e. inside the 95% interval):

level <- ifelse(xvals < mod[1] - 1.96 * mod[2], "lower",
                ifelse(xvals > mod[1] + 1.96 * mod[2], "upper", "estimate"))

Now we put all of these in a data frame and plot:

plot_df <- data.frame(xvals, yvals, pxvals, level)

library(ggplot2)

ggplot(plot_df, aes(pxvals, yvals, fill = level)) +
  geom_area(alpha = 0.5) +
  geom_vline(xintercept = exp(mod[1])/(1 + exp(mod[1])), linetype = 2) +
  scale_fill_manual(values = c("gray70", "deepskyblue4", "deepskyblue4"),
                    guide = guide_none()) +
  scale_x_continuous(limits = c(0.03, 0.13), breaks = 3:12/100,
                     name = "probability") +
  theme_bw()


Note that because we have transformed the x axis, this is no longer a genuine density plot. As a result the y axis is somewhat arbitrary, but the plot still accurately shows the 95% confidence interval for the probability estimate.
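The shaded middle region of that plot is a 95% Wald interval built on the log-odds scale and back-transformed to a probability. For an intercept-only logistic regression the estimate is log(successes / failures) and its standard error is sqrt(1/successes + 1/failures), so the endpoints can also be computed directly without fitting anything. A minimal stdlib Python sketch, assuming the 105 / 1500 counts from above; logit_wald_ci is an illustrative name.

```python
import math


def inv_logit(x):
    """Convert log odds to a probability (same as exp(x) / (1 + exp(x)))."""
    return 1 / (1 + math.exp(-x))


def logit_wald_ci(successes, total, z=1.96):
    """Wald CI for a proportion, built on the log-odds scale.

    Mirrors the intercept-only glm: estimate = log(s / f),
    SE = sqrt(1/s + 1/f), then the endpoints are back-transformed.
    """
    failures = total - successes
    est = math.log(successes / failures)
    se = math.sqrt(1 / successes + 1 / failures)
    return est, se, inv_logit(est - z * se), inv_logit(est + z * se)


if __name__ == "__main__":
    est, se, lo, hi = logit_wald_ci(105, 1500)
    print(f"log-odds estimate {est:.6f}, SE {se:.7f}")
    print(f"probability {inv_logit(est):.3f}, 95% CI ({lo:.4f}, {hi:.4f})")
```

The estimate and standard error agree with the glm summary shown earlier, and the back-transformed endpoints come out at roughly 0.058 and 0.084, which is where the fill changes colour in the plot.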


EDIT

Here's an alternative method if the glm approach seems too complicated. It uses the binomial distribution to get the 95% confidence interval: you just supply the sample size (population in the code) and the number of "successes".

library(ggplot2)

population <- 1500
actual_successes <- 105
test_successes <- 1:300

density <- dbinom(test_successes, population, actual_successes/population)
probs <- pbinom(test_successes, population, actual_successes/population)
label <- ifelse(probs < 0.025, "low", ifelse(probs > 0.975, "high", "CI"))

ggplot(data.frame(probability = test_successes/population, density, label),
       aes(probability, density, fill = label)) +
  geom_area(alpha = 0.5) +
  geom_vline(xintercept = actual_successes/population, linetype = 2) +
  scale_fill_manual(values = c("gray70", "deepskyblue4", "deepskyblue4"),
                    guide = guide_none()) +
  scale_x_continuous(limits = c(0.03, 0.13), breaks = 3:12/100,
                     name = "probability") +
  theme_bw()

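The labelling step in that code is a quantile lookup: success counts whose binomial CDF is below 0.025 or above 0.975 form the tails, and the rest form the interval. A minimal stdlib Python sketch of the same idea, assuming the same 105 / 1500 counts; it works in log space via math.lgamma because binomial coefficients for n = 1500 overflow a float. binomial_band is a hypothetical name.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binomial_band(successes, total, lower_q=0.025, upper_q=0.975):
    """Lowest and highest success counts labelled "CI" in the R code,
    i.e. counts k with lower_q <= P(X <= k) <= upper_q."""
    p = successes / total
    cdf = 0.0
    inside = []
    for k in range(total + 1):
        cdf += binom_pmf(k, total, p)  # running sum, like pbinom()
        if lower_q <= cdf <= upper_q:
            inside.append(k)
    return inside[0], inside[-1]


if __name__ == "__main__":
    lo_k, hi_k = binomial_band(105, 1500)
    print(f"counts {lo_k}..{hi_k} -> proportions "
          f"{lo_k / 1500:.3f}..{hi_k / 1500:.3f}")
```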