R Histogram with Multiple Populations

How to plot two histograms together in R?

That image you linked to was for density curves, not histograms.

If you've been reading up on ggplot, then maybe the only thing you're missing is combining your two data frames into one long one.

So let's start with something like what you have, two separate sets of data, and combine them.

library(ggplot2)

carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))

# Now, combine your two dataframes into one.  
# First make a new column in each that will be 
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'

# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)

After that, which is unnecessary if your data is in long format already, you only need one line to make your plot.

ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)
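If dplyr is available, the tagging and combining steps above can be folded into a single call. This is only a sketch and not part of the original answer: it uses smaller samples for speed and relies on dplyr's bind_rows(), which fills the .id column from the names of the list.

```r
library(dplyr)

set.seed(1)
carrots <- data.frame(length = rnorm(100, 6, 2))
cukes   <- data.frame(length = rnorm(50, 7, 2.5))

# bind_rows() on a named list stores each list name in the .id column,
# so the 'veg' identifier is created while the rows are stacked
vegLengths <- bind_rows(list(carrot = carrots, cuke = cukes), .id = "veg")

# Quick check of how many rows came from each source
table(vegLengths$veg)
```

The result has the same long shape as the rbind() version, with veg as the first column.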


Now, if you really did want histograms, the following will work. Note that you must change position from its default, "stack"; you might miss that if you don't have a good idea of what your data should look like. A higher alpha looks better here. Also note that I made these density histograms. It's easy to remove y = after_stat(density) to get back to counts (older ggplot2 code writes this as y = ..density.., which is deprecated since ggplot2 3.4.0).

ggplot(vegLengths, aes(length, fill = veg)) + 
  geom_histogram(alpha = 0.5, aes(y = after_stat(density)), position = 'identity')
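Density scaling matters here because the two groups differ in size. The following sketch uses base R's hist() on smaller samples to show the difference between counts and densities; the bin width of 0.5 and the sample sizes are arbitrary choices made for illustration.

```r
set.seed(42)
carrots <- rnorm(1000, 6, 2)
cukes   <- rnorm(500, 7, 2.5)

# Shared bin edges that cover both samples
breaks <- seq(floor(min(carrots, cukes)), ceiling(max(carrots, cukes)), by = 0.5)

h_carrot <- hist(carrots, breaks = breaks, plot = FALSE)
h_cuke   <- hist(cukes,   breaks = breaks, plot = FALSE)

# Counts add up to the group sizes, so the larger group dominates a count plot
c(carrot = sum(h_carrot$counts), cuke = sum(h_cuke$counts))

# Densities integrate to 1 within each group, so the shapes become comparable
area_carrot <- sum(h_carrot$density * diff(breaks))
area_cuke   <- sum(h_cuke$density * diff(breaks))
c(carrot = area_carrot, cuke = area_cuke)
```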


Histogram with multiple bins and groups

Yes, you will have to restructure your data. You can do it in R, as shown by @stefan, or, if that is challenging, in Excel itself. Tidy data is easy to plot and analyze (see section 12.1 for tidy data and sections 3.7 and 3.8 for visualization). Tidy data here would consist of four columns: distance, value, year and group (Dist, Val, Val_year and Val_grp in the code below).
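For the restructuring step in R, one option is tidyr's pivot_longer(). The original data is not shown, so the wide layout below (one column per year) is purely hypothetical; the column names follow the ones used in this answer's plotting code.

```r
library(tidyr)

# Hypothetical wide layout: one value column per year
wide <- data.frame(
  Dist    = c(10, 20, 10, 20),
  Val_grp = c("A", "A", "B", "B"),
  `2015`  = c(1.2, 2.3, 0.8, 1.9),
  `2016`  = c(1.5, 2.1, 1.0, 2.2),
  `2017`  = c(1.7, 2.6, 1.1, 2.4),
  check.names = FALSE  # keep the year column names as they are
)

# Gather the year columns into a Val_year / Val pair
long <- pivot_longer(
  wide,
  cols      = c("2015", "2016", "2017"),
  names_to  = "Val_year",
  values_to = "Val"
)

long
```

Each row of the result is one measurement, which is the shape that the fill and facet mappings expect.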


As an example, I stored some data as a tab-delimited file (testdata.txt) and read it in using tidyverse's read_delim function. The example code follows:

library(tidyverse)
foo <- read_delim("testdata.txt", delim = "\t")
foo %>%
  mutate(Val_year = factor(Val_year, levels = c("2015", "2016", "2017"))) %>%
  ggplot() +
  geom_bar(aes(x = Dist, y = Val, fill = Val_year), stat = "identity", position = "dodge") +
  facet_grid(. ~ Val_grp)


How to plot multiple mean lines in a single histogram with multiple groups present?

I've made some reproducible code that might help you with your problem.

library(tidyverse)

# Generate some random data
df <-  data.frame(value = c(runif(50, 0.5, 1), runif(50, 1, 1.5)), 
                  type = c(rep("type1", 50), rep("type2", 50)))

# Calculate means from df
stats <- df %>% group_by(type) %>% summarise(mean = mean(value),
                                             n = n())

# Make the ggplot
ggplot(df, aes(x= value, fill= type,  color = type)) +
  geom_histogram(position="identity", alpha=0.2) +
  labs(x = "Value", y = "Count", fill = "Type", title = "Title") +
  guides(color = "none") +
  geom_vline(data = stats, aes(xintercept = mean, color = type), size = 2) +
  geom_text(data = stats, aes(x = mean, y = max(df$value), label = n), 
            size = 10, 
            color = "black")

If things go as intended, you'll end up with something akin to the following plot.

histogram with means
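In the code above the labels are placed at y = max(df$value), which is a value from the x scale rather than a count, so they may end up low in the panel. One possible alternative, offered as an assumption and not as the original answer's approach, is to map y to Inf and use vjust so the labels sit just below the top edge regardless of the bar heights.

```r
library(dplyr)
library(ggplot2)

set.seed(123)
df <- data.frame(value = c(runif(50, 0.5, 1), runif(50, 1, 1.5)),
                 type  = rep(c("type1", "type2"), each = 50))

# One row per group: the mean for the line, n for the label
stats <- df %>%
  group_by(type) %>%
  summarise(mean = mean(value), n = n())

p <- ggplot(df, aes(x = value, fill = type)) +
  geom_histogram(position = "identity", alpha = 0.2, bins = 30) +
  geom_vline(data = stats, aes(xintercept = mean, color = type)) +
  # y = Inf pins the label to the top of the panel; vjust nudges it inside
  geom_text(data = stats, aes(x = mean, y = Inf, label = n),
            vjust = 1.5, color = "black")

p
```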

Multi-group histogram with group-specific frequencies

Give this a try. Here I am using dplyr, a package that contains updated versions of the ddply-type functions from plyr. One thing: I am not sure whether you want your x-axis to be the Study_Groups or your Genotypes. Your question states that you want the frequency of each Genotype within each group, but your graph has the Genotypes on the x-axis. The solution follows the stated desire, not the plot. However, changing it to put Genotype on the x-axis is simple; I note in the code comments where and what to change.

library(dplyr)
library(ggplot2)

df2 <- df %>%
  count(Study_Group, Genotypes) %>%
  group_by(Study_Group) %>% #change to `group_by(Genotypes) %>%` for alternative approach
  mutate(prop = n / sum(n))

ggplot(data = df2, aes(Study_Group, prop, fill = Genotypes)) + 
  geom_bar(stat = "identity", position = "dodge")
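To see what the count/group_by/mutate chain produces, here is a small sketch on made-up data. The group labels and genotypes are hypothetical; only the column names Study_Group and Genotypes come from the answer.

```r
library(dplyr)

# Hypothetical data: 4 subjects in one group, 6 in the other
df <- data.frame(
  Study_Group = rep(c("Control", "Case"), times = c(4, 6)),
  Genotypes   = c("AA", "AA", "AG", "GG",
                  "AA", "AG", "AG", "AG", "GG", "GG")
)

df2 <- df %>%
  count(Study_Group, Genotypes) %>%   # n = rows per group/genotype pair
  group_by(Study_Group) %>%
  mutate(prop = n / sum(n)) %>%       # share of each genotype within its group
  ungroup()

df2

# The proportions add up to 1 inside every study group
group_totals <- tapply(df2$prop, df2$Study_Group, sum)
group_totals
```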


Multiple Relative frequency histogram in R, ggplot

Below are some basic examples with the built-in iris dataset. The relative frequency is obtained by multiplying the density by the bin width.

library(ggplot2)

ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density * width)),
                 position = "identity", alpha = 0.5)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(aes(y = after_stat(density * width))) +
  facet_wrap(~ Species)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


Created on 2022-03-07 by the reprex package (v2.0.1)
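Since density is the count divided by the bin width and by the group total, density times width should equal each bin's share of its group. A quick way to check this, sketched here with ggplot2's layer_data() and an arbitrarily chosen binwidth of 0.25, is to sum the computed bar heights per group:

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density * width)),
                 binwidth = 0.25, position = "identity", alpha = 0.5)

# layer_data() returns the values ggplot2 computed for the first layer
bars <- layer_data(p)

# Bar heights within each species (group) should add up to 1
rel_sums <- tapply(bars$y, bars$group, sum)
rel_sums
```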

R ggplot Histogram group shows sum of two groups

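In your data frame, the column "Group" holds both values, Training and Test, so ggplot understands that you are drawing one histogram with two groups. Because geom_histogram() uses position = "stack" by default, the bars of the two groups are stacked on top of each other, and the total bar height is the sum of both groups.
Your second plot draws two distinct histograms on the same grid, and the transparency (alpha) lets you see both where they overlap.

If you would rather see the groups side by side, you may prefer this one: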

plot3 <- ggplot(data = Donald_1) + 
  geom_histogram(aes(x = Alter, y = after_stat(count), fill = Group),
                 bins = 20, alpha = 0.7, position = "dodge")
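The difference between a stacked and an overlaid histogram can be checked numerically. The sketch below uses simulated data in place of Donald_1 (the values are invented; only the column names Alter and Group come from the answer) and inspects the computed bars with layer_data().

```r
library(ggplot2)

set.seed(1)
d <- data.frame(
  Alter = c(rnorm(200, 40, 10), rnorm(100, 45, 10)),
  Group = rep(c("Training", "Test"), times = c(200, 100))
)

# Default position is "stack": bars of the groups are piled on top of each other
p_stack <- ggplot(d, aes(Alter, fill = Group)) +
  geom_histogram(bins = 20)

# position = "identity": every group is drawn from zero and the bars overlap
p_ident <- ggplot(d, aes(Alter, fill = Group)) +
  geom_histogram(bins = 20, position = "identity", alpha = 0.7)

stacked  <- layer_data(p_stack)
overlaid <- layer_data(p_ident)

# In the stacked plot the top of each bin equals the summed count of both groups
bin_id    <- round(stacked$x, 6)
stack_top <- as.numeric(tapply(stacked$ymax, bin_id, max))
bin_total <- as.numeric(tapply(stacked$count, bin_id, sum))
all.equal(stack_top, bin_total)

# In the overlaid plot each bar is only as tall as its own group's count
all.equal(overlaid$ymax, overlaid$count)
```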

How to plot multiple stacked histograms together in R?

You can do it like this with reshape2 and ggplot2:

require(reshape2)  # this is the library that lets you flatten out data
require(ggplot2)   # plotting library

bucket<-list(a=a,cs=cs,e=e,qr=qr,w=w) # this puts all values in one list

# the melt command flattens the 'bucket' list into value/vectorname pairs
# the 2 columns are called 'value' and 'L1' by default
# 'fill' will color bars differently depending on L1 group
ggplot(melt(bucket), aes(value, fill = L1)) + 
  # call geom_histogram with position = "dodge" to offset the bars and a manual binwidth of 2
  geom_histogram(position = "dodge", binwidth = 2)

EDIT: sorry, you asked for stacked. Just change the last line to:

geom_histogram(position = "stack", binwidth=2)
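To see what melt() hands to ggplot, here is a small sketch with made-up vectors of different lengths (the values are hypothetical; the original a, cs, e, qr and w come from the question and are not shown here).

```r
library(reshape2)

# Hypothetical vectors standing in for the question's data
bucket <- list(a = c(1, 3, 5), cs = c(2, 4), e = c(6, 7, 8, 9))

# melt() on a named list returns one row per value,
# with the originating list name stored in column L1
long <- melt(bucket)
long
```

Because every value carries its list name in L1, vectors of unequal length can share one histogram without padding.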


Creating an overlap histogram using two different vectors with ggplot

I couldn't get most of the provided code to run, but if the issue is that the two variables you want to populate histograms with have different numbers of values then something like the following should work:

library(tidyverse)

score_a <- rnorm(n = 50, mean = 0, sd = 1)
score_b <- rnorm(n = 75, mean = 2, sd = 0.75)

# Basic plot:
ggplot() +
  # Add one histogram:
  geom_histogram(aes(score_a), color = "black", fill = "red", alpha = 0.7) +
  # Add second, which has a different number of values
  geom_histogram(aes(score_b), color = "black", fill = "blue", alpha = 0.7) +
  # Black and white theme
  theme_bw()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
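An alternative that also produces a legend is to combine the two vectors into one long data frame first. This is a sketch of that variation, not the approach used above; the data frame name scores and the group labels "a" and "b" are made up for illustration.

```r
library(ggplot2)

set.seed(2021)
score_a <- rnorm(n = 50, mean = 0, sd = 1)
score_b <- rnorm(n = 75, mean = 2, sd = 0.75)

# One row per value; rep() handles the different vector lengths
scores <- data.frame(
  score = c(score_a, score_b),
  group = rep(c("a", "b"), times = c(length(score_a), length(score_b)))
)

p <- ggplot(scores, aes(score, fill = group)) +
  # position = "identity" overlays the groups instead of stacking them
  geom_histogram(position = "identity", bins = 30,
                 color = "black", alpha = 0.7) +
  scale_fill_manual(values = c(a = "red", b = "blue")) +
  theme_bw()

p
```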


Edit: If you want more control over the x-axis and want to set it based on the min/max of your values, it could look something like the example below. Note that I've used the round() function here because of the values in this example; if rounding isn't necessary, you could omit it and pass seq(from = min_x, to = max_x, by = 0.5) directly to breaks = and labels = instead.

# Labeling the x-axis based on the min/max might look like this:

# Define axis breaks & labels:
min_x <- min(c(score_a, score_b))
max_x <- max(c(score_a, score_b))

ggplot() +
  # Add one histogram:
  geom_histogram(aes(score_a), color = "black", fill = "red", alpha = 0.7) +
  # Add second, which has a different number of values
  geom_histogram(aes(score_b), color = "black", fill = "blue", alpha = 0.7) +
  # Black and white theme
  theme_bw() +
  scale_x_continuous(
    breaks = round(x = seq(from = min_x, to = max_x, by = 0.5),
                   digits = 1),
    labels = round(x = seq(from = min_x, to = max_x, by = 0.5),
                   digits = 1))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


Created on 2021-09-24 by the reprex package (v2.0.0)
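Because the breaks above start at the raw minimum, the tick positions land on uneven values before rounding. One possible variation, shown here only as a sketch, snaps the ends of the range outward to multiples of the step so the ticks fall on round numbers; the variable step is introduced for this example.

```r
set.seed(7)
score_a <- rnorm(n = 50, mean = 0, sd = 1)
score_b <- rnorm(n = 75, mean = 2, sd = 0.75)

step  <- 0.5
min_x <- min(c(score_a, score_b))
max_x <- max(c(score_a, score_b))

# Snap the range outward to the nearest multiples of 'step'
breaks <- seq(from = floor(min_x / step) * step,
              to   = ceiling(max_x / step) * step,
              by   = step)
breaks
```

The resulting vector can be passed as scale_x_continuous(breaks = breaks), in which case no separate labels argument or rounding is needed.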
