Ggplot2: Overlay Density Plots R

Generally, to plot multiple variables with ggplot2 you need to convert the data from wide to long format first. I think it can be done without reshaping, but long format is how the package is meant to be used.

Here is a solution. I generated some data (three normal distributions with different means or standard deviations). I also added histograms and boxplots in case you want those. The alpha parameter controls the transparency of the fill; if you use color instead of fill, you get only outlines.

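library(ggplot2)
library(reshape2)
x <- data.frame(v1=rnorm(100), v2=rnorm(100,1,1), v3=rnorm(100,0,2))
# melt() stacks the three columns into two: variable (v1/v2/v3) and value
data <- melt(x)
ggplot(data, aes(x=value, fill=variable)) + geom_density(alpha=0.25)
# position="identity" overlays the histograms instead of stacking them
ggplot(data, aes(x=value, fill=variable)) + geom_histogram(alpha=0.25, position="identity")
ggplot(data, aes(x=variable, y=value, fill=variable)) + geom_boxplot()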
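For the outline-only variant mentioned above, a minimal sketch could look like this. The data frame `long` and the plot name `p_outline` are made up for illustration, and the data is built directly in long format so that no reshaping package is needed.

```r
library(ggplot2)

set.seed(1)
# Hypothetical long-format data: one row per observation
long <- data.frame(
  variable = rep(c("v1", "v2", "v3"), each = 100),
  value    = c(rnorm(100), rnorm(100, 1, 1), rnorm(100, 0, 2))
)

# Mapping colour instead of fill draws only the outlines of the curves
p_outline <- ggplot(long, aes(x = value, colour = variable)) +
  geom_density()
p_outline
```

With many groups, outlines tend to stay readable longer than semi-transparent fills.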

How to overlay density ggplots from different datasets in R?

As @Phil pointed out, you can't overlay different plots. However, you can make one plot that contains all of the density curves. Using mtcars and mpg as example datasets, try this:

library(ggplot2)

ggplot() +
  geom_density(aes(mpg, fill = "data1"), alpha = .2, data = mtcars) +
  geom_density(aes(hwy, fill = "data2"), alpha = .2, data = mpg) +
  scale_fill_manual(name = "dataset", values = c(data1 = "red", data2 = "green"))
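An alternative, sketched here as one possible approach rather than the only one, is to stack the columns of interest into a single data frame with a label column and let one layer handle the grouping. The names `combined`, `dataset` and `p_combined` are made up for this example.

```r
library(ggplot2)

# Stack the two columns into one long data frame, tagging each row
# with the dataset it came from (mpg here is ggplot2's example data)
combined <- rbind(
  data.frame(dataset = "data1", value = mtcars$mpg),
  data.frame(dataset = "data2", value = mpg$hwy)
)

p_combined <- ggplot(combined, aes(value, fill = dataset)) +
  geom_density(alpha = 0.2) +
  scale_fill_manual(values = c(data1 = "red", data2 = "green"))
p_combined
```

This scales more easily to many datasets, at the cost of copying the values into a new data frame.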

R: overlay density plot with lines based on condition of a column

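Another option is ggdensity() from the ggpubr package. Passing several column names to x returns a list of plots, one per column, which ggarrange() then lays out in a grid. The example data df is given further down.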

library(ggpubr)
out <- ggdensity(df, x = c("c2", "c3", "c4"), color = "condition",
                 fill = "condition")
ggarrange(plotlist = out, ncol = 2, nrow = 2)

data

df <- structure(list(condition = c("b", "c", "a", "a", "c", "b", "a", 
"c", "c", "a", "c", "b"), c2 = c(1L, 3L, 5L, 2L, 1L, 2L, 1L,
3L, 6L, 2L, 1L, 4L), c3 = c(0L, 1L, 0L, 4L, 1L, 3L, 0L, 1L, 0L,
0L, 3L, 3L), c4 = c(2L, 2L, 1L, 3L, 1L, 3L, 2L, 2L, 2L, 1L, 1L,
0L)), class = "data.frame", row.names = c(NA, -12L))

R: Overlay density plots by condition and by average plot

Is this what you're going for? I use after_stat() here to scale the conditional density curves down so that they sit comfortably below the total density. (The total density, being by definition less spiky, tends to have lower peaks than the conditional densities.)

ggplot(mtcars) +
  geom_density(aes(mpg)) +
  geom_density(aes(mpg, after_stat(count) * 0.01,
                   group = cyl, fill = as.character(cyl)), alpha = 0.2)
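The 0.01 factor above is an eyeballed constant. As a hedged variation, you could instead divide the computed count by the total number of rows, which weights each group's curve by its share of the data. The names `n_total` and `p_share` are made up for this sketch.

```r
library(ggplot2)

n_total <- nrow(mtcars)  # total number of rows across all groups

p_share <- ggplot(mtcars) +
  geom_density(aes(mpg)) +
  # count is density * group size, so count / n_total scales each
  # group's curve by that group's share of the rows
  geom_density(aes(mpg, after_stat(count / n_total),
                   group = cyl, fill = as.character(cyl)),
               alpha = 0.2)
p_share
```

Because each group gets its own bandwidth by default, the scaled curves will only roughly add up to the overall density.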

If you want to convert this to a function, you could use something like the following. The {{ }} or "embrace" operator works to forward the variable names into the environment of the function. More at https://rlang.r-lib.org/reference/topic-data-mask-programming.html#embrace-with-

plot_densities <- function(df, var, group) {
  ggplot(df) +
    geom_density(aes( {{ var }} )) +
    geom_density(aes( {{ var }}, after_stat(count) * 0.01,
                      group = {{ group }},
                      fill = as.character( {{ group }} )), alpha = 0.2)
}

plot_densities(mtcars, mpg, cyl)
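Since 0.01 happens to suit mtcars but may not suit other data, one possible extension is to expose the factor as an argument. This is only a sketch: `plot_densities_scaled` and its `shrink` argument are hypothetical names, not part of the answer above.

```r
library(ggplot2)

# Same idea as plot_densities(), with the scaling factor as a parameter
plot_densities_scaled <- function(df, var, group, shrink = 0.01) {
  ggplot(df) +
    geom_density(aes({{ var }})) +
    geom_density(aes({{ var }}, after_stat(count) * shrink,
                     group = {{ group }},
                     fill = as.character({{ group }})),
                 alpha = 0.2)
}

p_small <- plot_densities_scaled(mtcars, mpg, cyl)                 # default factor
p_large <- plot_densities_scaled(mtcars, mpg, cyl, shrink = 0.02)  # curves twice as tall
p_large
```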

General rule of overlaying density plot using ggplot2

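Make sure to multiply the count computed by the density layer by the binwidth used in the histogram layer, so that the curve is on the same scale as the bar heights. (The original form of this answer used ..count..; since ggplot2 3.4.0 the recommended spelling is after_stat(count).)

You can do it as follows:

library(ggplot2)
set.seed(100)
a <- data.frame(z = rnorm(10000))
binwidthVal <- 0.1
ggplot(a, aes(x = z)) +
  geom_histogram(binwidth = binwidthVal) +
  # count * binwidth puts the density curve on the histogram's count scale
  geom_density(colour = "red", aes(y = binwidthVal * after_stat(count)))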
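If you don't need counts on the y-axis, the reverse approach also works: put the histogram on the density scale, and the curve needs no rescaling at all. A minimal sketch, with `binwidth` and `p_hist` as made-up names:

```r
library(ggplot2)

set.seed(100)
a <- data.frame(z = rnorm(10000))
binwidth <- 0.1

p_hist <- ggplot(a, aes(x = z)) +
  # after_stat(density) makes the bar areas sum to 1
  geom_histogram(aes(y = after_stat(density)), binwidth = binwidth) +
  geom_density(colour = "red")
p_hist
```

The trade-off is that the y-axis then shows density rather than the number of observations per bin.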

Credit to Brian Diggs for the idea.

EDIT: Seems like there is already a perfectly good answer here

Is there a way in R to overlay 3 density plots, with time as the x axis, and count as the y axis?

The step you are missing is converting your data frame to long format.

Let's assume your data frame looks as follows:

library(tidyverse)
library(scales)

df <- data.frame(
  fb    = lubridate::ymd(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-03")),
  twi   = lubridate::ymd(c("2020-01-05", "2020-01-05", "2020-01-06", "2020-01-09")),
  insta = lubridate::ymd(c("2020-01-01", "2020-01-02", "2020-01-05", "2020-01-05"))
)

Now change the data frame into long format:

df_long <- df %>% pivot_longer(everything())

The long data frame can then be plotted:

df_long %>% ggplot(aes(x = value, color = name, fill = name)) +
  geom_density(alpha = 0.8) +
  theme_bw() +
  scale_x_date(labels = date_format("%Y-%m"),
               breaks = date_breaks("3 months")) +
  labs(title = "Posts over time") +
  xlab("month") +
  ylab("density")
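The question asked for count on the y-axis. One way to get something close, sketched here under the assumption that a smoothed count (density times number of posts, per day) is acceptable, is to map y to after_stat(count). The names `posts`, `posts_long` and `p_count` are made up, and as.Date() is used so that only ggplot2 and tidyr are needed.

```r
library(ggplot2)
library(tidyr)

# Hypothetical wide data: one column of post dates per platform
posts <- data.frame(
  fb    = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-03")),
  twi   = as.Date(c("2020-01-05", "2020-01-05", "2020-01-06", "2020-01-09")),
  insta = as.Date(c("2020-01-01", "2020-01-02", "2020-01-05", "2020-01-05"))
)

# pivot_longer() yields two columns: name (platform) and value (date)
posts_long <- pivot_longer(posts, everything())

p_count <- ggplot(posts_long, aes(x = value, colour = name, fill = name)) +
  geom_density(aes(y = after_stat(count)), alpha = 0.3) +
  scale_x_date(date_labels = "%b %d") +
  labs(x = "date", y = "smoothed count per day")
p_count
```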

R ggplot: overlay two conditional density plots (same binary outcome variable) - possible?

One way would be to plot the two versions as layers. The overlapping areas will be slightly different, depending on the layer order, based on how alpha works in ggplot2. This may or may not be what you want. You might fiddle with the two alphas, or vary the border colors, to distinguish them more.

ggplot(df, aes(fill = c)) +
  geom_density(aes(a), position='fill', alpha = 0.5) +
  geom_density(aes(b), position='fill', alpha = 0.5)

For example, you might make it so the fill only applies to one layer, but the other layer distinguishes groups using the group aesthetic, and perhaps a different linetype. This one seems more readable to me, especially if there is a natural ordering to the two variables that justifies putting one in the "foreground" and one in the "background."

ggplot(df) +
  geom_density(aes(a, group = c), position='fill', alpha = 0.2, linetype = "dashed") +
  geom_density(aes(b, fill = c), position='fill', alpha = 0.5)
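The data frame from the question isn't shown here, so to try the second version yourself, here is a self-contained sketch with simulated data. The column names a, b and c follow the code above; the values themselves, and the names `sim` and `p_cond`, are invented purely for illustration.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for the question's data: two continuous
# predictors (a, b) and one binary outcome (c)
sim <- data.frame(
  a = rnorm(200),
  b = rnorm(200, mean = 0.5),
  c = factor(rbinom(200, 1, 0.4))
)

p_cond <- ggplot(sim) +
  # background layer: grouped by outcome, dashed outlines only
  geom_density(aes(a, group = c), position = "fill",
               alpha = 0.2, linetype = "dashed") +
  # foreground layer: filled by outcome
  geom_density(aes(b, fill = c), position = "fill", alpha = 0.5)
p_cond
```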

Overlapped density plots in ggplot2

Everything will work fine if you move the assignment of the colour parameter out of aes().

vec1 <- data.frame(x=rnorm(2000, 0, 1))
vec2 <- data.frame(x=rnorm(3000, 1, 1.5))

library(ggplot2)

ggplot() +
  geom_density(aes(x=x), colour="red", data=vec1) +
  geom_density(aes(x=x), colour="blue", data=vec2)
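Setting colour outside aes() gives the right colours but no legend. If you want a legend as well, one option is to map colour to a label inside aes() and assign the actual colours with scale_colour_manual(). The labels "vec1" and "vec2" and the name `p_legend` are arbitrary choices for this sketch.

```r
library(ggplot2)

set.seed(1)
vec1 <- data.frame(x = rnorm(2000, 0, 1))
vec2 <- data.frame(x = rnorm(3000, 1, 1.5))

p_legend <- ggplot() +
  # inside aes(), the string is a label to be mapped, not a colour
  geom_density(aes(x = x, colour = "vec1"), data = vec1) +
  geom_density(aes(x = x, colour = "vec2"), data = vec2) +
  # the manual scale translates each label into an actual colour
  scale_colour_manual(name = "sample",
                      values = c(vec1 = "red", vec2 = "blue"))
p_legend
```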

How can I plot and overlay density plot of several rasters?

I'm not quite sure what you are after, but I'm going to take a shot anyway.

Suppose we have the following list of rasters of different resolutions and we're interested in plotting the distributions of the values inside the raster with the ggplot2 package.

library(raster)
#> Loading required package: sp
library(ggplot2)

rasterlist <- list(
  raster1 = raster(matrix(runif(100), 10)),
  raster2 = raster(matrix(rnorm(25), 5)),
  raster3 = raster(matrix(rpois(64, 2), 8))
)

What we have to do is get this raster data into a format that ggplot2 understands, which is the long format (as opposed to wide data). This means that every observation, in the raster case every cell, should be on its own row in a data.frame. We do this by transforming each raster with as.vector() and recording the raster of origin.

df <- lapply(names(rasterlist), function(i) {
  data.frame(
    rastername = i,
    value = as.vector(rasterlist[[i]])
  )
})
df <- do.call(rbind, df)
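To see what this reshaping produces without installing the raster package, here is the same pattern applied to plain matrices of the same sizes. This is only an illustration; `matrixlist`, `long` and the column name `source` are made up for the sketch.

```r
set.seed(1)
# Plain matrices standing in for the rasters (same dimensions as above)
matrixlist <- list(
  m1 = matrix(runif(100), 10),
  m2 = matrix(rnorm(25), 5),
  m3 = matrix(rpois(64, 2), 8)
)

# One data.frame per matrix, one row per cell, then stack them all
long <- do.call(rbind, lapply(names(matrixlist), function(nm) {
  data.frame(source = nm, value = as.vector(matrixlist[[nm]]))
}))

table(long$source)  # number of cells contributed by each matrix
```

Each input contributes as many rows as it has cells, so inputs of different resolutions can sit in the same data frame.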

Now that the data is in the correct format, you can feed it to ggplot. For density plots, the x positions should be the values. Setting the fill = rastername will automatically determine grouping.

ggplot(df, aes(x = value, fill = rastername)) +
geom_density(alpha = 0.3)

For box- or violin-plots, the group is typically on the x-axis and the values are mapped to the y-axis.

ggplot(df, aes(x = rastername, y = value)) +
geom_boxplot()

ggplot(df, aes(x = rastername, y = value)) +
geom_violin()

Created on 2020-10-04 by the reprex package (v0.3.0)

Hope that this is roughly what you were looking for.

How to overlay density plots in R?

Use lines() for the second one:

plot(density(MyData$Column1))
lines(density(MyData$Column2))

Make sure the axis limits of the first plot are wide enough to show the second curve, though.
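One way to get suitable limits, sketched here with simulated vectors in place of MyData, is to compute both densities first and take the combined range. The names `x1`, `x2`, `d1`, `d2`, `xlim` and `ylim` are made up for this example.

```r
set.seed(1)
x1 <- rnorm(200)          # stands in for MyData$Column1
x2 <- rnorm(200, 2, 0.5)  # stands in for MyData$Column2

d1 <- density(x1)
d2 <- density(x2)

# Limits that cover both curves, so the second one is not cut off
xlim <- range(d1$x, d2$x)
ylim <- range(d1$y, d2$y)

plot(d1, xlim = xlim, ylim = ylim, main = "Overlaid densities")
lines(d2, col = "red")
```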
