Any Suggestions for How to Plot mixEM Type Data Using ggplot2

Any suggestions for how I can plot mixEM type data using ggplot2

Look at the structure of the returned object (this should be documented in the help):

> library(mixtools)
> # simple mixture of normals:
> x = c(rnorm(10000, 8, 2), rnorm(10000, 17, 4))
> xMix = normalmixEM(x, lambda=NULL, mu=NULL, sigma=NULL)

Now what:

> str(xMix)
List of 9
$ x : num [1:20000] 6.18 9.92 9.07 8.84 9.93 ...
$ lambda : num [1:2] 0.502 0.498
$ mu : num [1:2] 7.99 17.05
$ sigma : num [1:2] 2.03 4.02
$ loglik : num -59877

The lambda, mu, and sigma components define the returned normal densities. You can plot these in ggplot using qplot and stat_function. But first make a function that returns scaled normal densities:

sdnorm = function(x, mean=0, sd=1, lambda=1) {
  lambda * dnorm(x, mean=mean, sd=sd)
}
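
As a quick sanity check on this helper, each scaled density should integrate to its mixing weight, and the two together to about 1. This is only a sketch: it uses the rounded parameter values from the str() output above instead of a live fit.

```r
# Scaled normal density: a normal density multiplied by a mixing weight
sdnorm <- function(x, mean = 0, sd = 1, lambda = 1) {
  lambda * dnorm(x, mean = mean, sd = sd)
}

# Rounded estimates copied from the str(xMix) output (illustrative values)
lambda <- c(0.502, 0.498)
mu     <- c(7.99, 17.05)
sigma  <- c(2.03, 4.02)

# Area under each scaled component; should be close to its weight
areas <- sapply(1:2, function(k) {
  integrate(sdnorm, -Inf, Inf,
            mean = mu[k], sd = sigma[k], lambda = lambda[k])$value
})
print(areas)
print(sum(areas))
```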

Then:

library(ggplot2)
qplot(x, geom="density") +
  stat_function(fun=sdnorm,
                args=list(mean=xMix$mu[1], sd=xMix$sigma[1], lambda=xMix$lambda[1]),
                fill="blue", geom="polygon") +
  stat_function(fun=sdnorm,
                args=list(mean=xMix$mu[2], sd=xMix$sigma[2], lambda=xMix$lambda[2]),
                fill="#FF0000", geom="polygon")


Adapt this to whatever ggplot skills you have. Transparent colours on the densities might be nice.

ggplot(data.frame(x=x)) +
  geom_histogram(aes(x=x, y=..density..), fill="white", color="black") +
  stat_function(fun=sdnorm,
                args=list(mean=xMix$mu[2],
                          sd=xMix$sigma[2],
                          lambda=xMix$lambda[2]),
                fill="#FF000080", geom="polygon") +
  stat_function(fun=sdnorm,
                args=list(mean=xMix$mu[1],
                          sd=xMix$sigma[1],
                          lambda=xMix$lambda[1]),
                fill="#00FF0080", geom="polygon")

producing:

[Figure: histogram of x with the two scaled component densities overlaid in semi-transparent red and green]
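
If you would rather not use stat_function, the same curves can be precomputed on a grid and stored in a long data frame. A minimal sketch, assuming the rounded estimates from the str() output; the names `grid` and `curves` and the grid limits are just illustrative choices:

```r
# Rounded estimates copied from the str(xMix) output (illustrative values)
lambda <- c(0.502, 0.498)
mu     <- c(7.99, 17.05)
sigma  <- c(2.03, 4.02)

# Evaluation grid (limits chosen by eye to cover both components)
grid <- seq(0, 35, length.out = 200)

# One block of rows per component, stacked into a long data frame
curves <- do.call(rbind, lapply(seq_along(mu), function(k) {
  data.frame(
    x = grid,
    density = lambda[k] * dnorm(grid, mean = mu[k], sd = sigma[k]),
    component = paste0("component", k)
  )
}))

str(curves)
```

With ggplot2 loaded, something like ggplot(curves, aes(x, density, fill = component)) + geom_area(alpha = 0.5, position = "identity") should then draw the components.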

How to incorporate data into plot which was constructed in ggplot2 using data from another file (R)?

Your question, how to combine several data sources in one ggplot so that each is drawn as a different element, is answered in this post here

I don't know for sure how this will apply to your specific data, so here is an example that might help you move forward.

Imagine we have two data.frames (see below) and we want to obtain a plot similar to the one you presented.

data1 <- data.frame(list(
  x = seq(-4, 4, 0.1),
  y = dnorm(x = seq(-4, 4, 0.1))))
data2 <- data.frame(list(
  "name" = c("name1", "name2"),
  "Score" = c(-1, 1)))

The first step is to find the "y" coordinates of the names in the second data.frame (data2). To do this I added a y column to data2, defined as a sequence of evenly spaced points from the max value of y down to the min value of y, with a small margin (space_y) at each end for aesthetics.

range_y = max(data1$y) - min(data1$y)
space_y = range_y * 0.05
data2$y <- seq(from = max(data1$y) - space_y, to = min(data1$y) + space_y, length.out = nrow(data2))

Then we can use ggplot() to plot data1 and data2 following some plot designs. For the current example I did this:

library(ggplot2)
p <- ggplot(data=data1, aes(x=x, y=y)) +
  geom_point() + # for data1, just plot the points
  geom_pointrange(data=data2, aes(x=Score, y=y, xmin=Score-0.5, xmax=Score+0.5)) +
  geom_text(data=data2, aes(x=Score, y=y+(range_y*0.05), label=name))
p

which gave the following plot:

[Figure: example ggplot combining the two data frames]

R ggplot: overlay two conditional density plots (same binary outcome variable) - possible?

One way would be to plot the two versions as layers. The overlapping areas will be slightly different, depending on the layer order, based on how alpha works in ggplot2. This may or may not be what you want. You might fiddle with the two alphas, or vary the border colors, to distinguish them more.

ggplot(df, aes(fill = c)) +
  geom_density(aes(a), position='fill', alpha = 0.5) +
  geom_density(aes(b), position='fill', alpha = 0.5)


For example, you might make it so the fill only applies to one layer, but the other layer distinguishes groups using the group aesthetic, and perhaps a different linetype. This one seems more readable to me, especially if there is a natural ordering to the two variables that justifies putting one in the "foreground" and one in the "background."

ggplot(df) +
  geom_density(aes(a, group = c), position='fill', alpha = 0.2, linetype = "dashed") +
  geom_density(aes(b, fill = c), position='fill', alpha = 0.5)

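
To make explicit what position='fill' does with density layers: as far as I understand it, each group's density is estimated separately and the stacked heights are then rescaled to sum to 1 at every x. Below is a rough base-R sketch of that calculation on made-up data. The data frame `df` and its columns `a` and `c` are assumptions that mirror the names used above, not the asker's real data.

```r
set.seed(1)
# Hypothetical data: a continuous predictor 'a' and a binary outcome 'c'
df <- data.frame(
  a = c(rnorm(300, mean = 0), rnorm(200, mean = 2)),
  c = factor(rep(c("no", "yes"), times = c(300, 200)))
)

# Estimate each group's density on a common grid
rng   <- range(df$a)
d_no  <- density(df$a[df$c == "no"],  from = rng[1], to = rng[2], n = 256)
d_yes <- density(df$a[df$c == "yes"], from = rng[1], to = rng[2], n = 256)

# Share of the "yes" group at each grid point when heights are normalised
share_yes <- d_yes$y / (d_no$y + d_yes$y)

# Weighting by group size instead gives an estimate of P(c = "yes" | a)
n <- table(df$c)
p_yes <- n[["yes"]] * d_yes$y / (n[["no"]] * d_no$y + n[["yes"]] * d_yes$y)

head(data.frame(a = d_no$x, share_yes, p_yes))
```

The two columns differ whenever the groups are unbalanced. In ggplot2, the count-weighted version should correspond to mapping y to the computed count variable, e.g. aes(y = after_stat(count)) in reasonably recent versions.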

Integrating ggplot2 with user-defined stat_function()

I have finally figured out how to do what I wanted and reworked my solution. I adapted parts of the answers by @Spacedman and @jlhoward to this question (which I hadn't seen at the time of posting mine): Any suggestions for how I can plot mixEM type data using ggplot2. However, my solution is a little different. On one hand, I used @Spacedman's approach of using stat_function(), the same idea I tried in my original version; I like it better than the alternative, which is more flexible but seems a bit too complex. On the other hand, similarly to @jlhoward's approach, I simplified the parameter passing. I also introduced some visual improvements, such as automatic selection of differentiated colors to make the component distributions easier to identify. For my EDA, I refactored this code as an R module. However, there is still one issue that I'm trying to figure out: why the component distribution plots are located below the expected density plots. Any advice on this issue will be much appreciated!

UPDATE: I have now figured out the scaling issue and updated the code and the figure accordingly: the histogram's y values (..density..) need to be multiplied by the binwidth (0.5 in this case) to account for the number of observations per bin.

[Figure: histogram of faithful$waiting with the two fitted component densities overlaid]

Here's the complete reworked reproducible solution:

library(ggplot2)
library(RColorBrewer)
library(mixtools)

NUM_COMPONENTS <- 2

set.seed(12345) # for reproducibility

data <- faithful$waiting # use R built-in data

# extract 'k' components from mixed distribution 'data'
mix.info <- normalmixEM(data, k = NUM_COMPONENTS,
                        maxit = 100, epsilon = 0.01)
summary(mix.info)

numComponents <- length(mix.info$sigma)
message("Extracted number of component distributions: ",
        numComponents)

calc.components <- function(x, mix, comp.number) {
  mix$lambda[comp.number] *
    dnorm(x, mean = mix$mu[comp.number], sd = mix$sigma[comp.number])
}

g <- ggplot(data.frame(x = data)) +
  # map the data frame column 'x', not the global vector 'data'
  geom_histogram(aes(x = x, y = 0.5 * ..density..),
                 fill = "white", color = "black", binwidth = 0.5)

# we could select needed number of colors randomly:
#DISTRIB_COLORS <- sample(colors(), numComponents)

# or, better, use a palette with more color differentiation:
DISTRIB_COLORS <- brewer.pal(numComponents, "Set1")

# one stat_function layer per component; note the argument is 'args', not 'arg'
distComps <- lapply(seq(numComponents), function(i)
  stat_function(fun = calc.components,
                args = list(mix = mix.info, comp.number = i),
                geom = "line", # use alpha=.5 for "polygon"
                size = 2,
                color = DISTRIB_COLORS[i]))
print(g + distComps)
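
A possible explanation for why a factor of 0.5 lines things up here (an interpretation, not something stated in the post): faithful$waiting is integer-valued, so with a binwidth of 0.5 every other bin is empty and the occupied bins are twice as tall as they would be with a binwidth of 1. The base-R check below is a small sketch of this. The break positions are an assumption, chosen so that bins are centred on multiples of 0.5.

```r
w <- faithful$waiting            # integer waiting times, 43 to 96 minutes

# Bins of width 0.5 centred on multiples of 0.5 (assumed bin placement)
breaks <- seq(42.75, 96.25, by = 0.5)
h <- hist(w, breaks = breaks, plot = FALSE)

# A density histogram has total area 1, whatever the bin width
total_area <- sum(h$density * diff(h$breaks))

# Bins centred on half-integers (every second bin) contain no data
half_integer_bins_empty <- all(h$counts[seq(2, length(h$counts), by = 2)] == 0)

# Same data with bins of width 1: the tallest bar is half as tall
h1 <- hist(w, breaks = seq(42.5, 96.5, by = 1), plot = FALSE)
ratio <- max(h$density) / max(h1$density)

print(c(total_area = total_area, ratio = ratio))
print(half_integer_bins_empty)
```

For continuous data, a histogram on the density scale and the weighted component curves should already be comparable without an extra factor, since both have total area 1.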

ggplot2: Multiple density plots not matching parent histogram distribution

First, I think you are plotting perfect normal distributions based on the mean and standard deviation of each subset, rather than the actual density distributions. Second, you are plotting a density histogram based on 500 observations, but then two density curves based on 328 and 172 observations. If you instead add separate geom_histogram and geom_density layers for each subset...

ggplot(dat, aes(x = Cutoff)) +
  geom_histogram(data=dat[which(dat$Filter%in%"Signal"),], aes(y = ..density..), fill = "grey60", bins = 100) +
  geom_histogram(data=dat[which(dat$Filter%in%"Background"),], aes(y = ..density..), fill = "grey60", bins = 100) +
  geom_density(data=dat[which(dat$Filter%in%"Signal"),], aes(x=Cutoff, y=..density..), fill="chartreuse3", alpha=0.5) +
  geom_density(data=dat[which(dat$Filter%in%"Background"),], aes(x=Cutoff, y=..density..), fill="firebrick2", alpha=0.5) +
  theme_bw()

... will give you this plot:

[Figure: separate histograms and density curves for the Signal and Background subsets]

Otherwise, if you do want to plot a single parent histogram but multiple subset densities, you should expect the peaks to not match. I.e.:

ggplot(dat, aes(x = Cutoff)) +
  geom_histogram(aes(y = ..density..), fill = "grey60", bins = 100) +
  geom_density(data=dat[which(dat$Filter%in%"Signal"),], aes(x=Cutoff, y=..density..), fill="chartreuse3", alpha=0.5) +
  geom_density(data=dat[which(dat$Filter%in%"Background"),], aes(x=Cutoff, y=..density..), fill="firebrick2", alpha=0.5) +
  theme_bw()

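
If the goal is for the subset curves to sit inside the parent histogram, one option is to scale each subset's density by that subset's share of the observations. A sketch in base R on simulated data: the data frame `dat` below is made up, with 328 "Signal" and 172 "Background" rows to mirror the counts mentioned above, and the helper names `scaled_density` and `area` are hypothetical.

```r
set.seed(42)
# Simulated stand-in for the real data (which are not shown in the question)
dat <- data.frame(
  Cutoff = c(rnorm(328, mean = 5, sd = 1), rnorm(172, mean = 1, sd = 0.5)),
  Filter = rep(c("Signal", "Background"), times = c(328, 172))
)

# Common evaluation range, padded so the kernel tails are covered
grid_from <- min(dat$Cutoff) - 1
grid_to   <- max(dat$Cutoff) + 1

# Kernel density of one subset, scaled by that subset's share of all rows
scaled_density <- function(values, share) {
  d <- density(values, from = grid_from, to = grid_to, n = 512)
  data.frame(x = d$x, y = share * d$y)
}

shares <- prop.table(table(dat$Filter))
sig <- scaled_density(dat$Cutoff[dat$Filter == "Signal"], shares[["Signal"]])
bkg <- scaled_density(dat$Cutoff[dat$Filter == "Background"], shares[["Background"]])

# Approximate area under each scaled curve (Riemann sum on the grid)
area <- function(d) sum(d$y) * (d$x[2] - d$x[1])
print(c(Signal = area(sig), Background = area(bkg)))
```

The areas should come out close to 328/500 and 172/500, so the two curves together account for the same total area as the density histogram of all 500 observations. The resulting data frames could be added to the plot with, for example, geom_area(data = sig, aes(x, y)).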

Create ggplot2 figure of two mirrored density curves with a line plot in the middle (or: recreate Drift Diffusion Model diagram in R)

Here's a solution using patchwork.

Dummy data:

library(tidyverse)
library(patchwork)

df <- tibble(
  x = c(sort(runif(50)), sort(runif(50))),
  y = c(cumsum(runif(50, -0.5)), cumsum(runif(50, -1.5))),
  group = rep(c("A", "B"), each = 50)
)

Create the 3 plots.

p_density_top <-
  df %>%
  filter(group == "A") %>%
  ggplot(aes(x)) +
  geom_density(fill = "purple") +
  theme_minimal() +
  theme(
    axis.title = element_blank(),
    axis.text = element_blank()
  )

p_density_bottom <-
  df %>%
  filter(group == "B") %>%
  ggplot(aes(x)) +
  geom_density(fill = "red") +
  scale_y_reverse() +
  theme_minimal() +
  theme(
    axis.title.y = element_blank(),
    axis.text.y = element_blank()
  )

p_middle <-
  ggplot(df, aes(x, y, col = group)) +
  geom_line() +
  scale_color_manual(values = c("purple", "red")) +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank()
  )

Use patchwork to display them together.

(p_density_top + p_middle + p_density_bottom) + plot_layout(ncol = 1, heights = c(1, 5, 1))
