Stacked histograms like in flow cytometry
require(ggplot2)
require(plyr)
my.data <- as.data.frame(rbind( cbind( rnorm(1e3), 1) , cbind( rnorm(1e3)+2, 2), cbind( rnorm(1e3)+3, 3), cbind( rnorm(1e3)+4, 4)))
my.data$V2=as.factor(my.data$V2)
calculate the density depending on V2
res <- dlply(my.data, .(V2), function(x) density(x$V1))
dd <- ldply(res, function(z){
data.frame(Values = z[["x"]],
V1_density = z[["y"]],
V1_count = z[["y"]]*z[["n"]])
})
add an offset depending on V2
dd$offset <- -as.numeric(dd$V2) * 0.2 # adapt the 0.2 value as you need
dd$V1_density_offset <- dd$V1_density + dd$offset
and plot
ggplot(dd, aes(Values, V1_density_offset, color = V2)) +
  geom_line() +
  geom_ribbon(aes(Values, ymin = offset, ymax = V1_density_offset, fill = V2), alpha = 0.3) +
  scale_y_continuous(breaks = NULL)
Stacked Histograms in R
I think you might have the best luck with the 'ggplot2' package, and the chart you're looking for is a "stacked bar chart" and not a histogram.
Setup: Create some sample data.
data <- data.frame(age=sample(c("15-19", "20-24", "25-29","30-34"),100,replace=TRUE), ratio=rnorm(100,mean=1,sd=0.3))
Plot it: We can just use the 'qplot' function here.
library(ggplot2)
qplot(ratio, data=data, geom="histogram", fill=age, binwidth=0.1)
Here, we tell the 'qplot' function to use the [ratio] data from our [data] data frame and to plot it with a histogram geometry, which draws the binned counts as stacked bars (in current ggplot2, geom="bar" counts unique values and ignores binwidth). The data should be split and colored by [age] (fill=age), and each bar should be 0.1 wide (binwidth=0.1). You should be able to adjust this to your needs.
Vertically stack density plots with ggplot2
I would use facet_grid instead of facet_wrap to achieve this; faceting is the easiest method in ggplot2.
Here's a working example:
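library(dplyr)   # provides %>% and filter()
library(ggplot2) # provides the diamonds data set and the plotting functions
diamonds %>%
  filter(cut %in% c('Ideal','Premium','Very Good')) %>%
  ggplot(aes(carat)) +
  geom_density() +
  facet_grid(cut ~ .)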
This should give one density panel per cut, stacked vertically (as of ggplot2 3.3.0).
Stacked Histograms Using R Base Graphics
You can generate both plots with barplot(), based on a frequency table of Species and Sepal.Length.
# Create frequency table
tab <- table(iris$Species, iris$Sepal.Length)
# Stacked barplot
barplot(tab)
# Stacked percent barplot
barplot(prop.table(tab, 2)) # Need to convert to marginal table first
Plot staggered histograms/lines as in FACS
Is this the sort of thing you want?
What I did was define the y-distance between the baselines of each curve. For the ith curve, I calculated the minimum Y-value, then set that minimum to be i times the y-distance, adjusting the height of the entire curve accordingly. I used a decreasing z-order to ensure that the filled parts of the curves were not obscured by the baselines of later curves.
Here's the code:
import numpy as np
import matplotlib.pyplot as plt

# `data` is the list of 1000-point arrays built by the dummy-data code below
delta_Y = .5
zorder = 0
for i, Y in enumerate(data):
    baseline = min(Y)
    # change needed for minimum of Y to be delta_Y above previous curve
    y_change = delta_Y * i - baseline
    Y = Y + y_change
    plt.fill_between(np.linspace(0, 1000, 1000), Y, np.ones(1000) * delta_Y * i, zorder=zorder)
    zorder -= 1
plt.show()
Code that generates dummy data:
def gauss(X):
    return np.exp(-X**2 / 2.0)

# create data: ten Gaussian bumps, each in its own 100-sample window
X = np.linspace(-10, 10, 100)
data = []
for i in range(10):  # xrange in the original Python 2 code
    arr = np.zeros(1000)
    arr[i * 100: i * 100 + 100] = gauss(X)
    data.append(arr)
data.reverse()
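For reference, the offset step described above can also be pulled out into a small helper that does no plotting, which makes it easier to check. This is only a sketch: the names stagger and plot_staggered, and the three-bump sample data, are made up for illustration and are not part of the original answer.

```python
import numpy as np


def stagger(curves, delta_y=0.5):
    """Shift the i-th curve so that its minimum sits at i * delta_y.

    Returns a list of (shifted_curve, baseline) pairs.
    """
    staggered = []
    for i, y in enumerate(curves):
        y = np.asarray(y, dtype=float)
        baseline = delta_y * i
        staggered.append((y + (baseline - y.min()), baseline))
    return staggered


def plot_staggered(x, staggered):
    """Fill each curve down to its own baseline; later curves are drawn behind."""
    import matplotlib.pyplot as plt  # imported here so stagger() works without it

    for z, (y, baseline) in enumerate(staggered):
        plt.fill_between(x, y, baseline, zorder=-z)
    plt.show()


# Hypothetical sample data: three Gaussian bumps at different positions
x = np.linspace(0, 1000, 1000)
curves = [np.exp(-((x - c) / 30.0) ** 2 / 2.0) for c in (200, 500, 800)]
result = stagger(curves, delta_y=0.5)
```

Calling plot_staggered(x, result) draws the figure. Keeping the arithmetic separate from the drawing means delta_y can be tuned without touching the plotting code.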
how to mimic histogram plot from flowjo in R using flowCore?
The reason for the "shift" is that the x axis is logarithmic (base 10) in the FlowJo graph. To achieve the same result in R, add
+ scale_x_log10()
after the existing code. This might interact weirdly with the axis limits you've set, so bear that in mind.
To make the y-axis "count" rather than density, you can change the first line of your ggcyto() call to:
aes(x= `UV-379-A`, y = after_stat(count))
Let me know if that works - I don't have your data to hand so that's all from memory!
For any purely aesthetic changes, they are relatively easy to look up.
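If it helps to see how a "count" axis relates to a "density" axis for a histogram, here is a small NumPy sketch of the arithmetic. It is independent of ggcyto and uses a made-up random sample rather than your data; the variable names are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=1000)   # hypothetical sample, not the poster's data
edges = np.linspace(-5, 5, 51)   # 50 bins, each 0.2 wide

counts, _ = np.histogram(values, bins=edges)
density, _ = np.histogram(values, bins=edges, density=True)
widths = np.diff(edges)

# A density histogram has total area 1 ...
total_area = float(np.sum(density * widths))

# ... and multiplying by the number of binned points and the bin width
# recovers the per-bin counts.
rescaled = density * counts.sum() * widths
```

So the two y-axes differ only by a constant factor (number of points times bin width) when all bins have the same width, which is why switching between them changes the axis labels but not the shape of the histogram.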
Histograms and Density Plots do not match up
While there is no data sample to reproduce the error, you could try to make sure that the environment used by geom_density is correct by specifying it explicitly. You can also try to move the code line specifying the density (geom_density) just after the geom_histogram. Also, the y-axis label is probably wrong: it is now set as counts, while the values suggest that it is in fact density.
How would I specify density explicitly?
You can specify the density parameters explicitly by passing data, aes and position directly in the geom_density function call, so that it uses these stated arguments instead of inherited ones:
ggplot() +
geom_histogram(data=df.half, aes(x=time,y=..density..),position="identity", alpha=0.5,binwidth=1)+
geom_density(data=df.half,aes(x=time,y=..density..))+
geom_vline(data=sumy.df.half,aes(xintercept=grp.mean),color="blue", linetype="dashed", size=1)+
facet_grid(SUB_NUMBER ~ .)
I do not understand how it occurred in the first place
I think in your initial code for geom_density, you explicitly specified just the alpha argument. Thus, for all of the other parameters it needed (data, aes, position etc.), it used the inherited arguments and apparently did not inherit them correctly. Probably it tried to use the data argument from the geom_vline function (sumy.df.half), or was confused by the syntax of the "..density.." argument.