Plot Probability Heatmap/Hexbin with Different Sized Bins

Plot probability heatmap/hexbin with different sized bins

Edit

I think the following solution does what you ask for.

(Note that this is slow, especially the reshape step)

numbet <- 32
numtri <- 1e5
prob=5/6
#Fill a matrix
xcum <- matrix(NA, nrow=numtri, ncol=numbet+1)
for (i in 1:numtri) {
x <- sample(c(0,1), numbet, prob=c(prob, 1-prob), replace = TRUE)
xcum[i, ] <- c(i, cumsum(x)/cumsum(1:numbet))
}
colnames(xcum) <- c("trial", paste("bet", 1:numbet, sep=""))

mxcum <- reshape(data.frame(xcum), varying=1+1:numbet,
idvar="trial", v.names="outcome", direction="long", timevar="bet")

library(plyr)
mxcum2 <- ddply(mxcum, .(bet, outcome), nrow)
mxcum3 <- ddply(mxcum2, .(bet), summarize,
ymin=c(0, head(seq_along(V1)/length(V1), -1)),
ymax=seq_along(V1)/length(V1),
fill=(V1/sum(V1)))
head(mxcum3)

library(ggplot2)

p <- ggplot(mxcum3, aes(xmin=bet-0.5, xmax=bet+0.5, ymin=ymin, ymax=ymax)) +
geom_rect(aes(fill=fill), colour="grey80") +
scale_fill_gradient("Outcome", formatter="percent", low="red", high="blue") +
scale_y_continuous(formatter="percent") +
xlab("Bet")

print(p)

Sample Image

How do I change hexbin plot scales?

You cannot control the boundaries of the scale as closely as you want, but you can adjust it somewhat. First we need a reproducible example:

set.seed(42)
X <- rnorm(10000, 10, 3)
Y <- rnorm(10000, 10, 3)
XY.hex <- hexbin(X, Y)

To change the scale we need to specify a function to use on the counts and an inverse function to reverse the transformation. Now, three different scalings:

plot(XY.hex)    # Linear, default
plot(XY.hex, trans=sqrt, inv=function(x) x^2) # Square root
plot(XY.hex, trans=log, inv=function(x) exp(x)) # Log

The top plot is the original scaling. The bottom left is the square root transform and the bottom right is the log transform. There are probably too many levels to read these plots clearly. Adding the argument colorcut=6 to the plot command would reduce the number of levels to 5.

Three Plots

Hexbin: apply function for every bin

@cryo111's answer has the most important ingredient - IDs = TRUE. After that it's just a matter of figuring out what you want to do with Inf's and how much do you need to scale the ratios by to get integers that will produce a pretty plot.

library(hexbin)
library(data.table)

set.seed(1)
x = rnorm(10000)
y = rnorm(10000)

h = hexbin(x, y, IDs = TRUE)

# put all the relevant data in a data.table
dt = data.table(x, y, l = c(1,1,1,2), cID = h@cID)

# group by cID and calculate whatever statistic you like
# in this case, ratio of 1's to 2's,
# and then Inf's are set to be equal to the largest ratio
dt[, list(ratio = sum(l == 1)/sum(l == 2)), keyby = cID][,
ratio := ifelse(ratio == Inf, max(ratio[is.finite(ratio)]), ratio)][,
# scale up (I chose a scaling manually to get a prettier graph)
# and convert to integer and change h
as.integer(ratio*10)] -> h@count

plot(h)

Sample Image

Heatmap with different sized rectangles and colors

This type of visualization is called a treemap. There seem to be several R packages for creating them

Hexbin: how to trace bin contents

Assuming you are using the hexbin package, then you will need to set IDs=TRUE to be able to go back to the original rows

library(hexbin)
set.seed(5)
df <- data.frame(depth=runif(1000,min=0,max=100),temp=runif(1000,min=4,max=14))
h<-hexbin(df, IDs=TRUE)

Then to get the bin number for each observation, you can use

h@cID

To get the count of observations in the cell populated by a particular observation, you would do

h@count[match(h@cID, h@cell)]

The idea is that the second observation df[2,] is in cell h@cID[2]=424. Cell 424 is at index which(h@cell==424)=241 in the list of cells (zero count cells appear to be omitted). The number of observations in that cell is h@count[241]=2.



Related Topics



Leave a reply



Submit