Differences in Heatmap/Clustering Defaults in R (Heatplot Versus Heatmap.2)

Understanding heatmap dendrogram clustering in R

Rowv and Colv control whether the rows and columns of your data set should be reordered and if so how.

The possible values for them are TRUE, NULL, FALSE, a vector of integers, or a dendrogram object.

  • In the default mode, TRUE, heatmap.2 clusters the data using the functions passed to hclustfun and distfun. By default this is complete-linkage clustering on Euclidean distances. The dendrogram is then reordered using the row/column means. You can change this by passing different functions to hclustfun or distfun. For example, to use the Manhattan distance rather than the Euclidean distance:

    heatmap.2(x,...,distfun=function (y) dist(y,method = "manhattan") )

    See ?dist and ?hclust for the available options. If you want to learn more about clustering, you could start by reading about "distance measures" and "agglomeration methods".

  • If Rowv/Colv is NULL or FALSE then no reordering or clustering is done and the matrix is plotted as-is.

  • If Rowv/Colv is a numeric vector, then the clustering is computed as for TRUE and the reordering of the dendrogram is done using the vector supplied to Rowv/Colv.

  • If Rowv/Colv is a dendrogram object, then this dendrogram will be used to reorder the matrix. Dendrogram objects can be generated, for example, by:

    rowDistance = dist(x, method = "manhattan")
    rowCluster = hclust(rowDistance, method = "complete")
    rowDend = as.dendrogram(rowCluster)
    rowDend = reorder(rowDend, rowMeans(x))

    which generates a complete-linkage clustering on Manhattan distances, ordered by row means. You can now pass rowDend to Rowv.

    heatmap.2(x,...,Rowv = rowDend)

    This can be useful if, for example, you want to cluster the rows and columns in different ways, use a clustering that someone else has given you, or do something that cannot be accommodated just by specifying hclustfun and distfun. This is what is meant by "the dendrogram is honoured": it is used instead of what is specified by hclustfun and distfun.
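As a rough illustration of the points above, the sketch below rebuilds the default row dendrogram by hand and pairs it with a differently clustered column dendrogram. The matrix x is simulated purely for illustration, and the heatmap.2 call only runs if gplots is installed.

```r
set.seed(1)
# Simulated data, for illustration only
x <- matrix(rnorm(60), nrow = 10,
            dimnames = list(paste0("r", 1:10), paste0("c", 1:6)))

# Rows: what Rowv = TRUE does by default (Euclidean distance,
# complete linkage, then reordering by row means)
rowDend <- reorder(as.dendrogram(hclust(dist(x))), rowMeans(x))

# Columns: a different choice (Manhattan distance, average linkage)
colDend <- as.dendrogram(
  hclust(dist(t(x), method = "manhattan"), method = "average"))

# The leaf order of each dendrogram is the plotting order of rows/columns
rowOrder <- order.dendrogram(rowDend)
colOrder <- order.dendrogram(colDend)

if (requireNamespace("gplots", quietly = TRUE)) {
  pdf(NULL)  # null device, so no file is written
  hm <- gplots::heatmap.2(x, Rowv = rowDend, Colv = colDend, trace = "none")
  dev.off()
}
```

Because both dendrograms are supplied directly, hclustfun and distfun play no part in this call.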

How do I change the clustering algorithm for heatmap.2 function in R?

You are really close. hclustfun expects a function, but hclust(method = "average") is a call to hclust, and that call fails because the required argument d is missing. Wrapping the call in an anonymous function that takes d,

heatmap.2(x, hclustfun = function(d) hclust(d, method = "average"))

works.
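The difference between a call and a function can be checked without plotting anything. This is a minimal sketch on simulated data; the name avgLinkage is made up for the example.

```r
set.seed(42)
x <- matrix(rnorm(40), nrow = 8)  # simulated data, for illustration only
d <- dist(x)

# A bare call without a distance object fails as soon as it is evaluated
res <- tryCatch(hclust(method = "average"), error = function(e) e)
inherits(res, "error")

# A wrapper function defers the call until a distance object is supplied
avgLinkage <- function(d) hclust(d, method = "average")
hc <- avgLinkage(d)
hc$method
```

avgLinkage can then be passed as hclustfun = avgLinkage, which is equivalent to the anonymous function above.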

How to expand the dendrogram in heatmap.2

In your case the data has a long tail, which is expected for gene expression data (lognormal).

data <- read.table(file='http://pastebin.com/raw.php?i=ZaGkPTGm', 
header=TRUE, row.names=1)

mat <- as.matrix(data[,-1]) # -1 removes the first column containing gene symbols

As the quantiles show, three quarters of the values lie below about 1.5, while the most highly expressed genes extend the range to above 300.

quantile(mat)

# 0% 25% 50% 75% 100%
# 0.000 0.769 1.079 1.544 346.230

When hierarchical clustering is performed on unscaled data, the resulting dendrogram may be biased towards the values with the highest expression, as seen in your example. This merits either a logarithmic or a z-score transformation, among many others (reference). Your dataset contains values equal to 0, which is a problem for log transformation because log(0) is not finite (R returns -Inf).
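The problem with zeros is easy to reproduce using the quantile values printed above. Adding a pseudocount before taking logs is one common workaround; it is not part of the original answer, and the pseudocount of 1 is an arbitrary assumption.

```r
# Quantile values copied from the output above
expr <- c(0, 0.769, 1.079, 1.544, 346.230)

log2(expr)                 # the first element is -Inf

# Workaround (assumption): add a pseudocount of 1 before the log
logExpr <- log2(expr + 1)
range(logExpr)
```

With the pseudocount, zeros map to zero and the long upper tail is compressed.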

Z-score transformation (reference) is implemented within heatmap.2, but it's important to note that the function computes the distance matrix and runs the clustering algorithm before scaling the data. Hence the option scale='row' doesn't influence the clustering results; see my earlier post (differences in heatmap/clustering defaults in R) for more details.
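A toy matrix shows why the order of scaling and clustering matters. The values are hypothetical, not the poster's data: rows A and B rise, rows C and D fall, and B and D are on a scale 100 times larger.

```r
# Hypothetical toy data: two expression patterns at two magnitudes
toy <- rbind(A = c(1, 2, 3),
             B = c(100, 200, 300),
             C = c(3, 2, 1),
             D = c(300, 200, 100))

# Clustering the raw values groups rows by magnitude: {A, C} and {B, D}
rawGroups <- cutree(hclust(dist(toy)), k = 2)

# Clustering row-wise z-scores groups rows by pattern: {A, B} and {C, D}
zToy <- t(scale(t(toy)))
zGroups <- cutree(hclust(dist(zToy)), k = 2)

rbind(raw = rawGroups, scaled = zGroups)
```

Since heatmap.2 clusters before it applies scale='row', its dendrogram corresponds to the raw grouping unless the data are scaled beforehand.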

I would propose that you scale your data before running heatmap.2:

# scale function transforms columns by default hence the need for transposition.
z <- t(scale(t(mat)))

quantile(z)

# 0% 25% 50% 75% 100%
# -2.1843994 -0.6646909 -0.2239677 0.3440102 2.2640027

# set custom distance and clustering functions
hclustfunc <- function(x) hclust(x, method="complete")
distfunc <- function(x) dist(x,method="maximum")

# obtain the clusters
fit <- hclustfunc(distfunc(z))
clusters <- cutree(fit, 5)

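library(gplots) # provides heatmap.2 and greenred
pdf(file='heatmap.pdf', height=50, width=10)
# argument names are written out in full (hclustfun, symbreaks) rather than
# relying on partial matching, and TRUE/FALSE are used instead of T/F
heatmap.2(z, trace='none', dendrogram='row', Colv=FALSE, scale='none',
hclustfun=hclustfunc, distfun=distfunc, col=greenred(256), symbreaks=TRUE,
margins=c(10,20), keysize=0.5, labRow=data$Gene.symbol,
lwid=c(1,0.05,1), lhei=c(0.03,1), lmat=rbind(c(5,0,4),c(3,1,2)),
RowSideColors=as.character(clusters))
dev.off()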
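Passing as.character(clusters) colours the side bar with entries "1" to "5" of the current palette. If explicit colours are preferred, the cluster ids can index a colour vector instead. The sketch below uses simulated data, and the palette is an arbitrary choice.

```r
set.seed(7)
zSim <- matrix(rnorm(100), nrow = 20)  # simulated stand-in for z

# Same distance and linkage as above
fitSim <- hclust(dist(zSim, method = "maximum"), method = "complete")
clustersSim <- cutree(fitSim, k = 5)

# One colour per cluster id (arbitrary palette, an assumption)
clusterPalette <- c("firebrick", "steelblue", "forestgreen",
                    "goldenrod", "purple")
sideColors <- clusterPalette[clustersSim]

table(clustersSim)  # cluster sizes
# heatmap.2(zSim, ..., RowSideColors = sideColors)
```

RowSideColors must hold one colour per row, in the original (unclustered) row order.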

Also, see the additional posts here and here, which explain how to set the layout of the heatmap via lmat, lwid and lhei parameters.
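The layout arguments used above can be checked for consistency before plotting. The panel numbering in the comments reflects my understanding of the order in which heatmap.2 draws its panels when RowSideColors is supplied; treat it as an assumption and confirm it against the posts linked above.

```r
# Layout values from the call above
lmat <- rbind(c(5, 0, 4),
              c(3, 1, 2))
lwid <- c(1, 0.05, 1)  # relative column widths
lhei <- c(0.03, 1)     # relative row heights

# Assumed drawing order with RowSideColors (0 marks an empty cell)
panels <- c("row side colours", "heatmap", "row dendrogram",
            "column dendrogram", "colour key")

# lwid needs one entry per column of lmat, lhei one entry per row
stopifnot(ncol(lmat) == length(lwid), nrow(lmat) == length(lhei))

# Panel name for each layout cell
cells <- matrix(c("(empty)", panels)[lmat + 1], nrow = nrow(lmat))
cells
```

Read this way, the layout gives the row dendrogram and the heatmap equal width, a thin strip between them for the side colours, and a very short top row for the key.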

The resulting heatmap is shown below (row and column labels are omitted):

[Image: the resulting heatmap]

How to use WeightedCluster::wcKMedoids to provide clustering for heatmap or heatmap.2 in R?

This cannot be done. K-medoids clustering is a partitioning method, not a hierarchical one, so it produces no merge tree. A dendrogram is only meaningful for hierarchical clustering algorithms.
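What can still be done, as a workaround rather than a dendrogram, is to order the rows by cluster membership and mark the groups with a colour bar. The sketch below uses simulated data and a hypothetical membership vector standing in for the labels a k-medoids run would give; how to extract those labels from wcKMedoids is not covered here.

```r
set.seed(3)
x <- matrix(rnorm(30), nrow = 6,
            dimnames = list(paste0("g", 1:6), NULL))  # simulated data

# Hypothetical partition labels, one per row
membership <- c(2, 1, 3, 1, 2, 3)

# Reorder the rows so that members of the same cluster are adjacent
ord <- order(membership)
xOrdered <- x[ord, ]

# One colour per cluster (arbitrary palette), in the new row order
sideColors <- c("firebrick", "steelblue", "forestgreen")[membership[ord]]

# heatmap.2(xOrdered, Rowv = FALSE, dendrogram = "column",
#           RowSideColors = sideColors, trace = "none")
```

With Rowv = FALSE the rows are plotted in the order given, so the groups appear as contiguous blocks without any row dendrogram.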


