How to Manually Create a Dendrogram (Or "Hclust") Object? (In R)

How do I manually create a dendrogram (or hclust) object in R?

I think you are better off creating an hclust object and then converting it to a dendrogram with as.dendrogram than trying to create a dendrogram directly. See the ?hclust help page for the meaning of each element of an hclust object.

Here is a simple example with four leaves A, B, C, and D, combining first A-B, then C-D, and finally AB-CD:

a <- list()  # initialize empty object
# define merging pattern:
# negative numbers are leaves,
# positive are merged clusters (defined by row number in $merge)
a$merge <- matrix(c(-1, -2,
                    -3, -4,
                     1,  2), ncol = 2, byrow = TRUE)
a$height <- c(1, 1.5, 3) # define merge heights
a$order <- 1:4 # order of leaves (trivial if hand-entered)
a$labels <- LETTERS[1:4] # labels of leaves
class(a) <- "hclust" # make it an hclust object
plot(a) # look at the result

# convert to a dendrogram object if needed
ad <- as.dendrogram(a)
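
For larger hand-built trees the leaf order is no longer trivial. As a rough sketch (the helper name leaf_order is hypothetical, not part of base R), one way to get an order consistent with the merge matrix is to expand its last row recursively:

```r
# Hypothetical helper: derive a leaf order from an hclust-style merge matrix.
leaf_order <- function(merge) {
  expand <- function(j) {
    if (j < 0) return(-j)                        # negative entry: a single leaf
    c(expand(merge[j, 1]), expand(merge[j, 2]))  # positive entry: cluster from row j
  }
  as.integer(expand(nrow(merge)))                # the last row is the root
}

m <- matrix(c(-1, -2,
              -3, -4,
               1,  2), ncol = 2, byrow = TRUE)
ord <- leaf_order(m)
print(ord)
```

The result can be assigned to the order element before the class is set. The recursion is fine for small trees; a very deep tree might need an iterative version.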

Creating dendrograms manually: how to fix 'merge' matrix has invalid contents in plot.hclust?

The validity of an hclust tree is checked by the .validity.hclust function. Its source code is given here; look at lines 121-135.

The error means that your tree is invalid because of its merge matrix: it contains repeated entries (e.g., 1 and 2). In a properly constructed merge matrix every entry is unique. The negative entries run from -N_obs to -1 and the positive entries from 1 to N_obs-2, where N_obs is the number of observations. This is checked by the following if test in the code:

if(identical(sort(as.integer(merge)), c(-(n:1L), +seq_len(n-2L))))
    TRUE
else
    "'merge' matrix has invalid contents"

From the reference of hclust:

merge: an n − 1 by 2 matrix.

Row i of merge describes the merging of clusters at step i of the
clustering. If an element j in the row is negative, then observation
− j was merged at this stage. If j is positive then the merge was
with the cluster formed at the (earlier) stage j of the algorithm.
Thus negative entries in merge indicate agglomerations of singletons,
and positive entries indicate agglomerations of non-singletons.

In other words, a negative entry -j refers to the single observation j, while a positive entry j refers to the cluster that was formed at row j of merge, i.e. at an earlier merging step.
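
To try the same test on your own matrix before plotting, here is a minimal sketch. The function name check_merge and the two small matrices are made up for illustration; the comparison itself is the one quoted above.

```r
# Hypothetical helper applying the uniqueness test quoted above.
check_merge <- function(merge) {
  n <- nrow(merge) + 1L                      # number of observations
  expected <- c(-(n:1L), seq_len(n - 2L))    # -n..-1 followed by 1..(n-2)
  identical(sort(as.integer(merge)), expected)
}

good <- matrix(c(-1, -2,
                 -3, -4,
                  1,  2), ncol = 2, byrow = TRUE)
bad  <- matrix(c(-1, -2,
                 -3, -4,
                  1,  1), ncol = 2, byrow = TRUE)  # cluster 1 used twice, 2 never

ok_good <- check_merge(good)
ok_bad  <- check_merge(bad)

# Entries that should appear in 'bad' but do not:
missing_bad <- setdiff(c(-(4:1), 1:2), as.integer(bad))
print(c(ok_good, ok_bad))
```

Listing the missing entries with setdiff is usually the quickest way to find which row of a hand-written matrix is wrong.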

So, revise your hclust object. Here is some code to give you an idea of what a proper hclust object looks like:

iris2 <- iris[1:20, -5]            # first 20 observations, numeric columns only
species_labels <- iris[1:20, 5]    # species of the same 20 observations
d_iris <- dist(iris2)
tree_iris <- hclust(d_iris, method = "complete")

Take a closer look at tree_iris$merge.
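
A few things worth checking in that matrix, as a small sketch (the tree is rebuilt here so the snippet runs on its own; the variable names are only illustrative):

```r
iris2 <- iris[1:20, -5]
tree_iris <- hclust(dist(iris2), method = "complete")

m <- tree_iris$merge
merge_dim   <- dim(m)                        # n - 1 rows, 2 columns
merge_range <- range(m)                      # smallest is -n, largest is n - 2
n_dups      <- anyDuplicated(as.integer(m))  # 0 means every entry is unique

print(merge_dim)
print(merge_range)
print(n_dups)
head(m)                                      # the first few merging steps
```

With 20 observations the matrix has 19 rows, its entries span -20 to 18, and no entry occurs twice. A hand-written merge matrix should show the same pattern.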

UPDATE

After I got more time, I decided to fix your code. I modified the merge entry of the tree. This is what the working code that reproduces your dendrogram looks like:

tree <- list()
tree$merge <- matrix(c( -1,  -7,   # row 1
                        -2,  -6,   # row 2
                        -3, -12,   # row 3
                        -4, -14,   # row 4
                        -5,  -8,   # row 5
                        -9, -11,   # row 6
                       -13, -20,   # row 7
                       -15, -19,   # row 8
                         1,   8,   # row 9: 1,7,15,19
                         2,   5,   # row 10: 2,6,5,8
                         3,   6,   # row 11: 3,12,9,11
                        10, -18,   # row 12: 2,6,5,8 + 18
                         9,  11,   # row 13: 1,7,15,19 + 3,12,9,11
                        12,   4,   # row 14: row 12 + row 4
                       -10,   7,   # row 15: row 7 + 10
                       -16, -17,   # row 16
                        13,  14,   # row 17: row 13 + row 14
                        15,  16,   # row 18: row 15 + row 16
                        17,  18),  # row 19: row 17 + row 18
                     ncol = 2,
                     byrow = TRUE)
tree$height <- c(0.06573653, 0.06573653, 0.06573653, 0.06573653,
                 0.06573653, 0.06573653, 0.06573653, 0.06573653,
                 0.11167131, 0.11167131, 0.11167131, 0.12832304,
                 0.17304035, 0.17304035, 0.17304035, 0.17304035,
                 0.22965349, 0.22965349, 0.23334799)
tree$labels <- as.character(1:20)
tree$order <- c(1, 7, 15, 19, 3, 12, 9, 11, 2, 6, 5, 8, 18, 4, 14, 13, 20, 10, 16, 17)
class(tree) <- "hclust"
plot(tree)

How can I create a dendrogram in R using pre-clustered data created elsewhere?

I think what you are looking for is phylog (from the ade4 package). You can write your tree to a file in Newick notation, parse it, and construct a phylog object, which you can easily visualize. The end of the webpage gives an example of how to do this. You might also want to consider phylobase. Although you don't need the entire functionality provided by these packages, you can piggyback on the constructs they use to represent trees and on their plotting capabilities.

EDIT: It looks like a similar question to yours has been asked before here, with a simpler solution. So basically the only thing you will have to code is your Newick parser, or a parser for whatever other representation you want to output from Java.

How to make R output text details about a dendrogram object?

You can compute this from the object returned by hclust with stats::cutree, which gives the cluster membership of each observation:

cutree(hie_clust, k = 2)
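
As a small self-contained illustration, the sketch below uses a hand-built four-leaf tree in place of hie_clust (the object toy and its values are made up for the example). It prints the group of each leaf and a text outline of the tree:

```r
# Hand-built four-leaf tree standing in for hie_clust (illustrative values).
toy <- list(
  merge  = matrix(c(-1L, -2L,
                    -3L, -4L,
                     1L,  2L), ncol = 2, byrow = TRUE),
  height = c(1, 1.5, 3),
  order  = 1:4,
  labels = LETTERS[1:4]
)
class(toy) <- "hclust"

groups <- cutree(toy, k = 2)   # cluster membership per leaf, named by label
print(groups)
print(table(groups))           # number of leaves in each cluster

str(as.dendrogram(toy))        # text outline of the whole tree
```

Cutting at a height instead of a cluster count works the same way, e.g. cutree(toy, h = 2).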

