How do I manually create a dendrogram (or hclust) object? (in R)
I think you are better off creating an hclust object and then converting it to a dendrogram using as.dendrogram than trying to create a dendrogram directly. Look at the ?hclust help page to see the meaning of the elements of an hclust object.
Here is a simple example with four leaves A, B, C, and D, combining first A-B, then C-D, and finally AB-CD:
a <- list() # initialize empty object
# define merging pattern:
# negative numbers are leaves,
# positive are merged clusters (defined by row number in $merge)
a$merge <- matrix(c(-1, -2,
                    -3, -4,
                     1,  2), ncol = 2, byrow = TRUE)
a$height <- c(1, 1.5, 3) # define merge heights
a$order <- 1:4 # order of leaves (trivial if hand-entered)
a$labels <- LETTERS[1:4] # labels of leaves
class(a) <- "hclust" # make it an hclust object
plot(a) # look at the result
# convert to a dendrogram object if needed
ad <- as.dendrogram(a)
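To check that a hand-built object behaves like a real hclust result, you can pass it to functions that consume hclust objects. The following is only a minimal sketch: it rebuilds the same four-leaf tree so that it runs on its own, and the names a2, groups, and ad2 are made up for illustration.

```r
# Rebuild the four-leaf example so this block is self-contained
a2 <- list(
  merge  = matrix(c(-1L, -2L,
                    -3L, -4L,
                     1L,  2L), ncol = 2, byrow = TRUE),
  height = c(1, 1.5, 3),
  order  = 1:4,
  labels = LETTERS[1:4]
)
class(a2) <- "hclust"

# Cutting into two clusters should separate {A, B} from {C, D}
groups <- cutree(a2, k = 2)
print(groups)

# The dendrogram conversion should carry the leaf labels along
ad2 <- as.dendrogram(a2)
print(labels(ad2))
```

If cutree or as.dendrogram stops with an error here, the merge, height, or order component is the first place to look.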
Creating dendrograms manually: how to fix 'merge' matrix has invalid contents in plot.hclust?
The validity of an hclust tree is checked by the .validity.hclust function. Its source code is given here. Look at lines 121-135.
The error means that your tree is not valid because of its merge matrix: it has non-unique elements (e.g., 1 and 2). In a properly constructed merge matrix, all entries are unique and run from -N_obs to N_obs-2 (zero excluded), where N_obs is the (positive) number of observations. This is checked by the following if test in the code:
if(identical(sort(as.integer(merge)), c(-(n:1L), +seq_len(n-2L))))
    TRUE
else
    "'merge' matrix has invalid contents"
From the hclust reference:
merge: an n − 1 by 2 matrix.
Row i of merge describes the merging of clusters at step i of the
clustering. If an element j in the row is negative, then observation
− j was merged at this stage. If j is positive then the merge was
with the cluster formed at the (earlier) stage j of the algorithm.
Thus negative entries in merge indicate agglomerations of singletons,
and positive entries indicate agglomerations of non-singletons.
All negative entries are singletons (observations), and positive numbers are merges of existing clusters and refer to merging steps of the algorithm.
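To make the sign convention concrete, here is a small sketch that spells out each row of a merge matrix in words. The helper describe_merge is hypothetical (it is not part of stats), and the matrix is the four-leaf example used earlier on this page.

```r
# Hypothetical helper that describes each merge step in words
describe_merge <- function(merge, labels = NULL) {
  n <- nrow(merge) + 1L                               # number of observations
  if (is.null(labels)) labels <- as.character(seq_len(n))
  name_of <- function(j) {
    if (j < 0) paste("observation", labels[-j])       # negative entry: a singleton
    else paste("cluster from step", j)                # positive entry: an earlier merge
  }
  vapply(seq_len(nrow(merge)), function(i) {
    paste0("step ", i, ": ", name_of(merge[i, 1]), " + ", name_of(merge[i, 2]))
  }, character(1))
}

m <- matrix(c(-1, -2,
              -3, -4,
               1,  2), ncol = 2, byrow = TRUE)
writeLines(describe_merge(m, labels = LETTERS[1:4]))
```

Reading a hand-written merge matrix back this way makes it easier to spot a row that refers to the wrong step.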
So, revise your hclust object. Here is some code to give you an idea of what a proper hclust object looks like:
iris2 <- iris[1:20,-5]
species_labels <- iris[,5]
d_iris <- dist(iris2)
tree_iris <- hclust(d_iris, method = "complete")
Take a closer look at tree_iris$merge.
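As a rough sketch of what to look for, the block below inspects the merge component of a real hclust result and applies a re-implementation of the test quoted above. The helper check_merge and the matrix broken are made up for illustration; check_merge mirrors the quoted condition but is not the stats source itself.

```r
# Same setup as above, repeated so the block runs on its own
tree_iris <- hclust(dist(iris[1:20, -5]), method = "complete")

dim(tree_iris$merge)      # n - 1 rows and 2 columns, with n = 20 observations
head(tree_iris$merge)     # the first row always joins two singletons
range(tree_iris$merge)    # smallest entry is -n, largest is n - 2

# Hypothetical helper mirroring the quoted test
check_merge <- function(merge) {
  n <- nrow(merge) + 1L                        # number of observations
  expected <- c(-(n:1L), seq_len(n - 2L))      # -n..-1 followed by 1..(n-2)
  if (identical(sort(as.integer(merge)), expected)) TRUE
  else "'merge' matrix has invalid contents"
}

check_merge(tree_iris$merge)     # a real hclust result passes

broken <- tree_iris$merge
broken[1, ] <- c(-1L, -1L)       # observation 1 now appears more than once
check_merge(broken)              # returns the error message as a string
```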
UPDATE
After I got more time, I decided to fix your code. I modified the merge entry of the tree. This is what the working code that reproduces your dendrogram looks like:
tree <- list()
tree$merge <- matrix(c( -1,  -7,   # row 1
                        -2,  -6,   # row 2
                        -3, -12,   # row 3
                        -4, -14,   # row 4
                        -5,  -8,   # row 5
                        -9, -11,   # row 6
                       -13, -20,   # row 7
                       -15, -19,   # row 8
                         1,   8,   # row 9: 1,7,15,19
                         2,   5,   # row 10: 2,6,5,8
                         3,   6,   # row 11: 3,12,9,11
                        10, -18,   # row 12: 2,6,5,8 + 18
                         9,  11,   # row 13: 1,7,15,19 + 3,12,9,11
                        12,   4,   # row 14: row 12 + row 4
                       -10,   7,   # row 15: row 7 + 10
                       -16, -17,   # row 16
                        13,  14,   # row 17: row 13 + row 14
                        15,  16,   # row 18: row 15 + row 16
                        17,  18),  # row 19: row 17 + row 18
                      ncol = 2,
                      byrow = TRUE)
tree$height <- c(0.06573653, 0.06573653, 0.06573653, 0.06573653,
                 0.06573653, 0.06573653, 0.06573653, 0.06573653,
                 0.11167131, 0.11167131, 0.11167131, 0.12832304,
                 0.17304035, 0.17304035, 0.17304035, 0.17304035,
                 0.22965349, 0.22965349, 0.23334799)
tree$labels <- as.character(1:20)
tree$order <- c(1, 7, 15, 19, 3, 12, 9, 11, 2, 6, 5, 8, 18, 4, 14, 13, 20, 10, 16, 17)
class(tree) <- "hclust"
plot(tree)
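Typing the order component by hand is error-prone for larger trees. As a sketch only, one leaf order that is consistent with the merges can be derived by expanding the last merge row recursively; other consistent orders exist, since the two sides of any merge can be swapped. The helper order_from_merge and the five-leaf matrix below are made up for illustration.

```r
# Hypothetical helper: expand the final merge step into its leaves, left to right
order_from_merge <- function(merge) {
  expand <- function(j) {
    if (j < 0) return(-j)                        # a leaf: return its observation index
    c(expand(merge[j, 1]), expand(merge[j, 2]))  # a cluster: left members, then right
  }
  as.integer(expand(nrow(merge)))                # the last row is the root of the tree
}

# Small five-leaf example (made up for illustration)
m <- matrix(c(-1, -3,
              -2, -5,
               1, -4,
               3,  2), ncol = 2, byrow = TRUE)
print(order_from_merge(m))
```

The result can be assigned to the order component of a hand-built hclust object before plotting.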
How can I create a dendrogram in R using pre-clustered data created elsewhere?
I think what you are looking for is phylog. You can write your tree to a file in Newick notation, parse it, and construct a phylog object, which you can easily visualize. The end of the webpage gives an example of how to do this. You might also want to consider phylobase. Although you don't need the full functionality of these packages, you can piggyback on the constructs they use to represent trees and on their plotting capabilities.
EDIT: It looks like a similar question to yours has been asked before here providing a simpler solution. So basically the only thing you will have to code here is your Newick parser or a parser for any other representation you want to output from Java.
How to make R output text details about a dendrogram object?
You can compute this from the object returned by hclust with stats::cutree:
cutree(hie_clust, k = 2)
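A minimal self-contained sketch of that call follows. Here hie_clust is assumed to be an hclust result; it is built from the first 20 rows of iris purely for illustration, and the cut height of 0.5 is chosen arbitrarily.

```r
# hie_clust is assumed to be an hclust result; here it is built from part of iris
hie_clust <- hclust(dist(iris[1:20, -5]), method = "complete")

membership <- cutree(hie_clust, k = 2)   # named integer vector: observation -> cluster id
print(membership)
print(table(membership))                 # cluster sizes

# Cutting at a height instead of a cluster count (height chosen arbitrarily)
print(cutree(hie_clust, h = 0.5))
```

The names of the returned vector are the leaf labels, so it can be joined back to the original data.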