Clustering List for Hclust Function


I will use the built-in USArrests dataset in R to demonstrate how to cut a tree into a desired number of groups. The result is a named vector of cluster memberships, one per observation.

First, construct an hclust object:

hc <- hclust(dist(USArrests), "ave")   # average-linkage clustering on Euclidean distances
# plot(hc)                             # optional: inspect the dendrogram

You can now cut the tree into as many groups as you want. Here I will split it into two groups; you set the number of groups with the k argument. See ?cutree, and note the h argument (cut at a given height), which may be more useful to you (note that cutree(hc, k = 2) == cutree(hc, h = 110)).

cutree(hc, k = 2)
        Alabama         Alaska        Arizona       Arkansas     California
              1              1              1              2              1
       Colorado    Connecticut       Delaware        Florida        Georgia
              2              2              1              1              2
         Hawaii          Idaho       Illinois        Indiana           Iowa
              2              2              1              2              2
         Kansas       Kentucky      Louisiana          Maine       Maryland
              2              2              1              2              1
  Massachusetts       Michigan      Minnesota    Mississippi       Missouri
              2              1              2              1              2
        Montana       Nebraska         Nevada  New Hampshire     New Jersey
              2              2              1              2              2
     New Mexico       New York North Carolina   North Dakota           Ohio
              1              1              1              2              2
       Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina
              2              2              2              2              1
   South Dakota      Tennessee          Texas           Utah        Vermont
              2              2              2              2              2
       Virginia     Washington  West Virginia      Wisconsin        Wyoming
              2              2              2              2              2
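
cutree() returns a named integer vector rather than a table. If you want the group sizes, or the assignments attached back to the data, a small sketch (the h = 110 cut is the equivalence mentioned above):

grp <- cutree(hc, k = 2)                        # named vector of cluster memberships
table(grp)                                      # number of states in each group
USArrests2 <- cbind(USArrests, cluster = grp)   # attach the labels to the data
head(USArrests2)

all(grp == cutree(hc, h = 110))                 # check against the height-based cut mentioned above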

Hierarchical cluster analysis help - dendrogram

You chose to perform hierarchical clustering using the average linkage method.

According to ?hclust:

This function performs a hierarchical cluster analysis using a set of dissimilarities for the n objects being clustered. Initially, each object is assigned to its own cluster and then the algorithm proceeds iteratively, at each stage joining the two most similar clusters, continuing until there is just a single cluster. At each stage distances between clusters are recomputed [...]

You can follow what happens using the merge field:

Row i of merge describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation −j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm.
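
The question's data and fit.average object are not reproduced in this answer. Purely so the snippets below can be run, here is a minimal stand-in (the 29 random points are an assumption and will not reproduce the exact merge table or plots shown):

set.seed(1)                                                # assumed stand-in data, not the question's
coordinates <- matrix(rnorm(29 * 2), ncol = 2)             # 29 points in two dimensions
fit.average <- hclust(dist(coordinates), method = "average")  # average-linkage clustering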

fit.average$merge
      [,1] [,2]
 [1,]  -21  -22
 [2,]  -15    1
 [3,]  -13  -24
 [4,]   -6  -20
 [5,]   -2  -23
 [6,]  -16  -27
...

This is what you see in the dendrogram:

[Image: dendrogram of fit.average]

The height on the y-axis of the dendrogram is the dissimilarity at which two clusters are merged; because you use method = "average", this is the average of the pairwise distances between the members of the two clusters being joined.

  1. observations 21 and 22 (the two nearest points) are merged, forming the cluster created at step 1
  2. observation 15 is then merged with that step-1 cluster, forming the cluster created at step 2
  3. ... and so on; the corresponding merge heights can be read off numerically, as shown below
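
To see those merge heights instead of reading them off the plot, you can line up the merge matrix with the height component; a small sketch using the same fit.average object:

head(cbind(fit.average$merge, height = fit.average$height))
# each row: the two items joined at that step and the dissimilarity at which they merge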

You could then call rect.hclust, which accepts various arguments, such as the number of groups k you'd like:

rect.hclust(fit.average, k=3)

[Image: dendrogram with the k = 3 clusters outlined by rect.hclust]

You can also use the output of rect.hclust to color the original points:

groups <- rect.hclust(fit.average, k=3)
groups

#[[1]]
# [1] 5 6 7 8 9 10 17 18 19 20

#[[2]]
# [1] 1 2 3 4 15 21 22 23

#[[3]]
# [1] 11 12 13 14 16 24 25 26 27 28 29

colors <- rep(1:length(groups), lengths(groups))   # one color id per group, repeated by group size
colors <- colors[order(unlist(groups))]            # reorder so colors follow the original row order

plot(coordinates[, 2:1], col = colors)
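
If you only need the colours and not the rectangles, an equivalent shortcut is to colour by cutree() directly; the group numbering (and hence the exact colours) can differ from the rect.hclust list, but the partition is the same. A sketch:

plot(coordinates[, 2:1], col = cutree(fit.average, k = 3))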

[Image: scatter plot of the points colored by cluster]

Error in R when using cluster analysis (hclust function)

In ?hclust the d argument is described as:

d

a dissimilarity structure as produced by dist.

The object matrix (as you have named it) is not such an object. In fact it is not even an R matrix; it is a data frame.

Try the following. We have given the input a more accurate name and then converted it to a dist object, as required.

DF <- data.frame(n1 = c(0,1,11,5), n2 = c(1,0,2,3), n3 = c(11,2,0,4), n4 = c(5,3,4,0))
hclust(as.dist(DF), "ave")
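
If you want to check what as.dist() produced and look at the resulting tree, a short follow-up sketch:

d <- as.dist(DF)         # interpret the symmetric data frame as pairwise dissimilarities
class(d)                 # "dist"
hc2 <- hclust(d, "ave")  # same call as above, stored so it can be plotted
plot(hc2)                # dendrogram of the four objects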

