Calculating the optimal number of clusters with NbClust()
A simple way to determine the number of clusters is to look for the elbow in a plot of the within-groups sum of squares and/or to examine the average silhouette width. The code below produces simple plots for both.
To perform the clustering, you first need to deal with the NaNs that appear after scaling.
WKA_ohneJB_scaled <- as.matrix(scale(data[, c(-1, -2, -18)]))
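The answer does not say where the NaNs come from. One common cause is a column with zero variance: scale() divides by the column's standard deviation, so a constant column becomes 0/0. The sketch below assumes that is the cause and uses toy data; drop_constant_cols is a hypothetical helper, not part of the original code.

```r
# Toy data (assumption): `const` has zero variance, so scale() yields NaN for it.
df <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 1, 3), const = c(5, 5, 5, 5))

# Hypothetical helper: keep only columns whose standard deviation is non-zero.
drop_constant_cols <- function(d) {
  sds <- vapply(d, sd, numeric(1), na.rm = TRUE)
  d[, !is.na(sds) & sds > 0, drop = FALSE]
}

scaled_raw   <- as.matrix(scale(df))                      # NaN in `const`
scaled_clean <- as.matrix(scale(drop_constant_cols(df)))  # constant column removed

anyNA(scaled_raw)    # is.na() is TRUE for NaN, so this flags the problem
anyNA(scaled_clean)
```

If the NaNs or NAs are already present in the raw data rather than created by scaling, this will not help; those rows or columns have to be removed or imputed first.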
plot_scree_clusters <- function(x) {
  wss <- 0
  max_i <- 10 # max clusters
  # total within-cluster sum of squares for k = 1..max_i
  for (i in 1:max_i) {
    km.model <- kmeans(x, centers = i, nstart = 20)
    wss[i] <- km.model$tot.withinss
  }
  plot(1:max_i, wss, type = "b",
       xlab = "Number of Clusters",
       ylab = "Within groups sum of squares")
}
plot_scree_clusters(WKA_ohneJB_scaled)
plot_sil_width <- function(x) {
  sw <- 0
  max_i <- 10 # max clusters
  # silhouette width is undefined for a single cluster, so start at k = 2
  for (i in 2:max_i) {
    # cluster the data passed in as x (not a global object)
    pam.model <- cluster::pam(x = x, k = i)
    sw[i] <- pam.model$silinfo$avg.width
  }
  sw <- sw[-1] # drop the placeholder for k = 1
  plot(2:max_i, sw, type = "b",
       xlab = "Number of Clusters",
       ylab = "Average silhouette width")
}
plot_sil_width(WKA_ohneJB_scaled)
Hierarchical Clustering: Determine optimal number of clusters and statistically describe clusters
This is a very late answer and probably not useful for the asker anymore - but maybe for others. Check out the package NbClust. It contains 26 indices that give you a recommended number of clusters (and you can also choose your type of clustering). You can run it in such a way that you get the results for all the indices and then you can basically go with the number of clusters recommended by most indices. And yes, I think the basic statistics are the best way to describe clusters.
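A minimal sketch of the majority-vote idea described above, assuming the NbClust package is installed and using the built-in iris data as a stand-in for your own. The distance, linkage method and cluster range are illustrative choices, not recommendations from the answer.

```r
library(NbClust)

# Stand-in data (assumption): the four numeric iris columns, scaled.
x <- scale(iris[, 1:4])

# index = "all" computes the default set of indices; the call also draws
# diagnostic plots for the two graphical indices.
res <- NbClust(data = x, distance = "euclidean",
               min.nc = 2, max.nc = 8,
               method = "ward.D2", index = "all")

# The first row of Best.nc holds the number of clusters proposed by each index.
votes <- res$Best.nc[1, ]
votes <- votes[!is.na(votes) & votes > 0]  # graphical indices report 0

# Majority vote across indices.
vote_table <- table(votes)
best_k <- as.integer(names(vote_table)[which.max(vote_table)])
best_k
```

res$Best.partition then holds the cluster assignment for the winning number of clusters, which can be used with aggregate() or similar to compute the basic statistics per cluster.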
How to get the optimal number of clusters from the clusGap function as an output?
Typically such information is somewhere directly inside the object, like gap_stat$nc. To look for it, str(gap_stat) would typically suffice.
In this case, however, the above strategy isn't enough. But the fact that you can see your number of interest in the printed output means that print.clusGap (because the class of gap_stat is clusGap) will show how to obtain this number. So, inspecting cluster:::print.clusGap leads to
maxSE(f = gap_stat$Tab[, "gap"], SE.f = gap_stat$Tab[, "SE.sim"])
# [1] 1
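For a complete round trip, here is a sketch that builds a clusGap object and extracts the number as shown above. It assumes the cluster package and uses scaled iris data with kmeans as stand-ins; K.max, B and nstart are arbitrary illustrative values.

```r
library(cluster)

set.seed(1)  # the gap statistic relies on simulated reference data
x <- scale(iris[, 1:4])  # stand-in data (assumption)

# Extra arguments such as nstart are passed on to the clustering function.
gap_stat <- clusGap(x, FUNcluster = kmeans, K.max = 6, B = 50, nstart = 20)

# Same call that the print method relies on; maxSE() is exported by cluster.
k_opt <- maxSE(f = gap_stat$Tab[, "gap"], SE.f = gap_stat$Tab[, "SE.sim"])
k_opt
```

maxSE() also takes a method argument (default "firstSEmax"), so the value can differ from what you expect if you printed the object with a different method.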