## Calculating optimal number of clusters with NbClust()

A simple way to determine the number of clusters is to look for the *elbow* in the plot of the within-groups sum of squares and/or to examine the average silhouette width. The code below produces simple plots for both.

Before you can cluster, you need to deal with any `NaN`s that appear after scaling:

    WKA_ohneJB_scaled <- as.matrix(scale(data[, c(-1, -2, -18)]))
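
One common source of `NaN`s after `scale()` is a constant column: its standard deviation is 0, so the centered values are divided by zero. The sketch below assumes that this is the cause here, which the original answer does not say. `drop_constant_cols` and the `toy` data frame are hypothetical names used only for illustration.

```r
# Hypothetical helper: keep only columns with a strictly positive standard
# deviation. scale() divides by the column sd, so a constant column gives 0/0.
drop_constant_cols <- function(df) {
  keep <- vapply(
    df,
    function(col) isTRUE(stats::sd(col, na.rm = TRUE) > 0),
    logical(1)
  )
  df[, keep, drop = FALSE]
}

# Toy data (assumption): column b is constant
toy <- data.frame(a = c(1, 2, 3, 4), b = c(5, 5, 5, 5), c = c(2, 4, 1, 3))

any_nan_before <- any(is.nan(scale(toy)))                 # column b becomes NaN
toy_scaled     <- as.matrix(scale(drop_constant_cols(toy)))
any_nan_after  <- any(is.nan(toy_scaled))                 # b was dropped first
```

If the `NaN`s come from missing values in the input instead, dropping constant columns will not help; those rows or columns have to be imputed or removed before `kmeans()` will accept the matrix.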

    # Scree plot: total within-cluster sum of squares for k = 1..max_i
    plot_scree_clusters <- function(x) {
      max_i <- 10  # max clusters
      wss <- numeric(max_i)
      for (i in 1:max_i) {
        km.model <- kmeans(x, centers = i, nstart = 20)
        wss[i] <- km.model$tot.withinss
      }
      plot(1:max_i, wss, type = "b",
           xlab = "Number of Clusters",
           ylab = "Within groups sum of squares")
    }

    plot_scree_clusters(WKA_ohneJB_scaled)

    # Average silhouette width for k = 2..max_i, using PAM (k-medoids)
    plot_sil_width <- function(x) {
      max_i <- 10  # max clusters
      sw <- numeric(max_i)
      for (i in 2:max_i) {
        # cluster the data passed in as x
        pam.model <- cluster::pam(x = x, k = i)
        sw[i] <- pam.model$silinfo$avg.width
      }
      sw <- sw[-1]  # the silhouette is undefined for k = 1
      plot(2:max_i, sw, type = "b",
           xlab = "Number of Clusters",
           ylab = "Average silhouette width")
    }

    plot_sil_width(WKA_ohneJB_scaled)

## Hierarchical Clustering: Determine optimal number of clusters and statistically describe clusters

This is a very late answer and probably no longer useful to the asker, but maybe it is to others. Check out the `NbClust` package. It contains 26 indices, each of which gives you a recommended number of clusters (and you can also choose your type of clustering). You can run it so that you get the results for all the indices and then simply go with the number of clusters recommended by most of them. And yes, I think basic statistics are the best way to describe clusters.
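
A rough sketch of that majority-vote workflow, assuming the `NbClust` package is installed. The built-in `iris` measurements, the `"complete"` linkage and the 2 to 10 range are placeholder choices, not taken from the original question.

```r
library(NbClust)

x <- scale(iris[, 1:4])  # stand-in data (assumption)

# index = "all" runs the package's default set of indices; it also prints a
# summary and draws diagnostic plots for the graphical indices.
res <- NbClust(data = x, distance = "euclidean",
               min.nc = 2, max.nc = 10,
               method = "complete", index = "all")

# First row of Best.nc: number of clusters proposed by each index
votes <- res$Best.nc[1, ]
votes <- votes[!is.na(votes) & votes >= 2]  # graphical indices are reported as 0

vote_table <- table(votes)
best_k <- as.integer(names(vote_table)[which.max(vote_table)])

# Describe the clusters with basic statistics: per-cluster means
cluster_means <- aggregate(iris[, 1:4],
                           by = list(cluster = res$Best.partition),
                           FUN = mean)
```

`which.max()` takes the first maximum, so ties between two cluster counts are resolved in favour of the smaller one; it is worth looking at `vote_table` itself rather than trusting `best_k` blindly.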

## How to get the optimal number of clusters from the clusGap function as an output?

Typically such information is somewhere directly inside the object, like `gap_stat$nc`. To look for it, `str(gap_stat)` would typically suffice.

In this case, however, the above strategy isn't enough. But the fact that you can see your number of interest in the output means that `print.clusGap` (because the class of `gap_stat` is `clusGap`) will show how to obtain this number. So, inspecting `cluster:::print.clusGap` leads to

    maxSE(f = gap_stat$Tab[, "gap"], SE.f = gap_stat$Tab[, "SE.sim"])
    # [1] 1
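
For a self-contained illustration, here is a minimal sketch that builds a `clusGap` object and extracts the number as a plain value. The simulated two-blob data, `K.max = 6`, and `B = 50` are assumptions for the example. `method = "firstSEmax"` is the default of both `maxSE()` and `print.clusGap`, spelled out here so that the extracted value matches what the print method reports.

```r
library(cluster)

set.seed(1)
# Simulated data (assumption): two well-separated 2-D blobs
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))

# Extra arguments (nstart) are passed on to kmeans()
gap_stat <- clusGap(x, FUNcluster = kmeans, K.max = 6, B = 50,
                    verbose = FALSE, nstart = 20)

# Same call that print.clusGap makes internally, stored in a variable
best_k <- maxSE(f    = gap_stat$Tab[, "gap"],
                SE.f = gap_stat$Tab[, "SE.sim"],
                method = "firstSEmax")
```

Other selection rules, such as `"Tibs2001SEmax"` or `"globalmax"`, can be chosen through the same `method` argument and may give a different number of clusters.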
