Weighted Kmeans R

How do I perform weighted k-means clustering with normalized weights in R?

Integrating weights into k-means is trivial: the assignment step stays the same, and each centroid becomes the weighted mean of its members instead of the plain mean.

But I don't think the out-of-the-box versions in R support this, so you will have to write the code yourself. Beware that loops in interpreted R are really slow. The kmeans function you are using runs compiled Fortran/C code, and that is why it's so fast. The flexclust package may support weights, but benchmark it to see how bad the performance is.
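As a rough illustration of what "write the code yourself" could look like, here is a minimal Lloyd-style sketch with per-observation weights. The function name weighted_kmeans and its arguments are made up for this example and are not part of base R or any package. It assumes squared Euclidean distance, random initial centers, and non-negative weights, which it normalizes to sum to 1.

```r
# Hypothetical helper: Lloyd-style k-means with per-observation weights.
# x: numeric matrix or data frame, w: non-negative weights (one per row), k: number of clusters.
weighted_kmeans <- function(x, w, k, max_iter = 100) {
  x <- as.matrix(x)
  w <- w / sum(w)                                   # normalize the weights to sum to 1
  centers <- x[sample(nrow(x), k), , drop = FALSE]  # random rows as initial centers
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # n x k matrix of squared Euclidean distances from each point to each center
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")  # index of the nearest center
    if (all(new_cluster == cluster)) break              # assignments stable: converged
    cluster <- new_cluster
    for (j in seq_len(k)) {
      idx <- cluster == j
      # weighted mean of the members; an empty cluster keeps its previous center
      if (any(idx)) {
        centers[j, ] <- colSums(x[idx, , drop = FALSE] * w[idx]) / sum(w[idx])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}
```

Since the centroid update divides by the sum of the member weights, normalizing the weights does not change the result; it only keeps the numbers on a comparable scale. Expect this to be much slower than kmeans on large data.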

How to weight a set of variables for k-means or PAM clustering?

If you just need to generate the 140 weights so the first 6 variables are weighted higher, you can do:

weights = rep(c(1, 0.5), c(6, 134))
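To actually use such weights, one common approach is to multiply each standardized column by the square root of its weight, so that the squared Euclidean distance on the rescaled data equals the weighted squared distance on the original. The sketch below assumes a numeric matrix with 140 columns; the random matrix x, the choice of 3 clusters, and nstart = 10 are stand-ins for illustration only.

```r
set.seed(42)
x <- matrix(rnorm(50 * 140), nrow = 50, ncol = 140)  # stand-in data: 50 rows, 140 variables

weights <- rep(c(1, 0.5), c(6, 134))                  # first 6 variables weighted higher

x_scaled <- scale(x)                                  # standardize each variable first
x_weighted <- sweep(x_scaled, 2, sqrt(weights), "*")  # column j is multiplied by sqrt(weights[j])

fit <- kmeans(x_weighted, centers = 3, nstart = 10)   # cluster the rescaled data
table(fit$cluster)                                    # cluster sizes
```

The same rescaled matrix can be passed to a PAM implementation that uses Euclidean distance.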

Weighting k Means Clustering by number of observations

Since SUMALL is the number of times a particular observation occurred, you could create a new dataset where each row is replicated the correct number of times, and then do your clustering with that new dataset.

Here's a simple example of expanding the dataset by replicating rows:

df <- data.frame(a = c(1, 2, 3, 4), b = c(4, 5, 6, 7), c = c(7, 8, 9, 9), SUMALL = c(2, 6, 4, 1))
  a b c SUMALL
1 1 4 7      2
2 2 5 8      6
3 3 6 9      4
4 4 7 9      1

Then we need to expand df by replicating rows according to SUMALL:

df_expanded <- df[rep(seq_len(nrow(df)), df$SUMALL), ]

    a b c SUMALL
1   1 4 7      2
1.1 1 4 7      2
2   2 5 8      6
2.1 2 5 8      6
2.2 2 5 8      6
2.3 2 5 8      6
2.4 2 5 8      6
2.5 2 5 8      6
3   3 6 9      4
3.1 3 6 9      4
3.2 3 6 9      4
3.3 3 6 9      4
4   4 7 9      1

Then use that with your favorite clustering method.
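For instance, with base R's kmeans the last step might look like the sketch below. It drops the SUMALL column before clustering so the count itself is not treated as a feature; the choice of 2 clusters and nstart = 10 is arbitrary, and the idx vector is introduced here only to map the labels back to the original rows.

```r
df <- data.frame(a = c(1, 2, 3, 4), b = c(4, 5, 6, 7), c = c(7, 8, 9, 9),
                 SUMALL = c(2, 6, 4, 1))

idx <- rep(seq_len(nrow(df)), df$SUMALL)   # original row index of every expanded row
df_expanded <- df[idx, ]

set.seed(1)
# cluster on the feature columns only, leaving out the SUMALL count
fit <- kmeans(df_expanded[, c("a", "b", "c")], centers = 2, nstart = 10)

# take the label of the first copy of each original row
df$cluster <- fit$cluster[match(seq_len(nrow(df)), idx)]
df
```

Keep in mind that the expanded dataset grows with the sum of the counts, so this is only practical when the counts are moderate.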

Is there a way to determine the weight of different attributes used for R clustering?

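You could use the FeatureImpCluster package, which estimates how much each variable contributes to a fitted clustering:

library(data.table)         # provides as.data.table()
library(FeatureImpCluster)
library(clustMixType)       # provides kproto()
data <- as.data.table(iris)
res <- kproto(x = data, k = 4)

FeatureImp_res <- FeatureImpCluster(res, data)
plot(FeatureImp_res, data, color = "type")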
