How do I perform weighted k-means clustering with normalized weights in R?
Integrating weight into k-means is trivial.
But I don't think the out-of-the-box versions in R support this, so you will have to write the code yourself. Beware that the R interpreter is really slow. The k-means function you are using is Fortran, and that is why it's so fast. flexcluster
may support weights, but benchmark it to see how bad the performance is.
How to weight a set of variables for k-means or PAM clustering?
If you just need to generate the 140 weights so the first 6 variables are weighted higher, you can do:
weights = rep(c(1, 0.5), c(6, 134))
Weighting k Means Clustering by number of observations
Since SUMALL
is the number of times a particular observation occurred, you could create a new dataset where each row is replicated the correct number of times, and then do your clustering with that new dataset.
Here's a simple example of expanding the dataset for replicate rows
df<-data.frame(a=c(1,2,3,4),b=c(4,5,6,7),c=c(7,8,9,9),SUMALL=c(2,6,4,1))
a b c SUMALL
1 1 4 7 2
2 2 5 8 6
3 3 6 9 4
4 4 7 9 1
Then we need to expand df
by replicating rows according to SUMALL
df_expanded<-df[rep(seq_len(nrow(df)),df$SUMALL),]
a b c SUMALL
1 1 4 7 2
1.1 1 4 7 2
2 2 5 8 6
2.1 2 5 8 6
2.2 2 5 8 6
2.3 2 5 8 6
2.4 2 5 8 6
2.5 2 5 8 6
3 3 6 9 4
3.1 3 6 9 4
3.2 3 6 9 4
3.3 3 6 9 4
4 4 7 9 1
Then use that with your favorite clustering method.
Is there a way to determine the weight of different attributes used for R clustering?
You could use FeatureImpCluster:
library(FeatureImpCluster)
library(clustMixType)
data <- as.data.table(iris)
res <- kproto(x=data,k=4)
FeatureImp_res <- FeatureImpCluster(res,data)
plot(FeatureImp_res,data,color="type")
Related Topics
R-How to Generate Random Sample of a Discrete Random Variables
How to Save a Data Frame in a Txt or Excel File Separated by Columns
R- Plot Numbers Instead of Points
Why Is R Dplyr::Mutate Inconsistent with Custom Functions
Adding Shade to R Lineplot Denotes Standard Error
Mathematical Expression in Axis Label
Space Between Gpplot2 Horizontal Legend Elements
Pull Nth Day of Month in Xts in R
Why Doesn't "+" Operate on Characters in R
How to Determine If a Character Vector Is a Valid Numeric or Integer Vector
Add Font to R That Is Not in Extrafonts Library
Setting Working Directory: Julia Versus R
How to Do Str_Extract with Base R
Ess to Call Different Installations of R