"Nas Introduced by Coercion" During Cluster Analysis in R

NAs introduced by coercion during Cluster Analysis in R

It's that first column that creates the issue:

> a <- c("1", "2",letters[1:5], "3")
> as.numeric(a)
[1] 1 2 NA NA NA NA NA 3
Warning message:
NAs introduced by coercion

Inside dist there must be a coercion to numeric, which generates the NA as above.
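To see which entries are responsible before calling dist, one can check where the coercion fails. A minimal sketch, reusing the vector a from above (the names num and bad are just illustrative):

```r
a <- c("1", "2", letters[1:5], "3")

# Coerce quietly, then flag entries that were present but became NA
num <- suppressWarnings(as.numeric(a))
bad <- is.na(num) & !is.na(a)

a[bad]      # the values that cannot be read as numbers
which(bad)  # their positions in the vector
```

The same check can be run on a single data frame column to find out whether it is safe to treat as numeric.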

I'd suggest applying dist without the first column or, better, moving that column to the rownames if possible, because the NA column changes the distances:

> dist(df)
          1         2         3         4
2 1.8842186
3 1.9262360 1.2856110
4 3.2137871 1.7322788 2.9838920
5 1.3299455 0.9872963 1.9158079 1.8889050
Warning message:
In dist(df) : NAs introduced by coercion
> dist(df[-1])
         1        2        3        4
2 1.538458
3 1.572765 1.049697
4 2.624046 1.414400 2.436338
5 1.085896 0.806124 1.564251 1.542284
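The two results differ by a constant factor, which matches what ?dist documents: columns that are NA are left out of each pairwise computation and the sum is scaled up in proportion to the number of columns used. Here one of three columns is lost, so the Euclidean distances are inflated by sqrt(3/2). A short sketch to check this, with the data frame rebuilt by hand from the values printed in the EDIT further down (the construction is an assumption, not the original code):

```r
# Example data typed in from the printed output; not the original script
df <- data.frame(
  id   = c("A", "B", "C", "D", "E"),
  var1 = c(-0.6264538, 0.1836433, -0.8356286, 1.5952808, 0.3295078),
  var2 = c(-0.8204684, 0.4874291, 0.7383247, 0.5757814, -0.3053884),
  stringsAsFactors = FALSE
)

d_all <- suppressWarnings(dist(df))  # id column is coerced to NA
d_num <- dist(df[-1])                # numeric columns only

# The ratio is the same for every pair: sqrt(3 / 2)
as.vector(d_all) / as.vector(d_num)
sqrt(3 / 2)
```

For clustering the relative distances are unchanged in this case, but the scaling is easy to overlook and the warning is a sign that a column is being silently discarded.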

By the way, you don't need as.matrix when calling dist; it does that internally anyway.

EDIT: using rownames

rownames(df) <- df$id

> df
  id       var1       var2
A  A -0.6264538 -0.8204684
B  B  0.1836433  0.4874291
C  C -0.8356286  0.7383247
D  D  1.5952808  0.5757814
E  E  0.3295078 -0.3053884

> dist(df[-1]) # you could also remove the first column entirely, using df$id <- NULL
         A        B        C        D
B 1.538458
C 1.572765 1.049697
D 2.624046 1.414400 2.436338
E 1.085896 0.806124 1.564251 1.542284
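When a data frame has several non-numeric columns, dropping them by position gets fragile. One possible approach, sketched here with the same example values (df is rebuilt by hand and num_cols is an illustrative name), is to keep the ids as row names and pass only the numeric columns to dist:

```r
# Example data typed in from the printed output above
df <- data.frame(
  id   = c("A", "B", "C", "D", "E"),
  var1 = c(-0.6264538, 0.1836433, -0.8356286, 1.5952808, 0.3295078),
  var2 = c(-0.8204684, 0.4874291, 0.7383247, 0.5757814, -0.3053884),
  stringsAsFactors = FALSE
)

rownames(df) <- df$id                            # keep ids as labels
num_cols <- vapply(df, is.numeric, logical(1))   # TRUE for var1, var2
d <- dist(df[, num_cols, drop = FALSE])          # nothing left to coerce

attr(d, "Labels")
```

The labels then carry through to anything built on the distance object, such as a dendrogram from hclust.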

Daisy function Warning Message: NAs introduced by coercion

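Read the data in as factor variables instead of characters. Since R 4.0.0, read.csv defaults to stringsAsFactors = FALSE, so request factors explicitly:

#Load Data
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv",
                   na.strings = "", stringsAsFactors = TRUE, header = TRUE)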

I previously had the following, which caused the error:

#Load Data
Store4 <- read.csv("/Users/scdavis/Documents/Work/Data/Client4.csv",
                   na.strings = "", stringsAsFactors=FALSE, header = TRUE)
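If the data has already been read with character columns, they can be converted afterwards instead of re-reading the file. A minimal sketch, assuming the cluster package is installed; the small data frame toy and its columns are made up for illustration and are not the Client4.csv data:

```r
library(cluster)

# Hypothetical mixed-type data standing in for the real file
toy <- data.frame(
  region = c("north", "south", "south", "east"),
  spend  = c(120, 85, 90, 200),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor so daisy treats it as nominal
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

# Gower's coefficient handles mixed numeric and factor columns
d <- daisy(toy, metric = "gower")
d
```

Checking str(toy) before calling daisy is a quick way to confirm that no character columns remain.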

Why do I get the "NAs introduced by coercion" warning message?

As far as I know, yes and no do not equate to 0 and 1 in R. It would work with TRUE and FALSE however. You need to assign a value to "yes" and "no" directly.

cust.df$email<-factor(cust.df$email)
cust.df$email<-as.numeric(cust.df$email)

This assigns 1 and 2 to your data: factor levels are sorted alphabetically, so "no" becomes 1 and "yes" becomes 2. If you want 0 and 1 instead, subtract 1:

cust.df$email <- cust.df$email - 1
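An alternative that makes the mapping explicit is to compare against the string directly. A sketch, assuming the column holds exactly the lowercase strings "yes" and "no"; the data frame here is a made-up stand-in for cust.df:

```r
# Hypothetical stand-in for the real cust.df
cust.df <- data.frame(email = c("yes", "no", "no", "yes"),
                      stringsAsFactors = FALSE)

# as.numeric("yes") would give NA with the coercion warning;
# a comparison yields TRUE/FALSE, which converts cleanly to 1/0
cust.df$email <- as.integer(cust.df$email == "yes")

cust.df$email
```

This way "yes" is always 1 and "no" is always 0, regardless of how the factor levels happen to be ordered.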


