Error with Knn Function

I suspect that your issue lies in having non-numeric data fields in 'mydades'. The error line:

NA/NaN/Inf in foreign function call (arg 6)

makes me suspect that the call from 'knn' into its C implementation fails. Many R functions delegate to more efficient C code instead of implementing the algorithm purely in R. If you type just 'knn' in your R console, you can inspect the R source of 'knn'. It contains the following line:

Z <- .C(VR_knn, as.integer(k), as.integer(l), as.integer(ntr),
        as.integer(nte), as.integer(p), as.double(train), as.integer(unclass(clf)),
        as.double(test), res = integer(nte), pr = double(nte),
        integer(nc + 1), as.integer(nc), as.integer(FALSE), as.integer(use.all))

where .C means that we're calling a C function named 'VR_knn' with the provided arguments. Since you get this warning twice:

NAs introduced by coercion

I think two of the as.double/as.integer calls cannot convert some values and introduce NA values instead. If we count the arguments passed on to VR_knn (not counting the function name itself), the 6th argument is:

as.double(train)

that may fail in cases such as:

# as.double cannot convert text fields to doubles; they are coerced to NA values:
> as.double("sometext")
[1] NA
Warning message:
NAs introduced by coercion
# while the following text is cast to double without an error:
> as.double("1.23")
[1] 1.23
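
The mechanism described above can be reproduced on a tiny data set. This is only a sketch under assumptions: the matrices 'train_chr' and 'test_chr' and the labels are made up for illustration, and the code merely checks that 'knn' stops with an error when raw text sits in the training matrix. The exact message may differ between versions of the class package.

```r
library(class)

# Hypothetical toy data: a character matrix where one cell holds raw text.
train_chr <- matrix(c("1.0", "2.0", "foo", "4.0"), ncol = 2)
test_chr  <- matrix(c("1.5", "3.5"), ncol = 2)
labels    <- factor(c("A", "B"))

# as.double() turns "foo" into NA (with a warning), and the C routine
# then rejects the NA. Capture the error instead of stopping the script.
outcome <- tryCatch(
  suppressWarnings(knn(train_chr, test_chr, labels, k = 1)),
  error = function(e) e
)

knn_failed <- inherits(outcome, "error")
if (knn_failed) message("knn failed: ", conditionMessage(outcome))
```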

You get the coercion warning twice, probably once from 'as.double(train)' and once from 'as.double(test)'. Since you did not tell us exactly what 'mydades' looks like, here are some of my best guesses (using artificial data drawn from a multivariate normal distribution):

library(MASS)
mydades <- mvrnorm(100, mu=c(1:6), Sigma=matrix(1:36, ncol=6))
mydades <- cbind(mydades, sample(LETTERS[1:5], 100, replace=TRUE))

# This breaks knn
mydades[3,4] <- Inf
# This breaks knn
mydades[4,3] <- -Inf
# These, however, do not produce the "NAs introduced by coercion" warning

# This breaks knn and gives the same error; just some raw text
mydades[1,2] <- mydades[50,1] <- "foo"
mydades[100,3] <- "bar"

# ... or perhaps wrongly formatted exponential numbers?
mydades[1,1] <- "2.34EXP-05"

# ... or wrong decimal symbol?
mydades[3,3] <- "1,23"
# should be 1.23, as R uses '.' as decimal symbol and not ','

# ... or most likely a whole column is non-numeric, since the error is given twice (as.double problem both in training AND test set)
mydades[,1] <- sample(letters[1:5],100,replace=TRUE)

I would not keep both the numeric data and the class labels in a single matrix. Perhaps you could split the data like this:

mydadesnumeric <- mydades[,1:6] # 6 first columns
mydadesclasses <- mydades[,7]
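
If the split is done on a character matrix, the numeric part still holds strings after subsetting. One possible way to finish the job, assuming every remaining cell is a valid number, is to change the storage mode and turn the labels into a factor. The toy matrix 'm' and the names 'features' and 'classes' below are made up for illustration:

```r
# Hypothetical character matrix: two numeric columns plus a label column.
m <- cbind(c("1.5", "2.5", "3.5"), c("10", "20", "30"), c("A", "B", "A"))

features <- m[, 1:2]
storage.mode(features) <- "double"   # converts the values, keeps the dimensions
classes <- factor(m[, 3])

str(features)
str(classes)
```

If any cell is not a valid number, the conversion yields NA with a warning, so it is worth checking 'anyNA(features)' afterwards.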

Using calls

str(mydades); summary(mydades)

may also help you (and us) locate the problematic data entries, so that you can either correct them to numeric values or omit the non-numeric fields.
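
To pinpoint the offending cells directly, something along these lines might work. It is a sketch, not a definitive recipe: the matrix 'm' is a made-up stand-in for 'mydades', and the approach assumes the data is stored as a character matrix.

```r
# Hypothetical character matrix with two cells that are not valid numbers.
m <- matrix(c("1.2", "foo", "3.4", "1,23", "5.6", "7.8"), ncol = 2)

# Coerce quietly, then flag cells that became NA although they were not NA before.
as_num <- suppressWarnings(as.numeric(m))
is_bad <- is.na(as_num) & !is.na(m)
dim(is_bad) <- dim(m)                 # make sure the matrix shape is kept

bad <- which(is_bad, arr.ind = TRUE)  # row/column positions of the bad cells
bad
m[bad]                                # the offending values themselves
```

Note that Inf and -Inf convert without a warning, so they would need a separate check, for example with 'is.finite'.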

The rest of the run code (after breaking the data), as provided by you:

N <- nrow(mydades) 
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]

# 7th column seems to be the class labels
knn(train=mydades.training[,-7],test=mydades.test[,-7],mydades.training[,7],k=5)

Why is R throwing the error "could not find function knn" when I run the command knn.pred=knn(train.X,test.X,train.Y,k=1) while learning?

Use the class library

The knn function is part of the class package, so load it first:

library(class)
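
If you are unsure whether the package is available, a small check like the sketch below can make the cause more obvious. The class package normally ships with R as a recommended package, so the error branch is only a fallback; the variable name 'knn_available' is made up for illustration.

```r
# Stop with a clear message if 'class' is missing, otherwise attach it.
if (!requireNamespace("class", quietly = TRUE)) {
  stop("Package 'class' is not installed; try install.packages(\"class\")")
}
library(class)

# After attaching the package, knn should be visible on the search path.
knn_available <- exists("knn", mode = "function")

# The function can also be called without attaching the package:
# class::knn(train, test, cl, k = 1)
```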

Complete code:

library(ISLR)
library(class)
standardized.X = scale(Caravan[, -86])
test = 1:1000
train.X = standardized.X[-test, ]
test.X = standardized.X[test, ]
attach(Caravan)
train.Y = Purchase[-test]
test.Y = Purchase[test]
set.seed(1)
knn.pred = knn(train.X, test.X, train.Y, k = 1)

Output:

> knn.pred
[1] No No No No No No Yes No Yes No No No Yes No No No No Yes Yes No No
[22] No Yes No No No No No No No No No No No No No No No No No No No
...

Hope this helps.


