Error with knn function
I suspect that your issue lies in having non-numeric data fields in 'mydades'. The error line:
NA/NaN/Inf in foreign function call (arg 6)
makes me suspect that the knn-function call to the C language implementation fails. Many functions in R actually call underlying, more efficient C implementations, instead of having an algorithm implemented in just R. If you type just 'knn' in your R console, you can inspect the R implementation of 'knn'. There exists the following line:
Z <- .C(VR_knn, as.integer(k), as.integer(l), as.integer(ntr),
as.integer(nte), as.integer(p), as.double(train), as.integer(unclass(clf)),
as.double(test), res = integer(nte), pr = double(nte),
integer(nc + 1), as.integer(nc), as.integer(FALSE), as.integer(use.all))
where .C means that we're calling a C function named 'VR_knn' with the provided function arguments. Since you get the following warning twice:
NAs introduced by coercion
I think two of the as.double/as.integer calls fail, and introduce NA values. If we start counting the parameters, the 6th argument is:
as.double(train)
that may fail in cases such as:
# as.double cannot translate text fields to doubles; they are coerced to NA values:
> as.double("sometext")
[1] NA
Warning message:
NAs introduced by coercion
# while the following text is cast to double without an error:
> as.double("1.23")
[1] 1.23
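The argument counting above can be checked with a small sketch. It assumes the installed version of the class package passes its arguments to .C in the order shown earlier; the toy data and the helper name knn_error are made up for illustration. A non-finite value in the training matrix should be reported against arg 6, and one in the test matrix against arg 8.

```r
library(class)  # knn() lives in the 'class' package

# Toy data (made up for illustration): two features, two classes
train <- matrix(c(1, 2, 3, 4), ncol = 2)
test  <- matrix(c(1.5, 3.5), ncol = 2)
cl    <- factor(c("a", "b"))

# Hypothetical helper: returns the error message, or NA if knn succeeds
knn_error <- function(train, test) {
  tryCatch({
    knn(train, test, cl, k = 1)
    NA_character_
  }, error = function(e) conditionMessage(e))
}

bad_train <- train; bad_train[2, 2] <- Inf
bad_test  <- test;  bad_test[1, 1]  <- Inf

msg_clean <- knn_error(train, test)      # clean data: no error expected
msg_train <- knn_error(bad_train, test)  # Inf ends up in as.double(train)
msg_test  <- knn_error(train, bad_test)  # Inf ends up in as.double(test)
print(c(msg_train, msg_test))
```

The argument number in the message therefore hints at whether the training set or the test set holds the bad value.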
You get the coercion warning twice, probably once from 'as.double(train)' and once from 'as.double(test)'. Since you did not provide exact details of what 'mydades' looks like, here are some of my best guesses (using artificial data drawn from a multivariate normal distribution):
library(MASS)
mydades <- mvrnorm(100, mu=c(1:6), Sigma=matrix(1:36, ncol=6))
mydades <- cbind(mydades, sample(LETTERS[1:5], 100, replace=TRUE))
# This breaks knn
mydades[3,4] <- Inf
# This breaks knn
mydades[4,3] <- -Inf
# The Inf values above break knn, but they do not trigger the 'NAs introduced by coercion' warning
# This breaks knn and does trigger that warning; just some raw text
mydades[1,2] <- mydades[50,1] <- "foo"
mydades[100,3] <- "bar"
# ... or perhaps wrongly formatted exponential numbers?
mydades[1,1] <- "2.34EXP-05"
# ... or wrong decimal symbol?
mydades[3,3] <- "1,23"
# should be 1.23, as R uses '.' as decimal symbol and not ','
# ... or most likely a whole column is non-numeric, since the error is given twice (as.double problem both in training AND test set)
mydades[,1] <- sample(letters[1:5],100,replace=TRUE)
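I would not keep both the numeric data and the class labels in a single matrix: a matrix holds only one type, so binding a character column onto numeric columns turns every entry into a character string. Perhaps you could split the data like this: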
mydadesnumeric <- mydades[,1:6] # 6 first columns
mydadesclasses <- mydades[,7]
Calling
str(mydades); summary(mydades)
may also help you/us locate the problematic data entries, so that you can either correct them to numeric entries or omit the non-numeric fields.
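As a complement to str() and summary(), here is a minimal sketch for pinpointing the offending cells. It assumes the data is a character matrix; the toy matrix m and its values are invented to mirror the guesses above and are not your real data.

```r
# Toy character matrix standing in for 'mydades' (values made up)
m <- matrix(c("1.5", "foo", "2.34EXP-05", "4",
              "1,23", "6", "Inf", "8"), ncol = 2)

# Convert every cell; anything unparseable becomes NA
# (the coercion warning is suppressed on purpose here)
num <- suppressWarnings(matrix(as.numeric(m), nrow = nrow(m)))

# Row/column positions of cells that are not finite numbers
bad <- which(!is.finite(num), arr.ind = TRUE)
print(bad)

# The raw strings sitting at those positions
print(m[bad])
```

Using !is.finite() rather than is.na() also flags Inf and -Inf, which break knn without producing the coercion warning.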
The rest of the run code (after breaking the data), as provided by you:
N <- nrow(mydades)
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]
# knn comes from the 'class' package, which must be loaded first
library(class)
# 7th column seems to be the class labels
knn(train=mydades.training[,-7],test=mydades.test[,-7],mydades.training[,7],k=5)
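For comparison, here is a sketch of the same workflow on clean data, with the features kept in a purely numeric matrix and the class labels in a separate factor. The names features, labels and idx_train, the seed, and the random data are all assumptions for illustration, not your actual data.

```r
library(class)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
N <- 100
features <- matrix(rnorm(N * 6), ncol = 6)                   # numeric columns only
labels   <- factor(sample(LETTERS[1:5], N, replace = TRUE))  # classes kept separately

# Fail early, with a readable message, if anything non-finite slipped in
stopifnot(is.numeric(features), all(is.finite(features)))

# One third of the rows for training, the rest for testing
idx_train <- sample(N, round(N / 3))
pred <- knn(train = features[idx_train, ],
            test  = features[-idx_train, ],
            cl    = labels[idx_train],
            k     = 5)
print(table(pred))
```

The stopifnot() check stops before knn is called, so a problem in the data does not surface as a foreign function call error.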
Why is R throwing the error "could not find function knn" when I run knn.pred=knn(train.X,test.X,train.Y,k=1)?
Use the class library
The knn function is part of the class library:
library(class)
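A short sketch of two ways to make knn available, assuming the class package is installed (the check at the top stops with a hint if it is not). The toy training points and labels are made up for illustration.

```r
# Check that the package is installed before relying on it
if (!requireNamespace("class", quietly = TRUE)) {
  stop("Package 'class' is not installed; try install.packages(\"class\")")
}

# Toy data: four points on a unit square, two classes
train <- matrix(c(0, 0, 1, 1, 0, 1, 0, 1), ncol = 2)
cl    <- factor(c("low", "low", "high", "high"))

# Option 1: call knn with an explicit namespace, without attaching the package
pred1 <- class::knn(train, matrix(c(0.9, 0.9), ncol = 2), cl, k = 1)

# Option 2: attach the package so that plain knn() is found
library(class)
pred2 <- knn(train, matrix(c(0.1, 0.1), ncol = 2), cl, k = 1)

print(c(as.character(pred1), as.character(pred2)))
```

The class::knn form is handy in scripts where you do not want to attach the whole package.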
Complete code:
library(ISLR)
library(class)
standardized.X = scale(Caravan [, -86])
test = 1:1000
train.X = standardized.X[-test , ]
test.X = standardized.X[test , ]
attach(Caravan)
train.Y = Purchase [-test]
test.Y = Purchase [test]
set.seed(1)
knn.pred = knn(train.X, test.X, train.Y, k = 1)
Output:
> knn.pred
[1] No No No No No No Yes No Yes No No No Yes No No No No Yes Yes No No
[22] No Yes No No No No No No No No No No No No No No No No No No No
...
Hope this helps.