One-class classification with SVM in R
I think this is what you want:
library(e1071)
data(iris)
df <- iris
df <- subset(df , Species=='setosa') #choose only one of the classes
x <- subset(df, select = -Species) #make x variables
y <- df$Species #make y variable(dependent)
model <- svm(x, y, type='one-classification') #train a one-classification model
print(model)
summary(model) #print summary
# test on the whole set
pred <- predict(model, subset(iris, select=-Species)) #create predictions
Output:
Summary:
> summary(model)
Call:
svm.default(x = x, y = y, type = "one-classification")
Parameters:
SVM-Type: one-classification
SVM-Kernel: radial
gamma: 0.25
nu: 0.5
Number of Support Vectors: 27
Number of Classes: 1
Predictions (only the predictions for rows where Species=='setosa' are shown here, for readability):
> pred
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
45 46 47 48 49 50
FALSE TRUE TRUE TRUE TRUE TRUE
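Since the model was trained only on setosa, a quick way to sanity-check it is to compare its TRUE/FALSE predictions on the full iris set against whether each row actually is setosa. A minimal sketch, re-running the fit from above:

```r
library(e1071)
data(iris)

# Refit as above: train a one-class SVM on the setosa rows only
train_x <- subset(iris, Species == 'setosa', select = -Species)
model <- svm(train_x, type = 'one-classification')

# Predict over the whole data set and cross-tabulate against the true class
pred <- predict(model, subset(iris, select = -Species))
table(Predicted = pred, ActuallySetosa = iris$Species == 'setosa')
```

With the default nu of 0.5, expect roughly half the setosa rows to be flagged FALSE, as in the output above; lowering nu tightens the bound on rejected training points.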
One Class Classification in R language. What am I doing wrong when generating the confusion matrix?
I see a number of issues. First, it seems that a lot of your data is of class character rather than numeric, which the classifier requires. Let's pick some columns and convert them to numeric. I will use data.table because fread is very convenient.
library(caret)
library(e1071)
library(data.table)
setDT(ds)
#Choose columns
mycols <- c("id","bp","sg","al","su")
#Convert to numeric
ds[,(mycols) := lapply(.SD, as.numeric),.SDcols = mycols]
#Convert classification to logical
data <- ds[, .(bp, sg, al, su, classification = classification == "ckd")]
data
bp sg al su classification
1: 80 1.020 1 0 TRUE
2: 50 1.020 4 0 TRUE
3: 80 1.010 2 3 TRUE
4: 70 1.005 4 0 TRUE
5: 80 1.010 2 0 TRUE
---
396: 80 1.020 0 0 FALSE
397: 70 1.025 0 0 FALSE
398: 80 1.020 0 0 FALSE
399: 60 1.025 0 0 FALSE
400: 80 1.025 0 0 FALSE
Once the data is cleaned up, you can sample a training and test set with createDataPartition, as in your original code.
#Sample data for training and test set
inTrain<-createDataPartition(1:nrow(data),p=0.6,list=FALSE)
train<- data[inTrain,]
test <- data[-inTrain,]
Then we can create the model and make the predictions.
svm.model<-svm(classification ~ bp + sg + al + su, data = train,
type='one-classification',
nu=0.10,
scale=TRUE,
kernel="radial")
#Perform predictions
svm.predtrain<-predict(svm.model,train)
svm.predtest<-predict(svm.model,test)
Your main issue with the cross table was that the model can only predict for cases without any NAs, so you have to subset the classification levels to those that have predictions. Then you can evaluate with confusionMatrix:
confTrain <- table(Predicted=svm.predtrain,
Reference=train$classification[as.integer(names(svm.predtrain))])
confTest <- table(Predicted=svm.predtest,
Reference=test$classification[as.integer(names(svm.predtest))])
confusionMatrix(confTest,positive='TRUE')
Confusion Matrix and Statistics
Reference
Predicted FALSE TRUE
FALSE 0 17
TRUE 55 64
Accuracy : 0.4706
95% CI : (0.3845, 0.558)
No Information Rate : 0.5956
P-Value [Acc > NIR] : 0.9988
Kappa : -0.2361
Mcnemar's Test P-Value : 1.298e-05
Sensitivity : 0.7901
Specificity : 0.0000
Pos Pred Value : 0.5378
Neg Pred Value : 0.0000
Prevalence : 0.5956
Detection Rate : 0.4706
Detection Prevalence : 0.8750
Balanced Accuracy : 0.3951
'Positive' Class : TRUE
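An alternative to subsetting the reference labels by the prediction names is to drop incomplete rows up front with na.omit(), so the prediction and reference vectors line up one-to-one. A sketch assuming the cleaned `data` table built above:

```r
library(caret)
library(e1071)

# Drop rows containing any NA so predictions align 1:1 with the labels
complete <- na.omit(data)

set.seed(1)  # for a reproducible partition
inTrain <- createDataPartition(1:nrow(complete), p = 0.6, list = FALSE)
train <- complete[inTrain, ]
test  <- complete[-inTrain, ]

svm.model <- svm(classification ~ bp + sg + al + su, data = train,
                 type = 'one-classification', nu = 0.10,
                 scale = TRUE, kernel = 'radial')

# No NA rows remain, so no index bookkeeping is needed
confusionMatrix(table(Predicted = predict(svm.model, test),
                      Reference = test$classification),
                positive = 'TRUE')
```

Note that this discards rows with missing values entirely, whereas the name-based subsetting above keeps partially complete rows out of the evaluation only.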
Data
library(archive)
library(data.table)
tf1 <- tempfile(fileext = ".rar")
#Download data file
download.file("http://archive.ics.uci.edu/ml/machine-learning-databases/00336/Chronic_Kidney_Disease.rar", tf1)
tf2 <- tempfile()
#Un-rar file
archive_extract(tf1, tf2)
#Read in data
ds <- fread(paste0(tf2,"/Chronic_Kidney_Disease/chronic_kidney_disease.arff"), fill = TRUE, skip = "48")
#Remove erroneous last column
ds[,V26:= NULL]
#Set column names (from header)
setnames(ds,c("id","bp","sg","al","su","rbc","pc","pcc","ba","bgr","bu","sc","sod","pot","hemo","pcv","wc","rc","htn","dm","cad","appet","pe","ane","classification"))
#Replace "?" with NA
ds[ds == "?"] <- NA
Which algorithm does R use for computing one-class SVM? (package e1071)
You can see the following link:
https://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf
The vignette gives the dual problem formulation of the SVM algorithm this package uses (for one-class SVM, see formulation (3) on page 7). A straightforward transformation from the dual to the primal problem shows that this default implementation is the one Schölkopf suggested; see the paper:
https://www.stat.purdue.edu/~yuzhu/stat598m3/Papers/NewSVM.pdf
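For reference, the primal problem from Schölkopf et al. (to which the dual on page 7 of the vignette corresponds) is, for training points \(x_1,\dots,x_n\) mapped into feature space by \(\phi\):

```latex
\min_{w,\,\xi,\,\rho}\;\; \frac{1}{2}\|w\|^2
  + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{s.t.}\quad
\langle w, \phi(x_i)\rangle \ge \rho - \xi_i,\;\; \xi_i \ge 0,
```

with decision function \(f(x) = \operatorname{sgn}(\langle w, \phi(x)\rangle - \rho)\). The parameter \(\nu \in (0,1]\) is an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors, which is why the nu argument controls how strict the fitted models above are.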