One-Class Classification with SVM in R

I think this is what you want:

library(e1071)
data(iris)
df <- iris

df <- subset(df, Species == 'setosa') # keep only one of the classes

x <- subset(df, select = -Species) # predictor variables
y <- df$Species # labels (all setosa)
model <- svm(x, y, type = 'one-classification') # train a one-class model

print(model)
summary(model) #print summary

# test on the whole set
pred <- predict(model, subset(iris, select=-Species)) #create predictions

Output:

-Summary:

> summary(model)

Call:
svm.default(x = x, y = y, type = "one-classification")

Parameters:
SVM-Type: one-classification
SVM-Kernel: radial
gamma: 0.25
nu: 0.5

Number of Support Vectors: 27

Number of Classes: 1

-Predictions (only the rows where Species=='setosa' are shown here, for readability):

> pred
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
45 46 47 48 49 50
FALSE TRUE TRUE TRUE TRUE TRUE
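
About half of the setosa rows above come back FALSE. That fits the default nu = 0.5 shown in the summary: in a one-class SVM, nu acts as an upper bound on the fraction of training points treated as outliers. Here is a minimal sketch that compares the default with a smaller nu and cross-tabulates the predictions against the known species. The value nu = 0.05 and the variable names are only illustrative assumptions, not tuned settings.

```r
library(e1071)
data(iris)

# Train on setosa only, as above (no labels are passed to the one-class fit)
x <- subset(iris, Species == "setosa", select = -Species)
newx <- subset(iris, select = -Species)
truth <- iris$Species == "setosa"

# Default nu = 0.5 versus a smaller, illustrative nu = 0.05 (assumed, not tuned)
model_default <- svm(x, type = "one-classification")
model_low_nu  <- svm(x, type = "one-classification", nu = 0.05)

pred_default <- predict(model_default, newx)
pred_low_nu  <- predict(model_low_nu, newx)

# Cross-tabulate predictions against the known species
tab_default <- table(Predicted = pred_default, Setosa = truth)
tab_low_nu  <- table(Predicted = pred_low_nu, Setosa = truth)
print(tab_default)
print(tab_low_nu)

# Share of setosa rows accepted by each model
accept_default <- mean(pred_default[truth])
accept_low_nu  <- mean(pred_low_nu[truth])
```

A smaller nu should accept more of the training class, at the price of a looser boundary, so it is worth checking both rows of the table and not only the setosa column.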

One Class Classification in R language. What am I doing wrong when generating the confusion matrix?

I see a number of issues. First, many of your columns are of class character, whereas the classifier requires numeric input. Let's pick some columns and convert them to numeric. I will use data.table because fread is very convenient.

library(caret)
library(e1071)
library(data.table)
setDT(ds)
#Choose columns
mycols <- c("id","bp","sg","al","su")
#Convert to numeric
ds[,(mycols) := lapply(.SD, as.numeric),.SDcols = mycols]

#Convert classification to logical
data <- ds[,.(bp,sg,al,su,classification = ds$classification == "ckd")]
data
      bp    sg al su classification
  1:  80 1.020  1  0           TRUE
  2:  50 1.020  4  0           TRUE
  3:  80 1.010  2  3           TRUE
  4:  70 1.005  4  0           TRUE
  5:  80 1.010  2  0           TRUE
 ---
396:  80 1.020  0  0          FALSE
397:  70 1.025  0  0          FALSE
398:  80 1.020  0  0          FALSE
399:  60 1.025  0  0          FALSE
400:  80 1.025  0  0          FALSE

Once the data is cleaned up, you can sample a training and test set with createDataPartition as in your original code.

#Sample data for training and test set
inTrain<-createDataPartition(1:nrow(data),p=0.6,list=FALSE)
train<- data[inTrain,]
test <- data[-inTrain,]

Then we can create the model and make the predictions.

svm.model <- svm(classification ~ bp + sg + al + su, data = train,
                 type = 'one-classification',
                 nu = 0.10,
                 scale = TRUE,
                 kernel = "radial")

#Perform predictions
svm.predtrain<-predict(svm.model,train)
svm.predtest<-predict(svm.model,test)

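Your main issue with the cross table was that predict only returns predictions for rows that don't have any NAs, so the prediction vector is shorter than the data. You have to subset the classification labels to the rows that actually received a prediction; the names of the prediction vector give those row indices. Then you can evaluate confusionMatrix: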

confTrain <- table(Predicted = svm.predtrain,
                   Reference = train$classification[as.integer(names(svm.predtrain))])
confTest <- table(Predicted = svm.predtest,
                  Reference = test$classification[as.integer(names(svm.predtest))])

confusionMatrix(confTest,positive='TRUE')

Confusion Matrix and Statistics

         Reference
Predicted FALSE TRUE
    FALSE     0   17
    TRUE     55   64

Accuracy : 0.4706
95% CI : (0.3845, 0.558)
No Information Rate : 0.5956
P-Value [Acc > NIR] : 0.9988

Kappa : -0.2361

Mcnemar's Test P-Value : 1.298e-05

Sensitivity : 0.7901
Specificity : 0.0000
Pos Pred Value : 0.5378
Neg Pred Value : 0.0000
Prevalence : 0.5956
Detection Rate : 0.4706
Detection Prevalence : 0.8750
Balanced Accuracy : 0.3951

'Positive' Class : TRUE
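
To make the NA point concrete, here is a small sketch of how the prediction vector ends up shorter than the data and how its names can be used to line the labels up again. The data frame toy and its columns a, b and label are hypothetical, made up for illustration and not taken from the kidney data set. Fitting on the target class only is also an assumption of this sketch, following the iris example at the top.

```r
library(e1071)

# Hypothetical toy data (not the kidney data set): two predictors, a logical label
set.seed(1)
toy <- data.frame(a = rnorm(40), b = rnorm(40),
                  label = rep(c(TRUE, FALSE), each = 20))
toy$a[c(3, 25)] <- NA  # introduce two incomplete rows

# Fit on the complete rows of the target class only (assumption of this sketch)
target <- toy[toy$label & complete.cases(toy), c("a", "b")]
fit <- svm(target, type = "one-classification", nu = 0.1)

# predict() drops the rows that contain NA (its default na.action is na.omit)
pred <- predict(fit, toy[, c("a", "b")])
n_pred <- length(pred)  # fewer than nrow(toy): rows 3 and 25 are missing

# The names of pred are the row names of the rows that were kept
kept <- as.integer(names(pred))
conf <- table(Predicted = pred, Reference = toy$label[kept])
print(conf)
```

Filtering the test rows with complete.cases() before calling predict() would be another way to keep both vectors the same length.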

Data

library(archive)
library(data.table)
tf1 <- tempfile(fileext = ".rar")
#Download data file
download.file("http://archive.ics.uci.edu/ml/machine-learning-databases/00336/Chronic_Kidney_Disease.rar", tf1)
tf2 <- tempfile()
#Un-rar file
archive_extract(tf1, tf2)
#Read in data
ds <- fread(paste0(tf2,"/Chronic_Kidney_Disease/chronic_kidney_disease.arff"), fill = TRUE, skip = "48")
#Remove erroneous last column
ds[,V26:= NULL]
#Set column names (from header)
setnames(ds,c("id","bp","sg","al","su","rbc","pc","pcc","ba","bgr","bu","sc","sod","pot","hemo","pcv","wc","rc","htn","dm","cad","appet","pe","ane","classification"))
#Replace "?" with NA
ds[ds == "?"] <- NA

Which algorithm does R use for computing one-class SVM? (package e1071)

You can see the following link:
https://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf

The link shows the dual problem formulation that this package uses (for one-class SVM, see page 7, item (3)). A straightforward transformation from the dual to the primal problem shows that this default implementation is the one Schölkopf suggested; see the paper:
https://www.stat.purdue.edu/~yuzhu/stat598m3/Papers/NewSVM.pdf
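
For reference, this is how I would write down the pair of problems mentioned above, in the usual notation of the one-class SVM literature. The symbols are my own choice here and may differ slightly from the vignette: l is the number of training points, Φ the feature map with kernel k(x, x') = ⟨Φ(x), Φ(x')⟩, ξ_i the slack variables and ρ the offset.

```latex
% Primal problem (Schölkopf et al.)
\min_{w,\,\xi,\,\rho} \quad \frac{1}{2}\lVert w \rVert^2
    + \frac{1}{\nu l}\sum_{i=1}^{l} \xi_i - \rho
\qquad \text{s.t.} \quad
\langle w, \Phi(x_i) \rangle \ge \rho - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,l

% Dual problem
\min_{\alpha} \quad \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j \, k(x_i, x_j)
\qquad \text{s.t.} \quad
0 \le \alpha_i \le \frac{1}{\nu l}, \quad \sum_{i=1}^{l} \alpha_i = 1

% Decision function
f(x) = \operatorname{sgn}\!\left( \sum_{i=1}^{l} \alpha_i \, k(x_i, x) - \rho \right)
```

The box constraint 0 ≤ α_i ≤ 1/(νl) is where nu enters: it is why nu bounds the fraction of outliers from above and the fraction of support vectors from below.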


