Why Are Probabilities and Response in Ksvm in R Not Consistent

If you look at the decision matrix and the votes, they seem to be more in line with the responses:

> predict(out, newdata = testdat, type = "response")
[1] 0 0 1 1
Levels: 0 1
> predict(out, newdata = testdat, type = "decision")
            [,1]
[1,] -0.07077917
[2,] -0.01762016
[3,]  0.02210974
[4,]  0.04762563
> predict(out, newdata = testdat, type = "votes")
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    0
[2,]    0    0    1    1
> predict(out, newdata = testdat, type = "prob")
             0         1
[1,] 0.7198132 0.2801868
[2,] 0.6987129 0.3012871
[3,] 0.6823679 0.3176321
[4,] 0.6716249 0.3283751

The kernlab help pages (?predict.ksvm) link to the paper Probability Estimates for Multi-class Classification by Pairwise Coupling by T.F. Wu, C.J. Lin, and R.C. Weng.

In section 7.3 it is said that the decisions and probabilities can differ:

...We explain why the results by probability-based and decision-value-based methods can be so distinct. For some problems, the parameters selected by δDV are quite different from those by the other five rules. In waveform, at some parameters all probability-based methods gives much higher cross validation accuracy than δDV. We observe, for example, the decision values of validation sets are in [0.73, 0.97] and [0.93, 1.02] for data in two classes; hence, all data in the validation sets are classified as in one class and the error is high. On the contrary, the probability-based methods fit the decision values by a sigmoid function, which can better separate the two classes by cutting at a decision value around 0.95. This observation shed some light on the difference between probability-based and decision-value based methods...

I'm not familiar enough with these methods to understand the issue fully, but maybe you do. It looks like there are distinct methods at work: type = "response" corresponds to a different method than the one used for predicting the probabilities, so the two can disagree.
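A minimal sketch that reproduces this kind of disagreement (the data here are made up purely for illustration; any two-class problem with borderline points should do):

library(kernlab)

## Toy two-class data, assumed purely for illustration.
set.seed(1)
x <- matrix(rnorm(100), ncol = 2)
y <- factor(as.integer(x[, 1] + x[, 2] + rnorm(50, sd = 1.5) > 0))

out <- ksvm(x, y, kernel = "rbfdot", prob.model = TRUE)

resp <- predict(out, x, type = "response")       # based on raw decision values
prob <- predict(out, x, type = "probabilities")  # based on the sigmoid fit

## Near the boundary the two rules can disagree, because the sigmoid's
## 50% crossing need not sit exactly at decision value 0.
sum(resp != colnames(prob)[max.col(prob)])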

Kernlab kraziness: inconsistent results for identical problems

Reviewing some of the code, it appears that this is the offending line:

https://github.com/cran/kernlab/blob/efd7d91521b439a993efb49cf8e71b57fae5fc5a/src/svm.cpp#L4205

That is, in the case of a user-supplied kernel matrix, ksvm looks at only two dimensions rather than the full dimensionality of the input. This seems strange, and is probably a hold-over from some testing. Tests of the linear kernel with data of just two dimensions produce the same result: replace 1:4 with 1:2 in the above and the output and predictions all agree.
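For reference, a sketch of the precomputed-kernel workflow that exercises this code path, following the pattern in the kernlab documentation (the data here are hypothetical; the key detail is that prediction needs the kernel columns corresponding to the support vectors, selected via SVindex):

library(kernlab)

set.seed(2)
x <- matrix(rnorm(40), ncol = 4)    # 10 observations, 4 features
y <- factor(rep(0:1, each = 5))

K <- as.kernelMatrix(x %*% t(x))    # user-supplied linear kernel matrix
out <- ksvm(K, y, type = "C-svc")

## For prediction, pass the test-vs-support-vector kernel values.
Ktest <- as.kernelMatrix(x %*% t(x[SVindex(out), , drop = FALSE]))
predict(out, Ktest, type = "response")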

Different results for SVM using Caret in R when classProbs=TRUE

As noted in the comments by desertnaut, SVMs are not probabilistic classifiers; they do not actually produce probabilities.

One method to create probabilities is to directly train a kernel classifier with a logit link function and a regularized maximum likelihood score. However, training with a maximum likelihood score produces non-sparse kernel machines. Instead, after training an SVM, the parameters of an additional sigmoid function are trained to map the SVM outputs into probabilities. Reference paper: Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods.
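Conceptually, this sigmoid fit is close to a one-dimensional logistic regression on the decision values. A rough sketch, assuming dec holds cross-validated decision values and y holds the true 0/1 labels (Platt's method uses a slightly more careful target encoding and optimizer):

## dec: cross-validated SVM decision values; y: true 0/1 class labels.
platt <- glm(y ~ dec, family = binomial)
## Mapped probability P(class = 1 | decision value):
prob1 <- predict(platt, type = "response")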

Caret's method = "svmRadialSigma" internally uses kernlab::ksvm with the argument kernel = "rbfdot". In order for this function to create probabilities, the argument prob.model = TRUE is needed. From the help of this function:

prob.model if set to TRUE builds a model for calculating class
probabilities or in case of regression, calculates the scaling
parameter of the Laplacian distribution fitted on the residuals.
Fitting is done on output data created by performing a 3-fold
cross-validation on the training data. For details see references.
(default: FALSE)

The referenced details:

In classification when prob.model is TRUE a 3-fold cross validation is
performed on the data and a sigmoid function is fitted on the
resulting decision values f.

It is clear that something very specific is happening for classification models when posterior probabilities are needed. This is different compared to just outputting decision values.

From this it can be derived that, depending on the sigmoid function fit, some of the predicted classes can differ compared to running kernlab::ksvm without a probability model (prob.model = FALSE), and this is what you are observing in the posted example.

Things get even more complicated if there are more than two classes.

Further reading:

Including class probabilities might skew a model in caret?

Isn't caret SVM classification wrong when class probabilities are included?

Why are probabilities and response in ksvm in R not consistent?

[R] Inconsistent results between caret+kernlab versions

Invalid probability model for large support vector machines using ksvm in R

The origin of the problem is indicated by the following error message:

line search fails

A more specific question, including the original data frame I used, is here: Line search fails in training ksvm prob.model.

probability model in kernlab::ksvm

kernlab currently does not support probability estimation for types other than C-svc, nu-svc and C-bsvc (check the code).

if(type == "probabilities")
{
  if(is.null(prob.model(object)[[1]]))
    stop("ksvm object contains no probability model. Make sure you set the paramater prob.model in ksvm during training.")

  if(type(object)=="C-svc"||type(object)=="nu-svc"||type(object)=="C-bsvc")
  {
    [...]
  }
  else
    stop("probability estimates only supported for C-svc, C-bsvc and nu-svc")
}

The problem is that native multiclass solutions lack the pairwise binary probabilities that go as input into couple(), kernlab's pairwise-coupling routine. Actually, coding your own solution wouldn't be that hard.
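For what it's worth, the coupling step itself is exposed as kernlab::couple(). A sketch with made-up pairwise probabilities for a 3-class problem (one row per observation; the pair ordering here is an assumption for illustration):

library(kernlab)

## Made-up pairwise probabilities for 2 observations of a 3-class problem.
pairwise <- matrix(c(0.9, 0.8, 0.6,
                     0.2, 0.3, 0.7),
                   nrow = 2, byrow = TRUE)
couple(pairwise, coupler = "minpair")   # per-class probabilities, one row each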

Predict function in R's MLR yielding results inconsistent with predict

The answer lies here:

Why are probabilities and response in ksvm in R not consistent?

In short, ksvm type = "probabilities" gives different results than type = "response".

If I run

> res2 <- predict(t$learner.model, data.frame(x2,x1,x3), type = "probabilities")
> res2

then I get the same result as res1 above (type = "response" was default).

Unfortunately it seems that classifying an image based on the probabilities doesn't do as well as using the "response". Maybe that is still the best way to estimate the certainty of a classification?
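If the aim is a certainty score rather than a hard label, one simple option is the maximum class probability per row (assuming prob is the matrix returned by type = "probabilities"):

## prob: matrix of class probabilities, one row per observation.
certainty <- apply(prob, 1, max)   # confidence in the predicted class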

What about the results for ksvm?

Kernel SVM is by its very nature not very interpretable. Each kernel uses many predictor variables, so it's hard to say which predictor variable is important. If you care about interpretability, try linear regression or other interpretable models.


