Predict.Svm Does Not Predict New Data

Why can't I predict new data using SVM and KNN?

You have to use a regression model rather than a classification model. For SVM-based regression, use svm.SVR():

import numpy as np
from sklearn import svm

x = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]], dtype=np.float64)
y = np.array([2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], dtype=np.float64)

clf = svm.SVR(kernel='linear')
clf.fit(x, y)
print(clf.predict([[50]]))
print(clf.score(x, y))

output:

[50.12]
0.9996
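The question also mentions KNN; the regression counterpart there is KNeighborsRegressor. One caveat worth knowing (an addition, not part of the original answer): unlike a linear SVR, KNN cannot extrapolate, so a query far outside the training range is predicted as the average of the nearest training targets. A minimal sketch on the same data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

x = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]], dtype=np.float64)
y = np.array([2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], dtype=np.float64)

knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(x, y)

# 50 is far outside the training range, so the prediction is just the
# mean of the 5 nearest targets (8, 9, 10, 11, 12), i.e. 10 -- not ~51.
print(knn.predict([[50]]))
```

This is why the SVR answer above extrapolates sensibly while a KNN model on the same data would not.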

How to do the prediction for SVM in R?

You're using the wrong table as your newdata.

You should be using test_val, which has gone through the same treatment as train_val. Instead, you are training on train_val but using test as your newdata.

If you make predictions for your test_val table, both the svm and random forest models will work, and will give you 177 predictions.

You will also need to change your submission data.frame to have 177 rows instead of 418.

EDIT

As discussed in the comments (since removed), you want to predict on the test data using a model built on the train data.

Try this:

svm.model.linear <- svm(Survived ~ ., data = train, kernel="linear", cost = 2, gamma = 0.1)
svm.prediction.linear <- predict(svm.model.linear, test[,-1])

The predict function works slightly differently for different models in R, which can cause confusion. When you use it with an svm model, it is actually calling predict.svm(). This particular function doesn't like being passed newdata with an empty Survived column. If you remove that column by specifying newdata=test[,-1], then the prediction will work as expected.
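The same pitfall exists outside R. As a hedged sketch in Python/scikit-learn terms (the data frames and column values here are made up for illustration), the idea is simply to drop the target column from the new data before predicting:

```python
import pandas as pd
from sklearn.svm import SVC

# Toy stand-ins for Titanic-style train/test tables (hypothetical data).
train = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Age":      [22, 38, 26, 35, 27, 54],
    "Fare":     [7.25, 71.28, 7.92, 8.05, 11.13, 51.86],
})
test = pd.DataFrame({
    "Survived": [None, None],   # empty target column, as in the question
    "Age":      [30, 40],
    "Fare":     [10.50, 30.00],
})

model = SVC(kernel="linear")
model.fit(train.drop(columns="Survived"), train["Survived"])

# Drop the (empty) target column before predicting -- the analogue of test[,-1].
preds = model.predict(test.drop(columns="Survived"))
print(len(preds))  # one prediction per test row
```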

Error in predict.svm method for regression?

TL;DR: your test data is too far away from your training data.

Take a look at the distribution of your training data compared with your test data.

(M = sapply(dt, mean))
        y         x         w
31.204838  2.550000  5.517325
(S = sapply(dt, sd))
       y        x        w
3.131271 1.436141 0.262107

(100:102 - M)/S
        y         x         w
 21.97036  68.55178 368.10419
(c(0, 78, 1000) - M)/S
        y          x          w
 -9.96555   52.53664 3794.18628
(rnorm(3) - M)/S
         y          x          w
 -9.118284  -1.747814 -15.895867

Your first data point is 368 standard deviations away from the mean.

Your second data point is 3794 standard deviations away from the mean.

Your third data point is a mere 16 standard deviations away from the mean.

These points are essentially at infinity.

You are discovering that far from the training data, your model is predicting a constant. But if you take data points that are fewer than 3 standard deviations from your training data, you will find that the model is not constant.
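You can see this flattening directly. A small sketch with made-up data, using scikit-learn's RBF-kernel SVR (the same default kernel as R's svm for regression): far from the training range every kernel term vanishes, so the prediction collapses to the model's intercept, while inside the training range it still varies. C is raised here only so the model tracks the trend tightly.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 3 * X.ravel() + 2 + rng.normal(scale=0.1, size=50)

model = SVR(kernel="rbf", C=100)
model.fit(X, y)

near = model.predict([[5.0], [9.0]])      # inside the training range
far = model.predict([[100.0], [1000.0]])  # "essentially at infinity"

print(near)  # varies with x
print(far)   # two (nearly) identical values: the model's intercept
```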

SVM Prediction is dropping values

As mentioned in the comment, you need to deal with the NA values in your dataset. svm handles them for you by silently dropping incomplete rows, which is why the pred_SVM output has fewer values: it is calculated without the NA rows.

To test whether there are NAs in your data, just run: sum(is.na(SVMTest))

I am pretty sure that you will see a number greater than zero.

Before starting to build your SVM model, get rid of all NA values with:

dataset <- dataset[complete.cases(dataset), ]

Then, after separating your data into Train and Test sets, you can run:

SVM_swim <- svm(.....,data = SVMTrain, kernel='linear')

