Preprocessing in Scikit Learn - Single Sample - Deprecation Warning


Just listen to what the warning is telling you:

Reshape your data either using X.reshape(-1, 1) if your data has a single feature/column
or X.reshape(1, -1) if it contains a single sample.

For your example (a single sample with more than one feature/column), type:

temp = temp.reshape(1,-1) 

For one feature/column:

temp = temp.reshape(-1,1)
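To make the difference between the two calls concrete, here is a minimal NumPy sketch. The array `temp` and its values are hypothetical; the point is only which shape each reshape produces from the same 1D data.

```python
import numpy as np

# Hypothetical 1D array: three values with no explicit row/column axis
temp = np.array([5.0, 3.2, 1.7])
print(temp.shape)        # (3,)

# Read as ONE sample with three features: a single row
as_sample = temp.reshape(1, -1)
print(as_sample.shape)   # (1, 3)

# Read as THREE samples of ONE feature: a single column
as_feature = temp.reshape(-1, 1)
print(as_feature.shape)  # (3, 1)
```

The `-1` tells NumPy to infer that dimension from the array's size, so the same call works whatever the length of `temp`.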

Warning message in scikit-learn

The input to clf.predict should be a 2D array. Thus, instead of writing

print(clf.predict([0,1]))

you need to write

print(clf.predict([[0,1]]))
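A small runnable sketch of the same idea, assuming a hypothetical toy training set and a `DecisionTreeClassifier` standing in for the asker's `clf` (whose model and data aren't shown). Each inner list is one sample, and `predict` returns one label per row.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: the label simply equals the first feature
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [0, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# One sample -> a list containing one row, i.e. shape (1, 2)
single = clf.predict([[0, 1]])

# Several samples -> several rows; one prediction per row
several = clf.predict([[0, 1], [1, 1]])
print(single, several)
```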

Accuracy of preprocessing single sample

You should use StandardScaler, which is a wrapper over the scale function. This wrapper stores the mean and standard deviation learned from the training data and then uses this information to scale other data.

Example usage:

import numpy as np
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

trainData = scaler.fit_transform(trainData)
# reshape is needed here only because log is a single sample; a 2D batch needs none
log = scaler.transform(np.reshape(log, (1, -1)))

fit_transform() is just a shortcut for first calling fit() and then transform().

The fit() method does not return transformed data; it returns the fitted scaler itself and just analyses the data to learn the mean and standard deviation. transform() uses the learned mean and standard deviation to scale the data and returns the new data.
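A quick check of these two points, using hypothetical training data: in scikit-learn, fit() returns the estimator itself (which is what makes chaining possible), and fit() followed by transform() gives the same result as fit_transform().

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: 4 samples, 2 features
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

scaler = StandardScaler()
returned = scaler.fit(train)   # learns mean_ and scale_ from train
print(returned is scaler)      # True: fit() hands back the same object

# Because fit() returns the scaler, fit() and transform() can be chained
chained = StandardScaler().fit(train).transform(train)
shortcut = StandardScaler().fit_transform(train)
print(np.allclose(chained, shortcut))  # True

print(scaler.mean_)  # per-feature means learned from train
```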

You should only call fit() or fit_transform() on the training data, never on anything else. For transforming test or new data, always use transform().
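The sketch below, on hypothetical data, shows why this matters for a single sample. Transforming it with a scaler fitted on the training data matches the formula (x - mean) / std, where the mean and standard deviation come from the training set (StandardScaler uses the population standard deviation, ddof=0). Fitting on the lone sample instead makes its mean equal to itself and its variance zero, so every feature scales to 0 and the information is lost.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: a training set and one new sample that arrives later
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
new_sample = np.array([5.0, 15.0])  # 1D: two features, no sample axis

scaler = StandardScaler().fit(train)  # fit ONLY on the training data

# transform() needs 2D input, so turn the sample into one row
scaled = scaler.transform(new_sample.reshape(1, -1))

# Same result by hand, using the training set's mean and population std
manual = (new_sample - train.mean(axis=0)) / train.std(axis=0)
print(np.allclose(scaled[0], manual))  # True

# The mistake: fitting on the single sample itself collapses it to zeros
refit_on_sample = StandardScaler().fit_transform(new_sample.reshape(1, -1))
print(refit_on_sample)
```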

Sklearn train model with single sample raises a DeprecationWarning

If you read the warning message, you can see that passing one-dimensional arrays will soon not be supported. Instead, you have to make your single sample look like a list of samples that contains just one sample. When dealing with NumPy arrays (which is recommended), you can use reshape(1, -1); since you're using lists, the following will do:

clf = clf.fit([[130, 1]], [0])
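If the sample is a NumPy array rather than a list, a minimal sketch of the equivalent reshaping looks like this. `np.atleast_2d` is shown as an alternative that also turns a 1D array into a single row. Note that reshape(-1, 1) would be wrong here: it produces two samples with one feature each.

```python
import numpy as np

# The single sample from the answer, as a NumPy array: shape (2,)
sample = np.array([130, 1])

# One sample with two features -> one row: shape (1, 2)
row = sample.reshape(1, -1)

# np.atleast_2d gives the same (1, 2) result for 1D input
row_alt = np.atleast_2d(sample)

# Wrong orientation for this case: 2 samples x 1 feature
wrong = sample.reshape(-1, 1)

print(row.shape, row_alt.shape, wrong.shape)  # (1, 2) (1, 2) (2, 1)
```

With `row` in hand, the fit call becomes `clf.fit(row, [0])`.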

Getting a weird error that says 'Reshape your data either using array.reshape(-1, 1)'

Any scikit-learn transformer expects an array of shape [n_samples, n_features]. So there are two scenarios in which you will have to reshape your data:

  • If you only have a single sample, you need to reshape it to an array of shape [1, n_features]
  • If you only have a single feature, you need to reshape it to an array of shape [n_samples, 1]
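For the single-feature case, a minimal sketch with K-means. The values are hypothetical and only mimic the vector in the question (mostly 1s with one 8). Reshaping to (n_samples, 1) makes each value its own sample, so the outlier ends up in its own cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 1D data resembling the question's vector
values = np.array([1.0, 1.0, 1.0, 8.0, 1.0, 1.0])

# Single feature: n_samples rows, 1 column -> shape (6, 1)
X = values.reshape(-1, 1)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
print(labels)  # the 8.0 gets a different label from the 1.0s

# If the vector were instead ONE sample with 6 features:
print(values.reshape(1, -1).shape)  # (1, 6)
```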

So do whichever suits your problem. You are passing a 1D vector:

[1. 1. 1. ... 8. 1. 1.]

If this is a single sample, reshape it to a (1, -1) array and you will be fine. That said, you might want to consider the following:

  • If this is a single sample, there's no point in fitting a model to it; you won't get any benefit.
  • If this is a set of samples with a single feature, I don't really see a benefit in running K-means on such a dataset.

Getting deprecation warning in Sklearn over 1d array, despite not having a 1D array

The warning is coming from the predict method: NumPy interprets [1,1] as a 1D array. So this should avoid the warning:

clf.predict(np.array([[1,1]]))

Notice that:

In [14]: p1 = np.array([1,1])

In [15]: p1.shape
Out[15]: (2,)

In [16]: p2 = np.array([[1,1]])

In [17]: p2.shape
Out[17]: (1, 2)

Also note that you can't use an array of shape (2, 1), because it is read as two samples with one feature each:

In [21]: p3 = np.array([[1],[1]])

In [22]: p3.shape
Out[22]: (2, 1)

In [23]: clf.predict(p3)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-23-e4070c037d78> in <module>()
----> 1 clf.predict(p3)

/home/juan/anaconda3/lib/python3.5/site-packages/sklearn/svm/base.py in predict(self, X)
566 Class labels for samples in X.
567 """
--> 568 y = super(BaseSVC, self).predict(X)
569 return self.classes_.take(np.asarray(y, dtype=np.intp))
570

/home/juan/anaconda3/lib/python3.5/site-packages/sklearn/svm/base.py in predict(self, X)
303 y_pred : array, shape (n_samples,)
304 """
--> 305 X = self._validate_for_predict(X)
306 predict = self._sparse_predict if self._sparse else self._dense_predict
307 return predict(X)

/home/juan/anaconda3/lib/python3.5/site-packages/sklearn/svm/base.py in _validate_for_predict(self, X)
472 raise ValueError("X.shape[1] = %d should be equal to %d, "
473 "the number of features at training time" %
--> 474 (n_features, self.shape_fit_[1]))
475 return X
476

ValueError: X.shape[1] = 1 should be equal to 2, the number of features at training time
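A minimal sketch of catching this mismatch and fixing it, assuming a hypothetical 2-feature `SVC` (the asker's training data isn't shown). The exact error text differs between scikit-learn versions, but it is a ValueError either way. Reshaping the (2, 1) array into one row of two features resolves it.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 2-feature training data
X_train = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
y_train = np.array([0, 1, 0, 1])
clf = SVC().fit(X_train, y_train)

p3 = np.array([[1], [1]])  # shape (2, 1): read as 2 samples x 1 feature

try:
    clf.predict(p3)
    failed = False
except ValueError as exc:  # feature-count mismatch is rejected
    failed = True
    print("rejected:", exc)

# One sample with both features: shape (1, 2)
fixed = clf.predict(p3.reshape(1, -1))
print(fixed.shape)  # (1,)
```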

Scikit-learn tutorial gives me a deprecation error, how to update?

Try the following:

print("A 12-inch pizza should cost: $%.2f" % model.predict(np.array([12]).reshape(1, -1))[0])

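I used reshape(1, -1) to pass a 2D array to the predict function, and then took [0] of the result. Indexing with [0] before calling predict would turn the input back into a 1D array.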
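A self-contained version of the pizza example, as a sketch. The data is made up to lie exactly on the line price = diameter + 1; it is not the tutorial's data. Training features are reshaped to (n_samples, 1), and the single query is reshaped before predict is called.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data on an exact line (price = diameter + 1), NOT the tutorial's
diameters = np.array([6, 8, 10, 14, 18]).reshape(-1, 1)  # single feature -> (5, 1)
prices = np.array([7, 9, 11, 15, 19])

model = LinearRegression().fit(diameters, prices)

# One sample with one feature -> shape (1, 1); reshape BEFORE predict, index AFTER
prediction = model.predict(np.array([12]).reshape(1, -1))[0]
print("A 12-inch pizza should cost: $%.2f" % prediction)  # $13.00
```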
