UndefinedMetricWarning: F-Score Is Ill-Defined and Being Set to 0.0 in Labels with No Predicted Samples

UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples

As mentioned in the comments, some labels in y_test don't appear in y_pred. Specifically in this case, label '2' is never predicted:

>>> set(y_test) - set(y_pred)
{2}

This means that there is no F-score to calculate for this label, and thus the F-score for this case is considered to be 0.0. Since you requested an average of the score, you must take into account that a score of 0 was included in the calculation, and this is why scikit-learn is showing you that warning.

This brings me to why you don't see the warning a second time. As I mentioned, this is a warning, which Python treats differently from an error. By default, most environments show a specific warning only once. This behavior can be changed:

import warnings
warnings.filterwarnings('always')  # "error", "ignore", "always", "default", "module" or "once"

If you set this before importing the other modules, you will see the warning every time you run the code.
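To see what the different filter actions do without involving scikit-learn, here is a minimal standard-library sketch. The noisy() and count_warnings() helpers are hypothetical names used only for illustration, and the counts assume the warning is always raised from the same line of code:

```python
import warnings


def noisy():
    # Hypothetical helper: emits the same warning from the same line on every call.
    warnings.warn("metric is ill-defined", UserWarning)


def count_warnings(action, calls=3):
    """Return how many warnings get through when `noisy` is called `calls` times."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)  # same action names as filterwarnings
        for _ in range(calls):
            noisy()
    return len(caught)


print(count_warnings("default"))  # 1: shown once per code location
print(count_warnings("always"))   # 3: shown on every call
print(count_warnings("ignore"))   # 0: never shown
```

Using catch_warnings here keeps the filter change local to the with block, so the experiment does not alter the filters for the rest of your session.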

There is no way to avoid seeing this warning the first time, aside from setting warnings.filterwarnings('ignore'). What you can do is decide that you are not interested in the scores of labels that were never predicted, and then explicitly specify the labels you are interested in (those that were predicted at least once):

>>> metrics.f1_score(y_test, y_pred, average='weighted', labels=np.unique(y_pred))
0.91076923076923078

The warning will be gone.
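To make it concrete what restricting the labels does to the average, here is a rough pure-Python sketch of the idea. It is not scikit-learn's implementation; the data and the helper names f1_for_label and weighted_f1 are made up for illustration:

```python
from collections import Counter


def f1_for_label(y_true, y_pred, label):
    """F1 for one label; 0.0 when there are no true positives (including the 0/0 case)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / sum(p == label for p in y_pred)
    recall = tp / sum(t == label for t in y_true)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred, labels):
    """Support-weighted mean of the per-label F1, over the given labels only."""
    support = Counter(y_true)
    total = sum(support[label] for label in labels)
    return sum(support[label] * f1_for_label(y_true, y_pred, label)
               for label in labels) / total


# Hypothetical data: label 2 occurs in y_test but is never predicted.
y_test = [0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1]

score_all = weighted_f1(y_test, y_pred, sorted(set(y_test) | set(y_pred)))
score_predicted = weighted_f1(y_test, y_pred, sorted(set(y_pred)))
print(round(score_all, 4), round(score_predicted, 4))  # 0.7959 0.9286
```

The second score is higher only because the zero for the never-predicted label is no longer averaged in, so the two numbers answer different questions.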

Classification Report - Precision and F-score are ill-defined

This is not an error, just a warning that not all your labels are included in your y_pred, i.e. there are some labels in your y_test that your classifier never predicts.

Here is a simple reproducible example:

from sklearn.metrics import precision_score, f1_score, classification_report

y_true = [0, 1, 2, 0, 1, 2] # 3-class problem
y_pred = [0, 0, 1, 0, 0, 1] # we never predict '2'

precision_score(y_true, y_pred, average='macro') 
[...] UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. 
  'precision', 'predicted', average, warn_for)
0.16666666666666666

precision_score(y_true, y_pred, average='micro') # no warning
0.3333333333333333

precision_score(y_true, y_pred, average=None) 
[...] UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. 
  'precision', 'predicted', average, warn_for)
array([0.5, 0. , 0. ])

The exact same warnings are produced for f1_score (not shown).

In practice, this only warns you that in the classification_report, the respective values for labels with no predicted samples (here 2) will be set to 0:

print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

           0       0.50      1.00      0.67         2
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         2

   micro avg       0.33      0.33      0.33         6
   macro avg       0.17      0.33      0.22         6
weighted avg       0.17      0.33      0.22         6

[...] UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. 
  'precision', 'predicted', average, warn_for)

"When I was not using np.array in the past it worked just fine"

Highly doubtful, since in the example above I have used simple Python lists, not NumPy arrays...

UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. 'recall', 'true', average, warn_for)

Hello, I found a solution to this problem. You need to use:

cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)

I'm using knn and this solved the problem

Code:

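from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def knn(self, X_train, X_test, Y_train, Y_test):

    # implementation of the algorithm
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, Y_train)
    # 10x cross-validation (10 random shuffle splits)
    cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
    # puntajes ("scores"): mean weighted F1 over the 10 splits
    puntajes = sum(cross_val_score(knn, X_test, Y_test,
                                   cv=cv, scoring='f1_weighted')) / 10

    print(puntajes)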

Link:
https://scikit-learn.org/stable/modules/cross_validation.html

How to fix this classification report warning?

You don't want to get rid of this warning: it tells you that class 2 never appears in the predictions, because there were no samples of it in the training set.

You have an imbalanced classification problem: class 2 has a really low number of samples, and it was present in the test data only.

I suggest two things:

StratifiedKFold: when you split into training and test sets, it takes all classes into account.

Oversampling: you might need to adjust your data by randomly resampling the training dataset to duplicate examples from the minority class.
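As a rough idea of what random oversampling does, here is a minimal standard-library sketch. The random_oversample helper and the toy data are hypothetical, and a real project would more likely rely on an existing library for this; the point is only that minority-class rows of the training set get duplicated until the classes are balanced:

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many samples as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


# Hypothetical imbalanced training set: class 2 has a single sample.
X_train = [[0.1], [0.2], [0.3], [1.1], [1.2], [1.3], [2.1]]
y_train = [0, 0, 0, 1, 1, 1, 2]

X_res, y_res = random_oversample(X_train, y_train)
print(Counter(y_res))  # every class now has 3 samples
```

Only the training split should be resampled; the test data stays untouched so the scores still reflect the real class distribution.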

F-score is ill-defined scikit

f1_score with average='macro' will first calculate the score for each label individually, and then take their unweighted mean.

So if you have multiple classes (labels), one of them may be absent from the predicted data. In that case, you will get the warning for that (absent) label, and its F1 will be 0. The other labels can still have non-zero values, so the mean can be non-zero.

For example:

from sklearn.metrics import f1_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 0, 0]
f1_score(y_true, y_pred, average='macro')

/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.

# Output:  0.38888888888888884

In the above example, the predicted data does not contain the label 2, and the warning refers to that.
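The per-label arithmetic behind that number can be checked by hand. The sketch below is plain Python, not scikit-learn's code, and uses the identity F1 = 2*TP / (number predicted + number actual); per_label_f1 is a hypothetical helper name:

```python
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 0, 0]


def per_label_f1(y_true, y_pred):
    """Return {label: F1}, using F1 = 2*TP / (n_predicted + n_actual)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        n_true = sum(t == label for t in y_true)
        # No true positives means F1 is 0; for a label that is never
        # predicted, precision is 0/0 and the score is set to 0.0 by convention.
        scores[label] = 2 * tp / (n_pred + n_true) if tp else 0.0
    return scores


scores = per_label_f1(y_true, y_pred)
macro_f1 = sum(scores.values()) / len(scores)  # unweighted mean over labels
print(scores)    # label 0: 4/6, label 1: 2/4, label 2: 0.0
print(macro_f1)  # about 0.3889, matching the output above
```

Label 2 contributes a plain 0.0 to the mean, which is exactly what the warning is telling you.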

As for whether you can rely on that score or not, please see the related questions:

https://datascience.stackexchange.com/q/15989/41018
https://stats.stackexchange.com/q/156923/133411
