UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples
As mentioned in the comments, some labels in y_test don't appear in y_pred. Specifically in this case, label '2' is never predicted:
>>> set(y_test) - set(y_pred)
{2}
This means that there is no F-score to calculate for this label, and thus the F-score for this case is considered to be 0.0. Since you requested an average of the score, you must take into account that a score of 0 was included in the calculation, and this is why scikit-learn is showing you that warning.
This brings me to why you don't see the warning a second time. As I mentioned, this is a warning, which is treated differently from an error in Python. The default behavior in most environments is to show a specific warning only once. This behavior can be changed:
import warnings
warnings.filterwarnings('always') # "error", "ignore", "always", "default", "module" or "once"
If you set this before importing the other modules, you will see the warning every time you run the code.
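To see the effect of the filter actions in isolation, here is a small sketch that uses only the standard library. The function noisy() is a hypothetical stand-in for a metric call that emits a warning, and count_warnings() is a helper made up for this illustration; neither is part of scikit-learn.

```python
import warnings


def noisy():
    # Hypothetical stand-in for a metric call that emits a warning.
    warnings.warn("metric is ill-defined", UserWarning)


def count_warnings(action, calls=3):
    """Return how many warnings get through when noisy() runs `calls` times."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)  # e.g. "default", "always", "ignore"
        for _ in range(calls):
            noisy()
    return len(caught)


print(count_warnings("default"))  # reported once for the same code location
print(count_warnings("always"))   # reported on every call
print(count_warnings("ignore"))   # never reported
```

The catch_warnings context manager restores the previous filters on exit, so experimenting this way does not change the warning behavior of the rest of the program.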
There is no way to avoid seeing this warning the first time, aside from setting warnings.filterwarnings('ignore'). What you can do is decide that you are not interested in the scores of labels that were not predicted, and then explicitly specify the labels you are interested in (which are the labels that were predicted at least once):
>>> metrics.f1_score(y_test, y_pred, average='weighted', labels=np.unique(y_pred))
0.91076923076923078
The warning will be gone.
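To make it concrete what restricting the labels changes, here is a hand-rolled sketch of a support-weighted F-score in plain Python. It is not scikit-learn's implementation, the data is made up, and the helper names f1_per_label and weighted_f1 are hypothetical. Note that the score goes up simply because the label with a score of 0 is left out of the average.

```python
from collections import Counter


def f1_per_label(y_true, y_pred, labels):
    """Per-label F1; undefined precision or recall is replaced by 0.0."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = sum(1 for t in y_true if t == label)
        precision = tp / predicted if predicted else 0.0  # no predicted samples
        recall = tp / actual if actual else 0.0           # no true samples
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def weighted_f1(y_true, y_pred, labels):
    """Average the per-label F1, weighted by each label's count in y_true."""
    support = Counter(y_true)
    scores = f1_per_label(y_true, y_pred, labels)
    weight_sum = sum(support[label] for label in labels)
    return sum(scores[label] * support[label] for label in labels) / weight_sum


# Made-up data: label 2 occurs in y_true but is never predicted.
y_true = [0, 1, 2, 0, 1, 2, 0, 1]
y_pred = [0, 1, 1, 0, 1, 0, 0, 1]

print(weighted_f1(y_true, y_pred, sorted(set(y_true))))  # all labels, includes the 0.0
print(weighted_f1(y_true, y_pred, sorted(set(y_pred))))  # predicted labels only
```

With this data the first value is 36/56 (about 0.643) and the second is 6/7 (about 0.857), so the restricted score describes only the labels the classifier actually uses.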
Classification Report - Precision and F-score are ill-defined
This is not an error, just a warning that not all your labels are included in your y_pred, i.e. there are some labels in your y_test that your classifier never predicts.
Here is a simple reproducible example:
from sklearn.metrics import precision_score, f1_score, classification_report
y_true = [0, 1, 2, 0, 1, 2] # 3-class problem
y_pred = [0, 0, 1, 0, 0, 1] # we never predict '2'
precision_score(y_true, y_pred, average='macro')
[...] UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)
0.16666666666666666
precision_score(y_true, y_pred, average='micro') # no warning
0.3333333333333333
precision_score(y_true, y_pred, average=None)
[...] UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)
array([0.5, 0. , 0. ])
Exactly the same warnings are produced for f1_score (not shown).
Practically, this only warns you that in the classification_report, the respective values for labels with no predicted samples (here 2) will be set to 0:
print(classification_report(y_true, y_pred))
              precision    recall  f1-score   support

           0       0.50      1.00      0.67         2
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         2

   micro avg       0.33      0.33      0.33         6
   macro avg       0.17      0.33      0.22         6
weighted avg       0.17      0.33      0.22         6
[...] UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)
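If the zeros themselves are acceptable and only the message is unwanted, the metric functions accept a zero_division argument. This sketch assumes scikit-learn 0.22 or later, where that parameter was introduced; on older versions the call fails with an unexpected keyword argument. It reuses the y_true and y_pred from the example above.

```python
import warnings

from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import classification_report, precision_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 0, 1, 0, 0, 1]  # label 2 is never predicted

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # zero_division=0 still scores the undefined cases as 0.0,
    # but states that choice explicitly instead of warning about it.
    macro_precision = precision_score(
        y_true, y_pred, average="macro", zero_division=0
    )
    report = classification_report(y_true, y_pred, zero_division=0)

undefined = [w for w in caught if issubclass(w.category, UndefinedMetricWarning)]

print(macro_precision)  # same value as before: (0.5 + 0 + 0) / 3
print(report)
print(len(undefined))   # no UndefinedMetricWarning was recorded
```

The scores do not change; the argument only records that treating the undefined cases as 0 is intentional.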
When I was not using np.array in the past it worked just fine
Highly doubtful, since in the example above I have used simple Python lists, and not NumPy arrays...
UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. 'recall', 'true', average, warn_for)
Hello, I found a solution to this problem. You need to use:
cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
I'm using knn and this solved the problem
Code:
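from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def knn(self, X_train, X_test, Y_train, Y_test):
    # Fit the k-nearest-neighbours classifier
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, Y_train)
    # 10 random splits, each holding out 30% of the data for scoring
    cv = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
    # Mean weighted F1 over the 10 splits
    puntajes = sum(cross_val_score(knn, X_test, Y_test,
                                   cv=cv, scoring='f1_weighted')) / 10
    print(puntajes)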
Link: https://scikit-learn.org/stable/modules/cross_validation.html
How to fix this classification report warning?
You don't want to get rid of this warning: it tells you that class 2 never appears in the predictions, because there were no samples of it in the training set.
You have an imbalanced classification problem: class 2 has a really low number of samples, and it was present only in the test data.
I suggest two things:
- StratifiedKFold, so that when you split into training and test sets, every class is represented in both.
- Oversampling: you might need to adjust your data by randomly resampling the training dataset to duplicate examples from the minority class.
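The StratifiedKFold suggestion can be sketched as follows. The labels and placeholder features are made up for illustration, with class 2 deliberately rare; the point is only that each split keeps every class on both sides.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up imbalanced labels: class 2 has only two samples.
y = np.array([0] * 6 + [1] * 4 + [2] * 2)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)

# For every fold, record which classes appear in the train and test parts.
folds = [
    (np.unique(y[train_idx]).tolist(), np.unique(y[test_idx]).tolist())
    for train_idx, test_idx in skf.split(X, y)
]

for train_classes, test_classes in folds:
    print(train_classes, test_classes)
```

Stratification needs at least as many samples per class as there are splits, which is why n_splits is 2 here. With fewer samples of a class than splits, oversampling the training data first is the remaining option.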
F-score is ill-defined scikit
f1_score with average='macro' will first calculate the score for each label individually, and then find their unweighted mean.
So it may happen that if you have multiple classes (labels), then one of them is not present in the predicted data. In that case, you will get the warning for that (absent) label and f1 will be 0 for that. But other labels will still have some non-zero value. So the mean will be non-zero.
For example:
from sklearn.metrics import f1_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 0, 0]
f1_score(y_true, y_pred, average='macro')
/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
# Output: 0.38888888888888884
In the above example, the predicted data does not contain the label 2, and the warning is for that.
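The macro average for this example can be checked by hand. The following is a plain-Python sketch of the arithmetic, not scikit-learn's implementation, and the helper name f1_for_label is hypothetical.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one label; an undefined precision or recall counts as 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    predicted = sum(1 for _, p in pairs if p == label)
    actual = sum(1 for t, _ in pairs if t == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 0, 0]

per_label = [f1_for_label(y_true, y_pred, label) for label in sorted(set(y_true))]
macro = sum(per_label) / len(per_label)

print(per_label)  # roughly [0.667, 0.5, 0.0]
print(macro)      # roughly 0.389, matching the output above
```

Label 2 contributes 0.0 to the sum but still counts in the denominator, which is why the mean is pulled down without becoming zero.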
As for whether to rely on that score or not, please see the related questions:
- https://datascience.stackexchange.com/q/15989/41018
- https://stats.stackexchange.com/q/156923/133411