sklearn multiclass roc auc score
In the multilabel case, roc_auc_score expects binary label indicators with shape (n_samples, n_classes), which is a way of getting back to a one-vs-all formulation.
To do that easily, you can use label_binarize (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.label_binarize.html#sklearn.preprocessing.label_binarize).
For your code, it will be:
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# The full set of class labels is needed to binarize
labels = [0, 1, 2, 3]
ytest = [0, 1, 2, 3, 2, 2, 1, 0, 1]
# Binarize ytest to shape (n_samples, n_classes)
ytest = label_binarize(ytest, classes=labels)
ypreds = [1, 2, 1, 3, 2, 2, 0, 1, 1]
# Binarize ypreds to shape (n_samples, n_classes)
ypreds = label_binarize(ypreds, classes=labels)

# With indicator-matrix targets, the AUC is computed per column and then averaged
roc_auc_score(ytest, ypreds, average='macro', multi_class='ovo')
Here, ytest and ypreds become:
ytest
array([[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 0, 1, 0],
[0, 1, 0, 0],
[1, 0, 0, 0],
[0, 1, 0, 0]])
ypreds
array([[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 1, 0, 0],
[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 0, 1, 0],
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 1, 0, 0]])
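The example above scores hard class predictions. If the model exposes class probabilities, one option is to pass those instead. The following is a minimal sketch, not part of the original answer: the iris data and LogisticRegression are assumptions chosen only to make it runnable. It checks that binarizing the true labels by hand gives the same macro AUC as letting roc_auc_score handle the multiclass case with multi_class='ovr'.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Assumed toy setup: iris data and a logistic regression classifier
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Class probabilities with shape (n_samples, n_classes)
proba = clf.predict_proba(X_test)

# Route 1: binarize the true labels and score column by column
y_test_bin = label_binarize(y_test, classes=clf.classes_)
auc_binarized = roc_auc_score(y_test_bin, proba, average='macro')

# Route 2: keep the integer labels and ask for one-vs-rest explicitly
auc_ovr = roc_auc_score(y_test, proba, average='macro', multi_class='ovr')

print(auc_binarized, auc_ovr)
```

Both routes should agree here, because every class appears in the stratified test split and each column is scored as its own binary problem.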
sklearn roc_auc_score with multi_class=="ovr" should have None average available
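As you already know, at the time of writing sklearn's multiclass ROC AUC only handles the macro and weighted averages. It could be extended to return the individual score for each class as well. Newer scikit-learn releases document average=None for multi_class='ovr', so check the documentation for your installed version.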
Theoretically speaking, you could implement OVR yourself and calculate a per-class roc_auc_score, as:
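from sklearn.metrics import roc_auc_score
# One AUC per class, keyed by label
roc = {}
for label in multi_class_series.unique():
    # Fit a binary "this class vs. the rest" classifier
    selected_classifier.fit(train_set_dataframe, train_class == label)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
    # Column 1 is the probability of the positive class (label == True)
    roc[label] = roc_auc_score(test_class == label, predictions_proba[:, 1])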
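The loop above refits one binary classifier per class. A variation is to fit a single multiclass model once and score each probability column against its own "class vs. rest" target. This is only a sketch under assumptions: the iris data and LogisticRegression are stand-ins for the classifier and data frames in the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumed toy setup standing in for the question's own data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)

# Score each probability column against a binary "is this class" target
per_class_auc = {
    int(label): roc_auc_score(y_test == label, proba[:, idx])
    for idx, label in enumerate(clf.classes_)
}

# The unweighted mean of the per-class scores is the macro one-vs-rest AUC
macro_ovr = roc_auc_score(y_test, proba, multi_class='ovr', average='macro')

print(per_class_auc)
print(np.mean(list(per_class_auc.values())), macro_ovr)
```

Averaging the per-class values reproduces the macro one-vs-rest score, which is a quick sanity check on the per-class numbers.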
How to get multi-class roc_auc in cross validate in sklearn?
By default, multi_class='raise', so you need to change it explicitly.
From the docs:
multi_class {'raise', 'ovr', 'ovo'}, default='raise'
Multiclass only. Determines the type of configuration to use. The default value raises an error, so either 'ovr' or 'ovo' must be passed explicitly.
'ovr': Computes the AUC of each class against the rest [3] [4]. This treats the multiclass case in the same way as the multilabel case. Sensitive to class imbalance even when average == 'macro', because class imbalance affects the composition of each of the 'rest' groupings.
'ovo': Computes the average AUC of all possible pairwise combinations of classes [5]. Insensitive to class imbalance when average == 'macro'.
Solution: use make_scorer (docs):
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.model_selection import cross_validate

iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features
y = iris.target

clf = RandomForestClassifier(random_state=0, class_weight="balanced")

# needs_proba=True makes the scorer call predict_proba; newer scikit-learn
# releases replace this flag with response_method="predict_proba"
myscore = make_scorer(roc_auc_score, multi_class='ovo', needs_proba=True)

cross_validate(clf, X, y, cv=10, scoring=myscore)
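As a possible shortcut, scikit-learn also ships predefined scorer names for multiclass AUC, such as 'roc_auc_ovo' and 'roc_auc_ovr', which can be passed as the scoring string without building a scorer by hand. The sketch below assumes the iris data and a small random forest purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Assumed toy setup: iris data and a small random forest
X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# 'roc_auc_ovo' is a built-in scorer name; it uses predict_proba internally
results = cross_validate(clf, X, y, cv=5, scoring='roc_auc_ovo')
test_scores = results['test_score']

print(test_scores)
```

The string form avoids version-specific make_scorer arguments, at the cost of less control over the averaging options.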
LinearSVC and roc_auc_score() for a multi-class problem
LinearSVC does not provide predict_proba, which roc_auc_score needs in the multiclass case. There is a dedicated class, CalibratedClassifierCV, for cases like this:
from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
# Get the data
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Create the model
clf = CalibratedClassifierCV(LinearSVC(max_iter=10000))
# Split the data in train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
# Train the model
clf.fit(X_train, y_train)
# Predict the test data
predicted = clf.predict(X_test)
predicted_proba = clf.predict_proba(X_test)
roc_auc = roc_auc_score(y_test, predicted_proba, multi_class='ovr')
As you're choosing between SVC and LinearSVC, you may wish to check out this question: When should one use LinearSVC or SVC?
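If calibrated probabilities are not needed, per-class one-vs-rest AUCs can also be computed from the raw decision_function output, since binary AUC only depends on how the scores rank the samples. This is a sketch of that alternative, not the approach in the answer above: it does not call the multiclass form of roc_auc_score, which expects probabilities that sum to 1.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Same iris split as in the answer, with stratification added as an assumption
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)
svm = LinearSVC(max_iter=10000).fit(X_train, y_train)

# Uncalibrated margins, one column per class: shape (n_samples, n_classes)
scores = svm.decision_function(X_test)

# Treat each column as a binary "this class vs. the rest" ranking score
per_class_auc = [
    roc_auc_score(y_test == label, scores[:, idx])
    for idx, label in enumerate(svm.classes_)
]

print(per_class_auc)
```

These values are rank-based and comparable per class, but the margins themselves should not be read as probabilities.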