Calculate Sklearn.Roc_Auc_Score for Multi-Class

sklearn multiclass roc auc score

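In the multilabel case, roc_auc_score expects binary label indicators with shape (n_samples, n_classes). Converting multiclass labels to that format is a way to get back to a one-vs-all evaluation, where each column is scored as its own binary problem.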

To do that easily, you can use label_binarize (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.label_binarize.html#sklearn.preprocessing.label_binarize).

For your code, it will be:

from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# You need the labels to binarize
labels = [0, 1, 2, 3]

ytest = [0,1,2,3,2,2,1,0,1]

# Binarize ytest with shape (n_samples, n_classes)
ytest = label_binarize(ytest, classes=labels)

ypreds = [1,2,1,3,2,2,0,1,1]

# Binarize ypreds with shape (n_samples, n_classes)
ypreds = label_binarize(ypreds, classes=labels)


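# Binary indicator input is scored as a multilabel problem (one column per
# class, each class vs. the rest), so multi_class is not used here
roc_auc_score(ytest, ypreds, average='macro')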

Here, ytest and ypreds become:

ytest
array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 1, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 1, 0, 0]])

ypreds
array([[0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 1, 0],
       [1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 1, 0, 0]])
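To see the one-vs-all behaviour on these arrays, here is a minimal sketch that asks for one AUC per class with average=None and checks that the macro score is just their mean. The variable names (y_true_bin, y_pred_bin, per_class_auc) are mine, not from the question.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

labels = [0, 1, 2, 3]

# Same toy data as above, binarized to shape (n_samples, n_classes)
y_true_bin = label_binarize([0, 1, 2, 3, 2, 2, 1, 0, 1], classes=labels)
y_pred_bin = label_binarize([1, 2, 1, 3, 2, 2, 0, 1, 1], classes=labels)

# average=None returns one AUC per column, i.e. per class vs. the rest
per_class_auc = roc_auc_score(y_true_bin, y_pred_bin, average=None)

# The macro average is the unweighted mean of the per-class values
macro_auc = roc_auc_score(y_true_bin, y_pred_bin, average="macro")

print(dict(zip(labels, per_class_auc)))
print(macro_auc, np.isclose(macro_auc, per_class_auc.mean()))
```

Note that hard 0/1 predictions give only a single operating point per class; if your model exposes predict_proba, passing those scores instead usually gives a more informative AUC.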

sklearn roc_auc_score with multi_class=="ovr" should have None average available

As you already know, at the time of writing sklearn's multiclass ROC AUC only handles the macro and weighted averages. Per-class scores can still be obtained by implementing one-vs-rest (OVR) yourself, since each binary "class vs. rest" problem yields its own score.

In principle, you could implement OVR and calculate a per-class roc_auc_score like this:

from sklearn.metrics import roc_auc_score

roc = {}
for label in multi_class_series.unique():
    # Train a binary "label vs. rest" classifier
    selected_classifier.fit(train_set_dataframe, train_class == label)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
    # Score against the same binary target; column 1 is P(class == label)
    roc[label] = roc_auc_score(test_class == label, predictions_proba[:, 1])
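For a version you can run end to end, here is a rough sketch of the same per-class OVR idea on the iris dataset. The choice of LogisticRegression, the 50/50 split, and the name per_class_roc are assumptions for illustration only; substitute your own classifier and data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

per_class_roc = {}
for label in sorted(set(y)):
    # Fit a fresh binary model: current label vs. everything else
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train == label)

    # classes_ is [False, True], so column 1 is P(sample is `label`)
    proba = clf.predict_proba(X_test)[:, 1]

    # Evaluate against the matching binary target
    per_class_roc[int(label)] = roc_auc_score(y_test == label, proba)

print(per_class_roc)
```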

How to get multi-class roc_auc in cross validate in sklearn?

By default multi_class='raise', so you need to set it explicitly to 'ovr' or 'ovo'.

From the docs:

multi_class : {'raise', 'ovr', 'ovo'}, default='raise'

Multiclass only. Determines the type of configuration to use. The
default value raises an error, so either 'ovr' or 'ovo' must be passed
explicitly.

'ovr':

Computes the AUC of each class against the rest [3] [4]. This treats
the multiclass case in the same way as the multilabel case. Sensitive
to class imbalance even when average == 'macro', because class
imbalance affects the composition of each of the ‘rest’ groupings.

'ovo':

Computes the average AUC of all possible pairwise combinations of
classes [5]. Insensitive to class imbalance when average == 'macro'.
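As a quick illustration of the quoted behaviour, the sketch below calls roc_auc_score directly with both settings and also shows that the default raises. The dataset, the LogisticRegression model, and the variable names are my own choices, not part of the question.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# Multiclass AUC needs probability scores of shape (n_samples, n_classes)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)

auc_ovr = roc_auc_score(y_test, proba, multi_class="ovr", average="macro")
auc_ovo = roc_auc_score(y_test, proba, multi_class="ovo", average="macro")
print("ovr:", auc_ovr, "ovo:", auc_ovo)

# Leaving multi_class at its default ('raise') fails for multiclass targets
default_error = None
try:
    roc_auc_score(y_test, proba)
except ValueError as err:
    default_error = str(err)
print("default raised:", default_error)
```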


Solution:

Use make_scorer:

from sklearn import datasets
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features.
y = iris.target

from sklearn.ensemble import RandomForestClassifier
clf=RandomForestClassifier(random_state = 0, class_weight="balanced")

from sklearn.metrics import make_scorer
from sklearn.metrics import roc_auc_score

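# needs_proba=True makes the scorer pass predict_proba output to roc_auc_score.
# Recent scikit-learn releases deprecate it in favour of
# response_method='predict_proba'.
myscore = make_scorer(roc_auc_score, multi_class='ovo', needs_proba=True)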

from sklearn.model_selection import cross_validate
cross_validate(clf, X, y, cv=10, scoring = myscore)
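If you would rather not build a scorer by hand, scikit-learn also ships predefined scoring strings for multiclass AUC ('roc_auc_ovr', 'roc_auc_ovo' and their '_weighted' variants). A minimal sketch using the same data and classifier as above; the names results and test_scores are just for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X, y = load_iris(return_X_y=True)
X = X[:, :2]  # first two features only, as in the example above

clf = RandomForestClassifier(random_state=0, class_weight="balanced")

# 'roc_auc_ovo' is the built-in one-vs-one, macro-averaged AUC scorer
results = cross_validate(clf, X, y, cv=10, scoring="roc_auc_ovo")
test_scores = results["test_score"]

print(test_scores.mean(), test_scores.std())
```

This avoids the make_scorer keyword arguments entirely, so it should behave the same across versions that support the scoring string.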

LinearSVC and roc_auc_score() for a multi-class problem

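LinearSVC has no predict_proba method, but the multiclass roc_auc_score needs probability estimates. CalibratedClassifierCV is designed for cases like this: it wraps the classifier and adds calibrated probabilities.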

from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


# Get the data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Create the model
clf = CalibratedClassifierCV(LinearSVC(max_iter=10000))

# Split the data in train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Train the model
clf.fit(X_train, y_train)

# Predict the test data
predicted = clf.predict(X_test)
predicted_proba = clf.predict_proba(X_test)
roc_auc = roc_auc_score(y_test, predicted_proba, multi_class='ovr')
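If you only need per-class scores and not calibrated probabilities, one alternative (my suggestion, not part of the answer above) is to score each class one-vs-rest using the raw decision_function output, since binary ROC AUC only needs a ranking score. A rough sketch under that assumption:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

svc = LinearSVC(max_iter=10000).fit(X_train, y_train)

# For multiclass LinearSVC this has shape (n_samples, n_classes)
scores = svc.decision_function(X_test)

# One binary AUC per class: "is this class" vs. that class's margin
per_class_auc = {
    int(c): roc_auc_score(y_test == c, scores[:, i])
    for i, c in enumerate(svc.classes_)
}
print(per_class_auc)
```

These margins are not probabilities, so they cannot be passed to roc_auc_score with multi_class='ovr' directly; that is where the calibrated wrapper is the better fit.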

If you are choosing between SVC and LinearSVC, you may also wish to check out the question "When should one use LinearSVC or SVC?"


