How to Get Precision, Recall and F-Measure from Confusion Matrix in Python

Let's consider the case of MNIST data classification (10 classes), where for a test set of 10,000 samples we get the following confusion matrix cm (a NumPy array):

array([[ 963,    0,    0,    1,    0,    2,   11,    1,    2,    0],
       [   0, 1119,    3,    2,    1,    0,    4,    1,    4,    1],
       [  12,    3,  972,    9,    6,    0,    6,    9,   13,    2],
       [   0,    0,    8,  975,    0,    2,    2,   10,   10,    3],
       [   0,    2,    3,    0,  953,    0,   11,    2,    3,    8],
       [   8,    1,    0,   21,    2,  818,   17,    2,   15,    8],
       [   9,    3,    1,    1,    4,    2,  938,    0,    0,    0],
       [   2,    7,   19,    2,    2,    0,    0,  975,    2,   19],
       [   8,    5,    4,    8,    6,    4,   14,   11,  906,    8],
       [  11,    7,    1,   12,   16,    1,    1,    6,    5,  949]])

In order to get the precision & recall per class, we need to compute the TP, FP, and FN per class. We don't need TN, but we will compute it too, as it will help us with a sanity check.

The True Positives are simply the diagonal elements:

import numpy as np
TP = np.diag(cm)
TP
# array([ 963, 1119, 972, 975, 953, 818, 938, 975, 906, 949])

The False Positives are the sum of the respective column, minus the diagonal element (i.e. the TP element):

FP = np.sum(cm, axis=0) - TP
FP
# array([50, 28, 39, 56, 37, 11, 66, 42, 54, 49])

Similarly, the False Negatives are the sum of the respective row, minus the diagonal (i.e. TP) element:

FN = np.sum(cm, axis=1) - TP
FN
# array([17, 16, 60, 35, 29, 74, 20, 53, 68, 60])

Now, the True Negatives are a little trickier; let's first think what exactly a True Negative means, with respect to, say class 0: it means all the samples that have been correctly identified as not being 0. So, essentially what we should do is remove the corresponding row & column from the confusion matrix, and then sum up all the remaining elements:

num_classes = 10
TN = []
for i in range(num_classes):
    temp = np.delete(cm, i, 0)    # delete ith row
    temp = np.delete(temp, i, 1)  # delete ith column
    TN.append(sum(sum(temp)))
TN
# [8970, 8837, 8929, 8934, 8981, 9097, 8976, 8930, 8972, 8942]
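Equivalently, since for each class every sample falls into exactly one of TP, FP, FN, or TN, the loop can be avoided; a minimal loop-free sketch that should give the same result:

# Loop-free alternative: total number of samples minus everything attributed to class i
TN = cm.sum() - (TP + FP + FN)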

Let's do a sanity check: for each class, the sum of TP, FP, FN, and TN must equal the size of our test set (here 10,000). Let's confirm that this is indeed the case:

l = 10000
for i in range(num_classes):
    print(TP[i] + FP[i] + FN[i] + TN[i] == l)

The result is

True
True
True
True
True
True
True
True
True
True

Having calculated these quantities, it is now straightforward to get the precision & recall per class:

precision = TP/(TP+FP)
recall = TP/(TP+FN)

which for this example are

precision
# array([ 0.95064166, 0.97558849, 0.96142433, 0.9456838 , 0.96262626,
# 0.986731 , 0.93426295, 0.95870206, 0.94375 , 0.9509018])

recall
# array([ 0.98265306, 0.98590308, 0.94186047, 0.96534653, 0.97046843,
# 0.91704036, 0.97912317, 0.94844358, 0.9301848 , 0.94053518])
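Since the title also asks for the F-measure, it follows directly from these two arrays; a minimal sketch:

# Per-class F1 score: the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)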

Similarly, we can compute related quantities, such as specificity (note that sensitivity is just another name for recall):

specificity = TN/(TN+FP)

Results for our example:

specificity
# array([0.99445676, 0.99684151, 0.9956512 , 0.99377086, 0.99589709,
# 0.99879227, 0.99270073, 0.99531877, 0.99401728, 0.99455011])

You should now be able to compute these quantities for a confusion matrix of virtually any size.

Need help finding the precision and recall for a confusion matrix

As already recommended by someone else, you can directly calculate these over y_actual and y_predicted using sklearn.metrics with precision_score and recall_score. Read more here about precision and recall scores.

But, IIUC, you are looking to do the same, directly, with a confusion matrix. Here is how you calculate precision and recall using the confusion matrix directly.



  1. First, I'll demonstrate with a dummy example, showing results from the sklearn API and then calculating them directly from the confusion matrix.

NOTE: There are 2 types of precision and recall that are generally calculated -

  • Micro precision: sum the TP over all classes and divide by the sum of TP+FP over all classes
  • Macro precision: calculate TP/(TP+FP) for each class separately, and then take the average (ignoring NaNs)
  • You can find more details on types of precision (and recall) here.

I show both methods below for your understanding:

import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

####################################################
#####Using SKLEARN API on TRUE & PRED Labels########
####################################################

y_true = [0, 1, 2, 2, 1, 1]
y_pred = [0, 2, 2, 2, 1, 2]
confusion_matrix(y_true, y_pred)

precision_micro = precision_score(y_true, y_pred, average="micro")
precision_macro = precision_score(y_true, y_pred, average="macro")
recall_micro = recall_score(y_true, y_pred, average='micro')
recall_macro = recall_score(y_true, y_pred, average="macro")

print("Sklearn API")
print("precision_micro:", precision_micro)
print("precision_macro:", precision_macro)
print("recall_micro:", recall_micro)
print("recall_macro:", recall_macro)

####################################################
####Calculating directly from confusion matrix######
####################################################

cf = confusion_matrix(y_true, y_pred)
TP = cf.diagonal()

precision_micro = TP.sum()/cf.sum()
recall_micro = TP.sum()/cf.sum()

#NOTE: The sum of row-wise sums of a matrix = sum of column-wise sums = sum of all elements of the matrix.
#Therefore, micro-precision and micro-recall are mathematically the same for a multi-class problem.

precision_macro = np.nanmean(TP/cf.sum(0))
recall_macro = np.nanmean(TP/cf.sum(1))

print("")
print("Calculated:")
print("precision_micro:", precision_micro)
print("precision_macro:", precision_macro)
print("recall_micro:", recall_micro)
print("recall_macro:", recall_macro)
Sklearn API
precision_micro: 0.6666666666666666
precision_macro: 0.8333333333333334
recall_micro: 0.6666666666666666
recall_macro: 0.7777777777777777

Calculated:
precision_micro: 0.6666666666666666
precision_macro: 0.8333333333333334
recall_micro: 0.6666666666666666
recall_macro: 0.7777777777777777



  1. Now that I have proven that the definitions behind the APIs work as described, let's calculate precision and recall for your case.

cf = [[748,   0,   4,   5,   1,  16,   9,   4,   8,   0],
      [  0, 869,   6,   5,   2,   2,   2,   5,  12,   3],
      [  6,  19, 642,  33,  13,   7,  16,  15,  31,   6],
      [  5,   3,  30, 679,   2,  44,   1,  12,  23,  12],
      [  4,   7,   9,   2, 704,   5,  10,   8,   7,  43],
      [  5,   6,  10,  39,  11, 566,  18,   4,  33,  10],
      [  6,   5,  17,   2,   5,  12, 737,   2,   9,   3],
      [  5,   7,   8,  18,  14,   2,   0, 752,   5,  42],
      [  7,  15,  34,  28,  12,  29,   6,   4, 600,  18],
      [  4,   6,   6,  16,  21,   4,   0,  50,   8, 680]]

cf = np.array(cf)
TP = cf.diagonal()

precision_micro = TP.sum()/cf.sum()
recall_micro = TP.sum()/cf.sum()

precision_macro = np.nanmean(TP/cf.sum(0))
recall_macro = np.nanmean(TP/cf.sum(1))

print("Calculated:")
print("precision_micro:", precision_micro)
print("precision_macro:", precision_macro)
print("recall_micro:", recall_micro)
print("recall_macro:", recall_macro)
Calculated:
precision_micro: 0.872125
precision_macro: 0.8702549015235986
recall_micro: 0.872125
recall_macro: 0.8696681555022805
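If you also need F-scores from the same confusion matrix, they follow from the per-class precision and recall; a minimal sketch using the cf and TP arrays above (micro F1 reduces to the micro precision/recall already computed):

precision_per_class = TP/cf.sum(0)
recall_per_class = TP/cf.sum(1)
f1_per_class = 2*precision_per_class*recall_per_class/(precision_per_class + recall_per_class)

f1_macro = np.nanmean(f1_per_class)  # unweighted mean over classes
f1_micro = TP.sum()/cf.sum()         # same value as micro precision/recall

print("f1_macro:", f1_macro)
print("f1_micro:", f1_micro)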

How to generate Precision, Recall and F-score in Named Entity Recognition using spaCy v3? Seeking ents_p, ents_r, ents_f for a small custom NER model

I will give a brief example:

import spacy
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training.example import Example

examples = [
    ('Who is Talha Tayyab?',
     {'entities': [(7, 19, 'PERSON')]}),
    ('I like London and Berlin.',
     {'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]}),
    ('Agra is famous for Tajmahal, The CEO of Facebook will visit India shortly to meet Murari Mahaseth and to visit Tajmahal.',
     {'entities': [(0, 4, 'LOC'), (40, 48, 'ORG'), (60, 65, 'GPE'), (82, 97, 'PERSON'), (111, 119, 'GPE')]})
]

def my_evaluate(ner_model, examples):
    scorer = Scorer()
    example = []
    for input_, annotations in examples:
        pred = ner_model(input_)                     # predicted Doc for the raw text
        print(pred, annotations)
        temp = Example.from_dict(pred, annotations)  # pair prediction with gold annotations
        example.append(temp)
    scores = scorer.score(example)
    return scores

ner_model = spacy.load('en_core_web_sm') # for spaCy's pretrained use 'en_core_web_sm'
results = my_evaluate(ner_model, examples)
print(results)
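Since the question asks specifically for ents_p, ents_r and ents_f, those keys should be present in the scores dict returned above; a small sketch, assuming the results variable from the call above:

print("ents_p:", results["ents_p"])  # entity-level precision
print("ents_r:", results["ents_r"])  # entity-level recall
print("ents_f:", results["ents_f"])  # entity-level F-score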

As I said, this is just an example; you can make changes according to your needs.

calculate precision and recall in a confusion matrix

First, your matrix is arranged upside down.
You want to arrange your labels so that true positives are set on the diagonal [(0,0), (1,1), (2,2)]; this is the arrangement you're going to find in confusion matrices generated by sklearn and other packages.

Once we have things sorted in the right direction, we can take a page from this answer and say that:

  1. True positives are the diagonal elements.
  2. False positives are the column-wise sums, without the diagonal element.
  3. False negatives are the row-wise sums, without the diagonal element.

Then we take some formulas for precision and recall from the sklearn docs, and put it all into code:

import numpy as np
cm = np.array([[2,1,0], [3,4,5], [6,7,8]])
true_pos = np.diag(cm)
false_pos = np.sum(cm, axis=0) - true_pos
false_neg = np.sum(cm, axis=1) - true_pos

precision = np.mean(true_pos / (true_pos + false_pos))  # macro-averaged precision
recall = np.mean(true_pos / (true_pos + false_neg))     # macro-averaged recall

Since we remove the true positives to define false_positives/negatives only to add them back... we can simplify further by skipping a couple of steps:

true_pos = np.diag(cm)
precision = np.mean(true_pos / np.sum(cm, axis=0))  # diagonal over column sums
recall = np.mean(true_pos / np.sum(cm, axis=1))     # diagonal over row sums
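If per-class numbers (and the F-score from the thread title) are wanted rather than a single averaged value, skip the averaging; a short sketch along the same lines:

per_class_precision = true_pos / np.sum(cm, axis=0)
per_class_recall = true_pos / np.sum(cm, axis=1)
per_class_f1 = 2 * per_class_precision * per_class_recall / (per_class_precision + per_class_recall)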

Easy way to extract common measures such as accuracy, precision, recall from a 3x3 confusion matrix with numpy or pandas?

For what it's worth, code like

accuracy_0 = cm[0][0]/cm[3][3]
accuracy_1 = cm[1][1]/cm[3][3]
accuracy_2 = cm[2][2]/cm[3][3]
accuracy = (accuracy_0, accuracy_1, accuracy_2)

can be replaced with the more concise

accuracy = cm.diagonal()[:-1]/cm[-1,-1]

so you might rewrite your code as

import pandas as pd

cm = confusion_matrix.to_numpy()
diag = cm.diagonal()[:-1]

accuracy = diag / cm[-1, -1]
precision = diag / cm[-1, :-1]  # diagonal over column totals
recall = diag / cm[:-1, -1]     # diagonal over row totals
f_score = 2 * precision * recall / (precision + recall)

out = pd.DataFrame({'Accuracy': accuracy,
                    'Precision': precision,
                    'Recall': recall,
                    'F-score': f_score}).round(2)
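For context, a confusion-matrix DataFrame that includes the row/column totals relied on above can be built with pandas; a hedged sketch (the y_true and y_pred values here are made up purely for illustration):

import pandas as pd

y_true = pd.Series([0, 0, 1, 1, 2, 2, 2], name='Actual')      # hypothetical labels
y_pred = pd.Series([0, 1, 1, 1, 2, 0, 2], name='Predicted')   # hypothetical predictions
confusion_matrix = pd.crosstab(y_true, y_pred, margins=True)  # margins=True appends the totals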

