How Does Keras Calculate the Accuracy

How is the Keras accuracy shown in the progress bar calculated? From which inputs is it calculated, and how can it be replicated?

I think it has to do with the usage of Dropout. Dropout is only enabled during training, not during evaluation or prediction; hence the discrepancy between the accuracies seen during training and during evaluation/prediction.

Moreover, the training accuracy displayed in the progress bar is averaged over the training epoch, i.e. it is the average of the per-batch accuracies computed after each batch. Keep in mind that the model parameters are updated after each batch, so the accuracy shown in the bar at the end of the epoch does not exactly match the accuracy of an evaluation run after the epoch is finished (the training accuracy is calculated with different model parameters for each batch, whereas the evaluation accuracy is calculated with the same parameters for all batches).

This is your example, with more data, trained for more than one epoch, and without Dropout:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import callbacks

np.random.seed(1)  # fix random seed for reproducibility

# Generate dummy data
x_train = np.random.random((200, 20))
y_train = np.random.randint(2, size=(200, 1))

model = Sequential()
model.add(Dense(64, input_dim=20, activation='relu'))
# model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
# model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

class MyEval(callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        my_accuracy_1 = self.model.evaluate(x_train, y_train, verbose=0)[1]
        y_pred = self.model.predict(x_train)
        my_accuracy_2 = np.mean(np.equal(y_train, np.round(y_pred)))
        print("my accuracy 1 after epoch {}: {}".format(epoch + 1, my_accuracy_1))
        print("my accuracy 2 after epoch {}: {}".format(epoch + 1, my_accuracy_2))

my_eval = MyEval()

model.fit(x_train, y_train,
          epochs=5,
          batch_size=13,
          callbacks=[my_eval],
          shuffle=False)

The output reads:

Train on 200 samples
Epoch 1/5
my accuracy 1 after epoch 1: 0.5450000166893005
my accuracy 2 after epoch 1: 0.545
200/200 [==============================] - 0s 2ms/sample - loss: 0.6978 - accuracy: 0.5350
Epoch 2/5
my accuracy 1 after epoch 2: 0.5600000023841858
my accuracy 2 after epoch 2: 0.56
200/200 [==============================] - 0s 383us/sample - loss: 0.6892 - accuracy: 0.5550
Epoch 3/5
my accuracy 1 after epoch 3: 0.5799999833106995
my accuracy 2 after epoch 3: 0.58
200/200 [==============================] - 0s 496us/sample - loss: 0.6844 - accuracy: 0.5800
Epoch 4/5
my accuracy 1 after epoch 4: 0.6000000238418579
my accuracy 2 after epoch 4: 0.6
200/200 [==============================] - 0s 364us/sample - loss: 0.6801 - accuracy: 0.6150
Epoch 5/5
my accuracy 1 after epoch 5: 0.6050000190734863
my accuracy 2 after epoch 5: 0.605
200/200 [==============================] - 0s 393us/sample - loss: 0.6756 - accuracy: 0.6200

The accuracy evaluated after each epoch now closely matches the averaged training accuracy shown in the progress bar at the end of that epoch.

How does Keras calculate the accuracy?

You are a little confused here; you speak about accuracy, while showing the formula for the loss.

The equation you show is indeed the cross-entropy loss formula for binary classification (or simply logistic loss).
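For reference, the binary cross-entropy (logistic) loss referred to here is

loss = -(1/n) * sum over i of ( y[i] * log(p[i]) + (1 - y[i]) * log(1 - p[i]) )

where the sum runs over all n samples.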

y[i] are the labels, which are indeed either 0 or 1.

p[i] are the predictions, usually interpreted as probabilities, which are real numbers in [0,1] (without any rounding).

Now for each i, only one term in the sum will survive - the first term vanishes when y[i] = 0, and similarly the second term vanishes when y[i] = 1.

Let's see some examples:

Suppose that y[0] = 1, while we have predicted p[0] = 0.99 (i.e. a rather good prediction). The second term of the sum vanishes (since 1 - y[0] = 0), while the first one becomes log(0.99) = -0.01; so, the contribution of this sample prediction (i=0) to the overall loss is 0.01 (due to the - sign in front of the sum).

Suppose now that the true label of the next sample is again 1, i.e. y[1] = 1, but here we have made a rather poor prediction of p[1] = 0.1; again, the second term vanishes, and the contribution of this prediction to the overall loss is now -log(0.1) = 2.3, which is indeed greater than our first, good prediction, as we should expect intuitively.

As a final example, let's suppose that y[2] = 0, and we have made a perfectly good prediction here of p[2] = 0; hence, the first term vanishes, and the second term becomes

(1 - y[2]) * log(1 - p[2]) = 1 * log(1) = log(1) = 0

i.e. we have no loss contributed, again as we intuitively expected, since we have made a perfectly good prediction here for i=2.

The logistic loss formula simply computes all these errors of the individual predictions, sums them, and divides by their number n.
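A small NumPy sketch of that computation, reproducing the three worked examples above (illustrative only, not the actual Keras implementation):

import numpy as np

y = np.array([1, 1, 0])          # true labels y[i]
p = np.array([0.99, 0.1, 0.0])   # predicted probabilities p[i]
p = np.clip(p, 1e-7, 1 - 1e-7)   # avoid log(0) for numerical safety

# per-sample contributions: -(y*log(p) + (1 - y)*log(1 - p))
contrib = -(y * np.log(p) + (1 - y) * np.log(1 - p))
print(contrib)         # ~[0.01, 2.30, 0.00]
print(contrib.mean())  # the loss: the summed errors divided by n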

Nevertheless, this is the loss (i.e. scores[0] in your snippet), and not the accuracy.

Loss and accuracy are different things; roughly speaking, the accuracy is what we are actually interested in from a business perspective, while the loss is the objective function that the learning algorithms (optimizers) are trying to minimize from a mathematical perspective. Even more roughly speaking, you can think of the loss as the "translation" of the business objective (accuracy) to the mathematical domain, a translation which is necessary in classification problems (in regression ones, usually the loss and the business objective are the same, or at least can be the same in principle, e.g. the RMSE)...

Will Keras automatically round our predictions to 0 or 1?

Actually yes: to compute the accuracy, we implicitly set a threshold in the predicted probabilities (usually 0.5 in binary classification, but this may differ in the case of highly imbalanced data); so, in model.evaluate, Keras actually converts our predictions to 1 if p[i] > 0.5 and to 0 otherwise. Then, the accuracy is computed by simply counting the cases where y_true==y_pred (correct predictions) and dividing by the total number of samples, to give a number in [0,1].
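As a plain NumPy sketch of that accuracy computation (again illustrative, reusing the toy values from above):

import numpy as np

y_true = np.array([1, 1, 0])
p = np.array([0.99, 0.1, 0.0])

y_pred = (p > 0.5).astype(int)    # the implicit thresholding at 0.5
print(np.mean(y_true == y_pred))  # correct predictions / total samples = 0.666...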

So, to summarize:

  • There is no rounding for the computation of loss
  • There is an implicit thresholding operation for the computation of accuracy

How does Keras compute validation accuracy and training accuracy for multi-class classification problems?

You can find the metrics file and its implementations in the Keras GitHub repo. In this case the following metric applies:

def categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)),
                  K.floatx())

This calculates the accuracy of a single (y_true, y_pred) pair by checking whether the predicted class is the same as the true class. It does this by comparing the index of the highest-scoring class in the y_pred vector with the index of the actual class in the y_true vector. It returns 0 or 1.

Keras then uses this function to calculate the overall accuracy of the data set via the conventional accuracy formula, which is defined as

(number of correct predictions) / (total number of predictions)
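A rough NumPy equivalent on a toy batch (the values are made up for illustration):

import numpy as np

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # one-hot labels
y_pred = np.array([[0.7, 0.2, 0.1],                     # predicted probabilities
                   [0.1, 0.3, 0.6],
                   [0.2, 0.2, 0.6]])

per_sample = y_true.argmax(axis=-1) == y_pred.argmax(axis=-1)   # True/False per sample
print(per_sample.mean())   # correct predictions / total predictions = 0.666...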

How does Keras define accuracy and loss?

Have a look at metrics.py, there you can find the definitions of all available metrics, including the different types of accuracy. Accuracy is not printed unless you add it to the list of desired metrics when you compile your model.

Regularizers are by definition added to the loss. For example, see the add_loss method of the Layer class.

Update

The type of accuracy is determined based on the objective function, see training.py. The default choice is categorical_accuracy. Other types like binary_accuracy and sparse_categorical_accuracy are selected when the objective function is either binary or sparse.
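To make that concrete, a sketch of how metrics=['accuracy'] is resolved for different objectives (behaviour as described above; the exact resolution logic lives in training.py and may differ between versions):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(3, input_dim=4, activation='softmax')])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])   # 'accuracy' resolves to categorical_accuracy

model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])   # 'accuracy' resolves to sparse_categorical_accuracy

model = Sequential([Dense(1, input_dim=4, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])   # 'accuracy' resolves to binary_accuracy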

How does TensorFlow calculate the accuracy of a model?

The functions used to calculate the accuracy can be found here. There are different definitions depending on your problem, such as binary_accuracy or categorical_accuracy. The proper one is chosen automatically, based on the output shape and your loss (see the handle_metrics function here). Based on those:

1. It depends on your model. In your first example it will use

def binary_accuracy(y_true, y_pred):
    '''Calculates the mean accuracy rate across all predictions for binary
    classification problems.
    '''
    return K.mean(K.equal(y_true, K.round(y_pred)))

As you can see, it simply rounds the model's predictions. In your second example it will use

def sparse_categorical_accuracy(y_true, y_pred):
    '''Same as categorical_accuracy, but useful when the predictions are for
    sparse targets.
    '''
    return K.mean(K.equal(K.max(y_true, axis=-1),
                          K.cast(K.argmax(y_pred, axis=-1), K.floatx())))

Here no rounding occurs; instead, it checks whether the class with the highest prediction is the same as the class of the true label.
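A NumPy illustration of that check for sparse (integer) targets, with made-up values:

import numpy as np

y_true = np.array([2, 0, 1])          # integer class labels
y_pred = np.array([[0.1, 0.2, 0.7],   # per-class predicted probabilities
                   [0.8, 0.1, 0.1],
                   [0.3, 0.3, 0.4]])

print(np.mean(y_true == y_pred.argmax(axis=-1)))   # 0.666...: the last sample is wrong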

2. Again binary_accuracy will be used. However, the predictions will come from a sigmoid activation.

3. The sigmoid activation will change your outputs: it ensures that the predictions lie between 0 and 1. The accuracy changes because of that, e.g. a raw output of 0 becomes 0.5 and now sits exactly at the rounding threshold. It will also affect training. It is common to use a sigmoid activation with binary crossentropy, since that loss expects a probability.
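For illustration, the squashing looks like this (values made up):

import numpy as np

raw = np.array([-2.0, 0.0, 3.0])   # raw outputs without an activation
probs = 1 / (1 + np.exp(-raw))     # sigmoid maps them into (0, 1)
print(probs)                       # [0.119..., 0.5, 0.952...]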

Difference in the calculation of accuracy between Keras and scikit-learn

There are two ways a prediction can be counted as correct in multi-label classification:

  1. A prediction counts as correct only if all of its sub-labels are correct. Example: the demo dataset y_true below contains 5 samples; in y_pred, 3 of them are fully correct. In this case, the accuracy should be 60%.

  2. Each sub-label is counted individually. Example: the demo dataset y_true contains a total of 15 individual labels, of which y_pred predicts 10 correctly. In this case, the accuracy should be 66.7%.

scikit-learn handles multi-label classification as stated in point 1, whereas the Keras accuracy metric follows the method stated in point 2. A code example is given below.

Code:

import tensorflow as tf
from sklearn.metrics import accuracy_score
import numpy as np

# A demo dataset
y_true = np.array([[0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 0, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 1]])

kacc = tf.keras.metrics.Accuracy()
_ = kacc.update_state(y_true, y_pred)
print(f'Keras Accuracy acc: {kacc.result().numpy()*100:.3}')

kbacc = tf.keras.metrics.BinaryAccuracy()
_ = kbacc.update_state(y_true, y_pred)
print(f'Keras BinaryAccuracy acc: {kbacc.result().numpy()*100:.3}')

print(f'SkLearn acc: {accuracy_score(y_true, y_pred)*100:.3}')

Outputs:

Keras Accuracy acc: 66.7
Keras BinaryAccuracy acc: 66.7
SkLearn acc: 60.0

Therefore, you have to choose one of the two options. If you choose to go with method 1, you have to implement an accuracy metric manually. However, multi-label training is generally done using sigmoid activations with a binary_crossentropy loss, and binary_crossentropy minimizes the loss per label, i.e. in line with method 2; therefore it usually makes sense to report accuracy the same way.
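If you do want method 1, a possible sketch of such a manual metric (the name exact_match_accuracy and the implementation are illustrative, not a built-in Keras metric):

import tensorflow as tf

def exact_match_accuracy(y_true, y_pred, threshold=0.5):
    # threshold the per-label probabilities, then require ALL labels of a
    # sample to match for that sample to count as correct (method 1)
    y_pred_bin = tf.cast(y_pred > threshold, y_true.dtype)
    all_correct = tf.reduce_all(tf.equal(y_true, y_pred_bin), axis=-1)
    return tf.reduce_mean(tf.cast(all_correct, tf.float32))

# e.g. model.compile(loss='binary_crossentropy', optimizer='adam',
#                    metrics=[exact_match_accuracy])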

Keras accuracy of only most certain predictions

I managed to solve my problem using this code:

import tensorflow as tf

def topacc(y_true, y_pred):
    k = tf.cast(len(y_true) // 10, 'int64')
    y_true, y_pred = tf.transpose(y_true), tf.transpose(y_pred)
    return tf.keras.metrics.top_k_categorical_accuracy(y_true, y_pred, k=k)

This works as a full Keras metric.


