Unbalanced data and weighted cross entropy
Note that weighted_cross_entropy_with_logits is the weighted variant of sigmoid_cross_entropy_with_logits. Sigmoid cross entropy is typically used for binary classification. Yes, it can handle multiple labels, but sigmoid cross entropy essentially makes a (binary) decision on each of them -- for example, for a face recognition net, those (not mutually exclusive) labels could be "Does the subject wear glasses?", "Is the subject female?", etc.
In binary classification(s), each output channel corresponds to a binary (soft) decision. Therefore, the weighting needs to happen within the computation of the loss. This is what weighted_cross_entropy_with_logits does: it weights one term of the cross entropy over the other.
In mutually exclusive multiclass classification, we use softmax_cross_entropy_with_logits, which behaves differently: each output channel corresponds to the score of a class candidate. The decision comes after, by comparing the respective outputs of each channel.
Weighting before the final decision is therefore a simple matter of modifying the scores before comparing them, typically by multiplication with weights. For example, for a ternary classification task,
# your class weights
class_weights = tf.constant([[1.0, 2.0, 3.0]])
# deduce weights for batch samples based on their true label
weights = tf.reduce_sum(class_weights * onehot_labels, axis=1)
# compute your (unweighted) softmax cross entropy loss
unweighted_losses = tf.nn.softmax_cross_entropy_with_logits(labels=onehot_labels, logits=logits)
# apply the weights, relying on broadcasting of the multiplication
weighted_losses = unweighted_losses * weights
# reduce the result to get your final loss
loss = tf.reduce_mean(weighted_losses)
You could also rely on tf.losses.softmax_cross_entropy to handle the last three steps.
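The weight-deduction step above is worth unpacking: multiplying the weight row by the one-hot labels and summing over the class axis simply picks out each sample's true-class weight. A NumPy sketch with hypothetical values:

```python
import numpy as np

# hypothetical 3-class weights and a batch of one-hot true labels
class_weights = np.array([[1.0, 2.0, 3.0]])          # shape (1, 3)
onehot_labels = np.array([[1, 0, 0],
                          [0, 0, 1],
                          [0, 1, 0]], dtype=float)   # shape (3, 3)

# broadcasting multiplies each row by the weight row, and the sum
# keeps only the weight of each sample's true class
weights = np.sum(class_weights * onehot_labels, axis=1)
# weights == [1.0, 3.0, 2.0]
```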
In your case, where you need to tackle data imbalance, the class weights could indeed be inversely proportional to their frequency in your train data. Normalizing them so that they sum up to one or to the number of classes also makes sense.
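As a sketch of that idea, here is one way to derive inverse-frequency class weights from hypothetical class counts, normalized so they sum to the number of classes (the counts are made up for illustration):

```python
import numpy as np

# hypothetical class counts from an imbalanced training set
counts = np.array([900.0, 90.0, 10.0])

raw = 1.0 / counts  # inversely proportional to class frequency
# normalize so the weights sum to the number of classes
class_weights = raw / raw.sum() * len(counts)
```

With this normalization, rare classes get weights above 1 and frequent classes get weights below 1, while the average weight stays at 1.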
Note that in the above, we penalized the loss based on the true label of the samples. We could also have penalized the loss based on the estimated labels by simply defining
weights = class_weights
and the rest of the code need not change thanks to broadcasting magic.
In the general case, you would want weights that depend on the kind of error you make. In other words, for each pair of labels X and Y, you could choose how to penalize choosing label X when the true label is Y. You end up with a whole prior weight matrix, which results in weights above being a full (num_samples, num_classes) tensor. This goes a bit beyond what you want, but it might be useful to know nonetheless that only your definition of the weight tensor needs to change in the code above.
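A minimal sketch of that general case, with a made-up penalty matrix: indexing the matrix by each sample's true label yields the full (num_samples, num_classes) weight tensor described above.

```python
import numpy as np

# hypothetical penalty matrix: entry [y, x] is the cost of predicting
# class x when the true class is y (diagonal = correct predictions)
weight_matrix = np.array([[1.0, 2.0, 5.0],
                          [2.0, 1.0, 2.0],
                          [5.0, 2.0, 1.0]])

true_labels = np.array([0, 2, 1, 2])

# each sample gets the row of penalties for its true label,
# yielding a full (num_samples, num_classes) weight tensor
weights = weight_matrix[true_labels]
```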
How to create weighted cross entropy loss?
I suggest in the first instance resorting to class_weight from Keras. class_weight is a dictionary with {label: weight}.
For example, if you have 20 times more examples of label 1 than of label 0, then you can write
# Assign 20 times more weight to label 0
model.fit(..., class_weight={0: 20, 1: 1})
This way you don't need to worry about implementing weighted CCE on your own.
Additional note: in your model.compile(), do not forget to use weighted_metrics=['accuracy'] in order to have a relevant reflection of your accuracy.
model.fit(..., class_weight={0: 20, 1: 1}, weighted_metrics=['accuracy'])
Tensorflow: Interpretation of Weight in Weighted Cross Entropy
Actually, it's the other way around. Citing the documentation:
The argument pos_weight is used as a multiplier for the positive targets.
So, assuming you have 5 positive examples in your dataset and 7 negative, if you set pos_weight=2, then your loss would be as if you had 10 positive examples and 7 negative.
Assume you got all of the positive examples wrong and all the negative ones right. Originally you would have 5 false negatives and 0 false positives. When you increase pos_weight, the loss contribution of the false negatives will artificially increase. Note that the loss value coming from false positives doesn't change.