Tackling Class Imbalance: Scaling Contribution to Loss and SGD

Why don't you use the InfogainLoss layer to compensate for the imbalance in your training set?

The Infogain loss is defined using a weight matrix H (in your case 2-by-2). The meanings of its entries are:

[cost of predicting 1 when gt is 0,    cost of predicting 0 when gt is 0;
 cost of predicting 1 when gt is 1,    cost of predicting 0 when gt is 1]

So, you can set the entries of H to reflect the difference between errors in predicting 0 or 1.

You can find how to define matrix H for caffe in this thread.
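
For illustration only (the weights and the file name here are made up, not taken from the thread), one way to build a 2-by-2 H in Python and save it as a binaryproto that "InfogainLoss" can read through the source field of its infogain_loss_param:

# Minimal sketch: weight the rare class (here class 1) 10x more than class 0.
# The values are illustrative; pick them from your own class statistics.
import numpy as np
import caffe

H = np.array([[1.0,  0.0],
              [0.0, 10.0]], dtype=np.float32)

blob = caffe.io.array_to_blobproto(H.reshape(1, 1, 2, 2))
with open('infogain_H.binaryproto', 'wb') as f:   # file name is an example
    f.write(blob.SerializeToString())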

Regarding sample weights, you may find this post interesting: it shows how to modify the SoftmaxWithLoss layer to take sample weights into account.
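
Just to illustrate what per-sample weighting of the softmax loss computes (a plain numpy sketch, not the linked modification itself):

# Sketch: per-sample weighted cross-entropy, averaged over the batch.
import numpy as np

def weighted_softmax_loss(probs, labels, sample_weights, eps=1e-12):
    # probs: (N, C) softmax outputs, labels: (N,) ints, sample_weights: (N,)
    picked = probs[np.arange(len(labels)), labels]
    return np.mean(-sample_weights * np.log(np.clip(picked, eps, 1.0)))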


Recently, a modification to the cross-entropy loss was proposed by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár: Focal Loss for Dense Object Detection (ICCV 2017).

The idea behind focal loss is to assign a different weight to each example based on the relative difficulty of predicting that example (rather than based on class size, etc.). From the brief time I have had to experiment with this loss, it feels superior to "InfogainLoss" with class-size weights.
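
As a rough numpy sketch of the binary focal loss (using the paper's default gamma and alpha, not tuned values):

# Sketch of binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
import numpy as np

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    # p: predicted probability of class 1, y: ground-truth labels in {0, 1}
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # per-class balancing factor
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)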

Tackling imbalanced class members in Caffe: weight contribution of each instance to loss value

1. If you copy from the current infogain_loss_layer.cpp you can easily adapt it. For the forward pass, change lines 59-66 like this:

// assuming num = batch size, dim = label size, image_dim = image height * width
Dtype loss = 0;
for (int i = 0; i < num; ++i) {
  for (int k = 0; k < image_dim; ++k) {
    int label = static_cast<int>(bottom_label[i * image_dim + k]);
    for (int j = 0; j < dim; ++j) {
      Dtype prob = std::max(bottom_data[i * image_dim * dim + k * dim + j], Dtype(kLOG_THRESHOLD));
      loss -= infogain_mat[label * dim + j] * log(prob);
    }
  }
}

Similarly, for the backward pass you could change lines 95-101 like this:

for (int i = 0; i < num; ++i) {
  for (int k = 0; k < image_dim; ++k) {
    const int label = static_cast<int>(bottom_label[i * image_dim + k]);
    for (int j = 0; j < dim; ++j) {
      Dtype prob = std::max(bottom_data[i * image_dim * dim + k * dim + j], Dtype(kLOG_THRESHOLD));
      bottom_diff[i * image_dim * dim + k * dim + j] = scale * infogain_mat[label * dim + j] / prob;
    }
  }
}

This is kind of naive; I don't see any obvious way to optimize it. You will also need to change some setup code in the Reshape method.

2. In this PR the suggestion is to put min_count/|i| in the diagonal entries of H, where |i| is the number of samples that have label i, and to set everything else to 0. Also see this. As for loading, the weight matrix H is fixed for all inputs; you can load it as an lmdb file or in other ways. (A small sketch of the diagonal weighting follows the list.)

3. Yes, you will need to rebuild.
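
To make the diagonal suggestion in point 2 concrete, here is a small numpy sketch (the per-label counts are made up):

# Sketch of the min_count/|i| weighting: diagonal entries only, zeros elsewhere.
import numpy as np

counts = np.array([900000, 100000], dtype=np.float64)   # samples per label (made-up numbers)
H = np.diag(counts.min() / counts).astype(np.float32)   # here diag(0.111, 1.0)
# H can then be saved as a binaryproto, as in the earlier sketch.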

Update:
As Shai pointed out, the infogain pull request for this was approved this week, so the current version of Caffe supports pixelwise infogain loss.

Weighing loss of one class higher than others in deep learning

You can assign weights for each class manually. For example:

class_weight = {0: 0.2, 1: 0.3, 2: 0.25, 3: 0.25}
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=[mean_iou])
# in Keras, class_weight is an argument of fit(), not compile()
model.fit(X_train, y_train, class_weight=class_weight)  # X_train/y_train: your training data

Or you can use this scikit-learn library function.
There are also many examples on the web; didn't any of them work for you?
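
For example, assuming the scikit helper meant is sklearn.utils.class_weight.compute_class_weight (the link does not name it, so this is a guess), a sketch could look like:

# Sketch: derive balanced per-class weights from the labels and hand them to
# Keras fit(); X_train, y_train, and model are placeholders for your own objects.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

classes = np.unique(y_train)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weight = {int(c): w for c, w in zip(classes, weights)}
model.fit(X_train, y_train, class_weight=class_weight)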

Is there any example of using weighted loss for pixel-wise segmentation/classification tasks?

You can tackle class imbalance using "InfogainLoss". This loss can be viewed as an extension of "SoftmaxWithLoss" that enables you to "pay" a different loss value per label.

If you want to use "InfogainLoss" for pixel-wise predictions, you might need to use BVLC/caffe PR#3855.


