Saptarshi Sinha, Hiroki Ohashi, Katsuyuki Nakamura
Class imbalance is one of the major challenges in real-world datasets, where a few classes (called majority classes) contain far more data samples than the rest (called minority classes). Deep neural networks trained on such datasets tend to be biased towards the majority classes. Most prior works try to solve class imbalance by assigning larger weights to the minority classes in various ways (e.g., data re-sampling, cost-sensitive learning). However, we argue that the number of available training samples is not always a good clue for determining the weighting strategy, because some minority classes may be sufficiently represented even by a small number of training samples. Overweighting samples of such classes can degrade the model's overall performance. We claim that the 'difficulty' of a class, as perceived by the model, is more important for determining the weighting. In this light, we propose a novel loss function named Class-wise Difficulty-Balanced loss, or CDB loss, which dynamically assigns a weight to each sample according to the difficulty of the class that the sample belongs to. Note that the assigned weights change dynamically, since the 'difficulty' perceived by the model may change as learning progresses. Extensive experiments are conducted on both image (artificially class-imbalanced MNIST, long-tailed CIFAR, and ImageNet-LT) and video (EGTEA) datasets. The results show that CDB loss consistently outperforms recently proposed loss functions on class-imbalanced datasets, irrespective of the data type (i.e., video or image).
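The core idea above — weighting each sample by the difficulty of its class rather than by its class frequency — can be sketched as follows. This is a minimal NumPy illustration based only on the abstract's description: per-class difficulty is taken as (1 − class-wise accuracy), a focusing hyper-parameter `tau` is assumed, and the exact formulation in the paper may differ.

```python
import numpy as np

def cdb_weights(per_class_accuracy, tau=1.0):
    """Class-wise difficulty-balanced weights, normalized to mean 1.

    Difficulty of a class is taken here as (1 - accuracy), so classes the
    model currently struggles with receive larger weights. `tau` (an assumed
    hyper-parameter, not named in the abstract) sharpens or softens how much
    the weighting focuses on hard classes.
    """
    difficulty = 1.0 - np.asarray(per_class_accuracy, dtype=float)
    d = difficulty ** tau
    return d * len(d) / d.sum()

def cdb_cross_entropy(logits, labels, class_weights):
    """Cross-entropy where each sample is scaled by its class's weight."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    # numerically stable log-softmax
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(labels)), labels]
    return float(np.mean(class_weights[labels] * per_sample))
```

Because the perceived difficulty changes as training progresses, the weights would be recomputed periodically (e.g., from validation accuracy at the end of each epoch) rather than fixed once before training.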
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | CIFAR-100-LT (ρ=10) | Error Rate | 41.26 | CDB-loss |
| Image Classification | ImageNet-LT | Top-1 Accuracy | 38.5 | CDB-loss (ResNet-10) |
| Image Classification | CIFAR-100-LT (ρ=100) | Error Rate | 57.43 | CDB-loss |
| Action Recognition | EGTEA | Average Precision | 63.86 | CDB-loss (3D-ResNeXt101) |
| Action Recognition | EGTEA | Average Recall | 66.24 | CDB-loss (3D-ResNeXt101) |