Saptarshi Sinha, Hiroki Ohashi
Long-tailed datasets, in which head classes contain far more training samples than tail classes, bias recognition models towards the head classes. Weighted loss is one of the most popular ways of mitigating this issue, and recent work has suggested that class difficulty may be a better cue than the conventionally used class frequency for deciding the distribution of weights. That work quantified difficulty with a heuristic formulation, but we find empirically that the optimal formulation varies with the characteristics of the dataset. We therefore propose Difficulty-Net, which learns to predict the difficulty of classes from the model's performance in a meta-learning framework. To make it learn a reasonable difficulty for each class within the context of the other classes, we introduce two key concepts: relative difficulty and the driver loss. The former helps Difficulty-Net take other classes into account when calculating the difficulty of a class, while the latter is indispensable for guiding the learning in a meaningful direction. Extensive experiments on popular long-tailed datasets demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on multiple long-tailed datasets.
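
To make the idea of difficulty-based loss weighting concrete, here is a minimal sketch of a heuristic variant of the kind the abstract contrasts with Difficulty-Net. It is not the authors' method: Difficulty-Net learns the difficulty with a meta-learned network, whereas this sketch assumes a fixed rule, difficulty = (1 - class accuracy)^tau. The function names, the exponent `tau`, and the normalization of weights to sum to the number of classes are all illustrative assumptions.

```python
import math


def class_accuracies(labels, preds, num_classes):
    """Per-class accuracy from integer labels and predictions."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for y, p in zip(labels, preds):
        total[y] += 1
        if y == p:
            correct[y] += 1
    # Classes with no samples get accuracy 0.0 (treated as maximally difficult).
    return [c / t if t else 0.0 for c, t in zip(correct, total)]


def difficulty_weights(accuracies, tau=1.0):
    """Assumed heuristic: difficulty = (1 - accuracy) ** tau.

    Weights are rescaled so that they sum to the number of classes.
    """
    difficulties = [(1.0 - a) ** tau for a in accuracies]
    total = sum(difficulties)
    n = len(difficulties)
    if total == 0.0:
        # Every class is classified perfectly; fall back to uniform weights.
        return [1.0] * n
    return [n * d / total for d in difficulties]


def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy where each sample is scaled by its class weight.

    probs[i][k] is the predicted probability of class k for sample i.
    """
    loss = 0.0
    for p, y in zip(probs, labels):
        loss += -weights[y] * math.log(p[y])
    return loss / len(labels)
```

In this sketch a class with lower validation accuracy receives a larger weight, regardless of how many training samples it has. The abstract's point is that a fixed rule like this one may not suit every dataset, which is the motivation for learning the difficulty instead.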
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | Places-LT | Top-1 Accuracy | 41.7 | Difficulty-Net (ResNet-152) |
| Image Classification | CIFAR-100-LT (ρ=50) | Error Rate | 43.1 | Difficulty-Net |
| Image Classification | CIFAR-100-LT (ρ=10) | Error Rate | 34.78 | Difficulty-Net |
| Image Classification | ImageNet-LT | Top-1 Accuracy | 57.4 | Difficulty-Net (ResNet-50 using RandAugment, single model) |
| Image Classification | ImageNet-LT | Top-1 Accuracy | 54 | Difficulty-Net (ResNet-50 w/o using RandAugment, single model) |
| Image Classification | ImageNet-LT | Top-1 Accuracy | 44.6 | Difficulty-Net (ResNet-10 w/o using RandAugment, single model) |
| Image Classification | CIFAR-100-LT (ρ=100) | Error Rate | 47.04 | Difficulty-Net |
| Few-Shot Image Classification | Places-LT | Top-1 Accuracy | 41.7 | Difficulty-Net (ResNet-152) |
| Few-Shot Image Classification | CIFAR-100-LT (ρ=50) | Error Rate | 43.1 | Difficulty-Net |
| Few-Shot Image Classification | CIFAR-100-LT (ρ=10) | Error Rate | 34.78 | Difficulty-Net |
| Few-Shot Image Classification | ImageNet-LT | Top-1 Accuracy | 57.4 | Difficulty-Net (ResNet-50 using RandAugment, single model) |
| Few-Shot Image Classification | ImageNet-LT | Top-1 Accuracy | 54 | Difficulty-Net (ResNet-50 w/o using RandAugment, single model) |
| Few-Shot Image Classification | ImageNet-LT | Top-1 Accuracy | 44.6 | Difficulty-Net (ResNet-10 w/o using RandAugment, single model) |
| Few-Shot Image Classification | CIFAR-100-LT (ρ=100) | Error Rate | 47.04 | Difficulty-Net |
| Generalized Few-Shot Classification | Places-LT | Top-1 Accuracy | 41.7 | Difficulty-Net (ResNet-152) |
| Generalized Few-Shot Classification | CIFAR-100-LT (ρ=50) | Error Rate | 43.1 | Difficulty-Net |
| Generalized Few-Shot Classification | CIFAR-100-LT (ρ=10) | Error Rate | 34.78 | Difficulty-Net |
| Generalized Few-Shot Classification | ImageNet-LT | Top-1 Accuracy | 57.4 | Difficulty-Net (ResNet-50 using RandAugment, single model) |
| Generalized Few-Shot Classification | ImageNet-LT | Top-1 Accuracy | 54 | Difficulty-Net (ResNet-50 w/o using RandAugment, single model) |
| Generalized Few-Shot Classification | ImageNet-LT | Top-1 Accuracy | 44.6 | Difficulty-Net (ResNet-10 w/o using RandAugment, single model) |
| Generalized Few-Shot Classification | CIFAR-100-LT (ρ=100) | Error Rate | 47.04 | Difficulty-Net |
| Long-tail Learning | Places-LT | Top-1 Accuracy | 41.7 | Difficulty-Net (ResNet-152) |
| Long-tail Learning | CIFAR-100-LT (ρ=50) | Error Rate | 43.1 | Difficulty-Net |
| Long-tail Learning | CIFAR-100-LT (ρ=10) | Error Rate | 34.78 | Difficulty-Net |
| Long-tail Learning | ImageNet-LT | Top-1 Accuracy | 57.4 | Difficulty-Net (ResNet-50 using RandAugment, single model) |
| Long-tail Learning | ImageNet-LT | Top-1 Accuracy | 54 | Difficulty-Net (ResNet-50 w/o using RandAugment, single model) |
| Long-tail Learning | ImageNet-LT | Top-1 Accuracy | 44.6 | Difficulty-Net (ResNet-10 w/o using RandAugment, single model) |
| Long-tail Learning | CIFAR-100-LT (ρ=100) | Error Rate | 47.04 | Difficulty-Net |
| Generalized Few-Shot Learning | Places-LT | Top-1 Accuracy | 41.7 | Difficulty-Net (ResNet-152) |
| Generalized Few-Shot Learning | CIFAR-100-LT (ρ=50) | Error Rate | 43.1 | Difficulty-Net |
| Generalized Few-Shot Learning | CIFAR-100-LT (ρ=10) | Error Rate | 34.78 | Difficulty-Net |
| Generalized Few-Shot Learning | ImageNet-LT | Top-1 Accuracy | 57.4 | Difficulty-Net (ResNet-50 using RandAugment, single model) |
| Generalized Few-Shot Learning | ImageNet-LT | Top-1 Accuracy | 54 | Difficulty-Net (ResNet-50 w/o using RandAugment, single model) |
| Generalized Few-Shot Learning | ImageNet-LT | Top-1 Accuracy | 44.6 | Difficulty-Net (ResNet-10 w/o using RandAugment, single model) |
| Generalized Few-Shot Learning | CIFAR-100-LT (ρ=100) | Error Rate | 47.04 | Difficulty-Net |