Jiequan Cui, Shu Liu, Zhuotao Tian, Zhisheng Zhong, Jiaya Jia
Deep learning algorithms face great challenges with long-tailed data distributions, which are nevertheless common in real-world scenarios. Previous methods tackle the problem from either the input space (re-sampling classes with different frequencies) or the loss space (re-weighting classes with different weights), and suffer from heavy over-fitting to tail classes or difficult optimization during training. To alleviate these issues, we propose a more fundamental perspective for long-tailed recognition, i.e., the parameter space, aiming to preserve specific capacity for classes with low frequencies. From this perspective, the trivial solution of using separate branches for the head, medium, and tail classes and then summing their outputs as the final result is not feasible. Instead, we design an effective residual fusion mechanism: one main branch is optimized to recognize images from all classes, while two residual branches are gradually fused and optimized to enhance recognition of images from medium+tail classes and tail classes, respectively. The branches are then aggregated into the final result by additive shortcuts. We test our method on several benchmarks, i.e., long-tailed versions of CIFAR-10, CIFAR-100, Places, ImageNet, and iNaturalist 2018. Experimental results demonstrate the effectiveness of our method. Our code is available at https://github.com/jiequancui/ResLT.
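The additive aggregation described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the released implementation: the linear heads, the class groupings `MEDIUM_TAIL` and `TAIL`, the sizes `NUM_CLASSES` and `DIM`, and all function names are assumptions chosen for illustration, and the training losses are omitted because the abstract does not specify them.

```python
import random


def linear(weights, features):
    """Return one logit per class: dot product of each weight row with the features."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def fuse_logits(main, medium_tail, tail):
    """Aggregate the three branch outputs with additive shortcuts (element-wise sum)."""
    return [m + mt + t for m, mt, t in zip(main, medium_tail, tail)]


def branches_for_label(label, medium_tail_classes, tail_classes):
    """List the branches that an image with this label is meant to strengthen.

    Assumption: the main branch sees every class, the first residual branch
    sees medium+tail classes, and the second residual branch sees tail classes.
    """
    active = ["main"]
    if label in medium_tail_classes:
        active.append("medium_tail")
    if label in tail_classes:
        active.append("tail")
    return active


def random_head(num_classes, dim, rng):
    """Build a randomly initialised linear head (hypothetical stand-in for a branch)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]


rng = random.Random(0)
NUM_CLASSES, DIM = 6, 4          # illustrative sizes, not taken from the paper
MEDIUM_TAIL = {2, 3, 4, 5}       # hypothetical medium+tail class indices
TAIL = {4, 5}                    # hypothetical tail class indices

# Three heads share the same input features, standing in for a shared backbone.
heads = {name: random_head(NUM_CLASSES, DIM, rng)
         for name in ("main", "medium_tail", "tail")}
features = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

branch_logits = {name: linear(w, features) for name, w in heads.items()}
fused = fuse_logits(branch_logits["main"],
                    branch_logits["medium_tail"],
                    branch_logits["tail"])
prediction = max(range(NUM_CLASSES), key=fused.__getitem__)
```

In this sketch a head-class image would only involve the main branch, whereas a tail-class image involves all three, which is one way to read "preserving specific capacity" for low-frequency classes.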
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | CIFAR-10-LT (ρ=10) | Error Rate | 10.3 | ResLT |
| Image Classification | ImageNet-LT | Top-1 Accuracy | 57.6 | ResLT(ResNeXt-50-3 experts) |
| Image Classification | ImageNet-LT | Top-1 Accuracy | 55.1 | ResLT(ResNeXt101-32x4d) |
| Image Classification | ImageNet-LT | Top-1 Accuracy | 52.9 | ResLT(ResNeXt50) |
| Few-Shot Image Classification | CIFAR-10-LT (ρ=10) | Error Rate | 10.3 | ResLT |
| Few-Shot Image Classification | ImageNet-LT | Top-1 Accuracy | 57.6 | ResLT(ResNeXt-50-3 experts) |
| Few-Shot Image Classification | ImageNet-LT | Top-1 Accuracy | 55.1 | ResLT(ResNeXt101-32x4d) |
| Few-Shot Image Classification | ImageNet-LT | Top-1 Accuracy | 52.9 | ResLT(ResNeXt50) |
| Generalized Few-Shot Classification | CIFAR-10-LT (ρ=10) | Error Rate | 10.3 | ResLT |
| Generalized Few-Shot Classification | ImageNet-LT | Top-1 Accuracy | 57.6 | ResLT(ResNeXt-50-3 experts) |
| Generalized Few-Shot Classification | ImageNet-LT | Top-1 Accuracy | 55.1 | ResLT(ResNeXt101-32x4d) |
| Generalized Few-Shot Classification | ImageNet-LT | Top-1 Accuracy | 52.9 | ResLT(ResNeXt50) |
| Long-tail Learning | CIFAR-10-LT (ρ=10) | Error Rate | 10.3 | ResLT |
| Long-tail Learning | ImageNet-LT | Top-1 Accuracy | 57.6 | ResLT(ResNeXt-50-3 experts) |
| Long-tail Learning | ImageNet-LT | Top-1 Accuracy | 55.1 | ResLT(ResNeXt101-32x4d) |
| Long-tail Learning | ImageNet-LT | Top-1 Accuracy | 52.9 | ResLT(ResNeXt50) |
| Generalized Few-Shot Learning | CIFAR-10-LT (ρ=10) | Error Rate | 10.3 | ResLT |
| Generalized Few-Shot Learning | ImageNet-LT | Top-1 Accuracy | 57.6 | ResLT(ResNeXt-50-3 experts) |
| Generalized Few-Shot Learning | ImageNet-LT | Top-1 Accuracy | 55.1 | ResLT(ResNeXt101-32x4d) |
| Generalized Few-Shot Learning | ImageNet-LT | Top-1 Accuracy | 52.9 | ResLT(ResNeXt50) |