Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis
The long-tail distribution of the visual world poses great challenges for deep-learning-based classification models, which must handle the class imbalance problem. Existing solutions usually involve class-balancing strategies, e.g., loss re-weighting, data re-sampling, or transfer learning from head to tail classes, but most of them adhere to the scheme of jointly learning representations and classifiers. In this work, we decouple the learning procedure into representation learning and classification, and systematically explore how different balancing strategies affect each of them for long-tailed recognition. The findings are surprising: (1) data imbalance might not be an issue in learning high-quality representations; (2) with representations learned using the simplest instance-balanced (natural) sampling, it is also possible to achieve strong long-tailed recognition ability by adjusting only the classifier. We conduct extensive experiments and set new state-of-the-art performance on common long-tailed benchmarks such as ImageNet-LT, Places-LT, and iNaturalist, showing that a straightforward approach that decouples representation and classification can outperform carefully designed losses, sampling strategies, and even complex modules with memory. Our code is available at https://github.com/facebookresearch/classifier-balancing.
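The difference between instance-balanced (natural) sampling and a class-balanced alternative can be illustrated with a small sketch. It assumes a power-law parameterization in which the probability of drawing from class j is proportional to n_j^q, where n_j is the number of training images in that class: q = 1 gives instance-balanced sampling and q = 0 gives class-balanced sampling. The function name `class_sampling_probs` and the toy class counts are illustrative assumptions, not taken from the released code.

```python
import numpy as np


def class_sampling_probs(class_counts, q):
    """Return per-class sampling probabilities p_j proportional to n_j ** q.

    q = 1.0 -> instance-balanced: every image is equally likely,
               so large (head) classes dominate.
    q = 0.0 -> class-balanced: every class is equally likely,
               regardless of how many images it has.
    """
    counts = np.asarray(class_counts, dtype=np.float64)
    weights = counts ** q
    return weights / weights.sum()


# Toy long-tailed dataset (hypothetical counts): one head class, two tail classes.
counts = [1000, 100, 10]

instance_balanced = class_sampling_probs(counts, q=1.0)
class_balanced = class_sampling_probs(counts, q=0.0)

# Draw class indices for a mini-batch under class-balanced sampling.
rng = np.random.default_rng(0)
batch_classes = rng.choice(len(counts), size=8, p=class_balanced)
```

Under the decoupled scheme described above, one plausible reading is that the representation is trained with the `instance_balanced` probabilities, after which only the classifier is adjusted, for example by re-training it on batches drawn with the `class_balanced` probabilities while the backbone stays frozen.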
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | Places-LT | Top-1 Accuracy | 37.6 | CB LWS |
| Image Classification | CIFAR-10-LT (ρ=10) | Error Rate | 8.9 | LWS |
| Image Classification | CIFAR-10-LT (ρ=10) | Error Rate | 9 | cRT |
| Image Classification | ImageNet-LT | Top-1 Accuracy | 41.4 | CB LWS |
| Image Classification | CUB-LT | Long-Tailed Accuracy | 65.7 | LWS |
| Image Classification | CUB-LT | Per-Class Accuracy | 53.1 | LWS |
| Image Classification | AWA-LT | Long-Tailed Accuracy | 93.5 | LWS |
| Image Classification | AWA-LT | Per-Class Accuracy | 73.4 | LWS |
| Image Classification | SUN-LT | Long-Tailed Accuracy | 40.2 | LWS |
| Image Classification | SUN-LT | Per-Class Accuracy | 33.9 | LWS |
| Image Classification | ImageNet-LT-d | Per-Class Accuracy | 49.9 | LWS |
| Few-Shot Image Classification | Places-LT | Top-1 Accuracy | 37.6 | CB LWS |
| Few-Shot Image Classification | CIFAR-10-LT (ρ=10) | Error Rate | 8.9 | LWS |
| Few-Shot Image Classification | CIFAR-10-LT (ρ=10) | Error Rate | 9 | cRT |
| Few-Shot Image Classification | ImageNet-LT | Top-1 Accuracy | 41.4 | CB LWS |
| Few-Shot Image Classification | CUB-LT | Long-Tailed Accuracy | 65.7 | LWS |
| Few-Shot Image Classification | CUB-LT | Per-Class Accuracy | 53.1 | LWS |
| Few-Shot Image Classification | AWA-LT | Long-Tailed Accuracy | 93.5 | LWS |
| Few-Shot Image Classification | AWA-LT | Per-Class Accuracy | 73.4 | LWS |
| Few-Shot Image Classification | SUN-LT | Long-Tailed Accuracy | 40.2 | LWS |
| Few-Shot Image Classification | SUN-LT | Per-Class Accuracy | 33.9 | LWS |
| Few-Shot Image Classification | ImageNet-LT-d | Per-Class Accuracy | 49.9 | LWS |
| Generalized Few-Shot Classification | Places-LT | Top-1 Accuracy | 37.6 | CB LWS |
| Generalized Few-Shot Classification | CIFAR-10-LT (ρ=10) | Error Rate | 8.9 | LWS |
| Generalized Few-Shot Classification | CIFAR-10-LT (ρ=10) | Error Rate | 9 | cRT |
| Generalized Few-Shot Classification | ImageNet-LT | Top-1 Accuracy | 41.4 | CB LWS |
| Generalized Few-Shot Classification | CUB-LT | Long-Tailed Accuracy | 65.7 | LWS |
| Generalized Few-Shot Classification | CUB-LT | Per-Class Accuracy | 53.1 | LWS |
| Generalized Few-Shot Classification | AWA-LT | Long-Tailed Accuracy | 93.5 | LWS |
| Generalized Few-Shot Classification | AWA-LT | Per-Class Accuracy | 73.4 | LWS |
| Generalized Few-Shot Classification | SUN-LT | Long-Tailed Accuracy | 40.2 | LWS |
| Generalized Few-Shot Classification | SUN-LT | Per-Class Accuracy | 33.9 | LWS |
| Generalized Few-Shot Classification | ImageNet-LT-d | Per-Class Accuracy | 49.9 | LWS |
| Long-tail Learning | Places-LT | Top-1 Accuracy | 37.6 | CB LWS |
| Long-tail Learning | CIFAR-10-LT (ρ=10) | Error Rate | 8.9 | LWS |
| Long-tail Learning | CIFAR-10-LT (ρ=10) | Error Rate | 9 | cRT |
| Long-tail Learning | ImageNet-LT | Top-1 Accuracy | 41.4 | CB LWS |
| Long-tail Learning | CUB-LT | Long-Tailed Accuracy | 65.7 | LWS |
| Long-tail Learning | CUB-LT | Per-Class Accuracy | 53.1 | LWS |
| Long-tail Learning | AWA-LT | Long-Tailed Accuracy | 93.5 | LWS |
| Long-tail Learning | AWA-LT | Per-Class Accuracy | 73.4 | LWS |
| Long-tail Learning | SUN-LT | Long-Tailed Accuracy | 40.2 | LWS |
| Long-tail Learning | SUN-LT | Per-Class Accuracy | 33.9 | LWS |
| Long-tail Learning | ImageNet-LT-d | Per-Class Accuracy | 49.9 | LWS |
| Generalized Few-Shot Learning | Places-LT | Top-1 Accuracy | 37.6 | CB LWS |
| Generalized Few-Shot Learning | CIFAR-10-LT (ρ=10) | Error Rate | 8.9 | LWS |
| Generalized Few-Shot Learning | CIFAR-10-LT (ρ=10) | Error Rate | 9 | cRT |
| Generalized Few-Shot Learning | ImageNet-LT | Top-1 Accuracy | 41.4 | CB LWS |
| Generalized Few-Shot Learning | CUB-LT | Long-Tailed Accuracy | 65.7 | LWS |
| Generalized Few-Shot Learning | CUB-LT | Per-Class Accuracy | 53.1 | LWS |
| Generalized Few-Shot Learning | AWA-LT | Long-Tailed Accuracy | 93.5 | LWS |
| Generalized Few-Shot Learning | AWA-LT | Per-Class Accuracy | 73.4 | LWS |
| Generalized Few-Shot Learning | SUN-LT | Long-Tailed Accuracy | 40.2 | LWS |
| Generalized Few-Shot Learning | SUN-LT | Per-Class Accuracy | 33.9 | LWS |
| Generalized Few-Shot Learning | ImageNet-LT-d | Per-Class Accuracy | 49.9 | LWS |