Gregory Holste, Song Wang, Ziyu Jiang, Thomas C. Shen, George Shih, Ronald M. Summers, Yifan Peng, Zhangyang Wang
Imaging exams, such as chest radiography, will yield a small set of common findings and a much larger set of uncommon findings. While a trained radiologist can learn the visual presentation of rare conditions by studying a few representative examples, teaching a machine to learn from such a "long-tailed" distribution is much more difficult, as standard methods would be easily biased toward the most frequent classes. In this paper, we present a comprehensive benchmark study of the long-tailed learning problem in the specific domain of thorax diseases on chest X-rays. We focus on learning from naturally distributed chest X-ray data, optimizing classification accuracy over not only the common "head" classes, but also the rare yet critical "tail" classes. To accomplish this, we introduce a challenging new long-tailed chest X-ray benchmark to facilitate research on developing long-tailed learning methods for medical image classification. The benchmark consists of two chest X-ray datasets for 19- and 20-way thorax disease classification, containing classes with as many as 53,000 and as few as 7 labeled training images. We evaluate both standard and state-of-the-art long-tailed learning methods on this new benchmark, analyzing which aspects of these methods are most beneficial for long-tailed medical image classification and summarizing insights for future algorithm design. The datasets, trained models, and code are available at https://github.com/VITA-Group/LongTailCXR.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.294 | Decoupling (cRT) |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.289 | Reweighted LDAM-DRW |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.281 | Class-balanced LDAM-DRW |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.279 | Reweighted LDAM |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.269 | Class-balanced Softmax |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.260 | Reweighted Softmax |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.235 | Class-balanced LDAM |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.232 | Class-balanced Focal Loss |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.214 | Decoupling (tau-norm) |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.197 | Reweighted Focal Loss |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.178 | LDAM |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.155 | Balanced-MixUp |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.122 | Focal Loss |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.118 | MixUp |
| Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.115 | Softmax |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.296 | Decoupling (cRT) |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.275 | Reweighted LDAM-DRW |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.267 | Class-balanced LDAM-DRW |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.243 | Reweighted LDAM |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.239 | Reweighted Focal Loss |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.230 | Decoupling (tau-norm) |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.227 | Class-balanced Softmax |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.225 | Class-balanced LDAM |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.211 | Reweighted Softmax |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.191 | Class-balanced Focal Loss |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.176 | MixUp |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.172 | Focal Loss |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.169 | Softmax |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.168 | Balanced-MixUp |
| Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.165 | LDAM |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.294 | Decoupling (cRT) |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.289 | Reweighted LDAM-DRW |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.281 | Class-balanced LDAM-DRW |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.279 | Reweighted LDAM |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.269 | Class-balanced Softmax |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.260 | Reweighted Softmax |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.235 | Class-balanced LDAM |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.232 | Class-balanced Focal Loss |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.214 | Decoupling (tau-norm) |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.197 | Reweighted Focal Loss |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.178 | LDAM |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.155 | Balanced-MixUp |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.122 | Focal Loss |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.118 | MixUp |
| Few-Shot Image Classification | NIH-CXR-LT | Balanced Accuracy | 0.115 | Softmax |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.296 | Decoupling (cRT) |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.275 | Reweighted LDAM-DRW |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.267 | Class-balanced LDAM-DRW |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.243 | Reweighted LDAM |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.239 | Reweighted Focal Loss |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.230 | Decoupling (tau-norm) |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.227 | Class-balanced Softmax |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.225 | Class-balanced LDAM |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.211 | Reweighted Softmax |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.191 | Class-balanced Focal Loss |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.176 | MixUp |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.172 | Focal Loss |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.169 | Softmax |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.168 | Balanced-MixUp |
| Few-Shot Image Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.165 | LDAM |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.294 | Decoupling (cRT) |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.289 | Reweighted LDAM-DRW |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.281 | Class-balanced LDAM-DRW |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.279 | Reweighted LDAM |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.269 | Class-balanced Softmax |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.260 | Reweighted Softmax |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.235 | Class-balanced LDAM |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.232 | Class-balanced Focal Loss |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.214 | Decoupling (tau-norm) |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.197 | Reweighted Focal Loss |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.178 | LDAM |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.155 | Balanced-MixUp |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.122 | Focal Loss |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.118 | MixUp |
| Generalized Few-Shot Classification | NIH-CXR-LT | Balanced Accuracy | 0.115 | Softmax |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.296 | Decoupling (cRT) |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.275 | Reweighted LDAM-DRW |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.267 | Class-balanced LDAM-DRW |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.243 | Reweighted LDAM |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.239 | Reweighted Focal Loss |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.230 | Decoupling (tau-norm) |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.227 | Class-balanced Softmax |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.225 | Class-balanced LDAM |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.211 | Reweighted Softmax |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.191 | Class-balanced Focal Loss |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.176 | MixUp |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.172 | Focal Loss |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.169 | Softmax |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.168 | Balanced-MixUp |
| Generalized Few-Shot Classification | MIMIC-CXR-LT | Balanced Accuracy | 0.165 | LDAM |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.294 | Decoupling (cRT) |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.289 | Reweighted LDAM-DRW |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.281 | Class-balanced LDAM-DRW |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.279 | Reweighted LDAM |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.269 | Class-balanced Softmax |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.260 | Reweighted Softmax |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.235 | Class-balanced LDAM |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.232 | Class-balanced Focal Loss |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.214 | Decoupling (tau-norm) |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.197 | Reweighted Focal Loss |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.178 | LDAM |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.155 | Balanced-MixUp |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.122 | Focal Loss |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.118 | MixUp |
| Long-tail Learning | NIH-CXR-LT | Balanced Accuracy | 0.115 | Softmax |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.296 | Decoupling (cRT) |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.275 | Reweighted LDAM-DRW |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.267 | Class-balanced LDAM-DRW |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.243 | Reweighted LDAM |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.239 | Reweighted Focal Loss |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.230 | Decoupling (tau-norm) |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.227 | Class-balanced Softmax |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.225 | Class-balanced LDAM |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.211 | Reweighted Softmax |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.191 | Class-balanced Focal Loss |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.176 | MixUp |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.172 | Focal Loss |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.169 | Softmax |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.168 | Balanced-MixUp |
| Long-tail Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.165 | LDAM |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.294 | Decoupling (cRT) |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.289 | Reweighted LDAM-DRW |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.281 | Class-balanced LDAM-DRW |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.279 | Reweighted LDAM |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.269 | Class-balanced Softmax |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.260 | Reweighted Softmax |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.235 | Class-balanced LDAM |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.232 | Class-balanced Focal Loss |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.214 | Decoupling (tau-norm) |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.197 | Reweighted Focal Loss |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.178 | LDAM |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.155 | Balanced-MixUp |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.122 | Focal Loss |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.118 | MixUp |
| Generalized Few-Shot Learning | NIH-CXR-LT | Balanced Accuracy | 0.115 | Softmax |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.296 | Decoupling (cRT) |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.275 | Reweighted LDAM-DRW |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.267 | Class-balanced LDAM-DRW |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.243 | Reweighted LDAM |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.239 | Reweighted Focal Loss |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.230 | Decoupling (tau-norm) |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.227 | Class-balanced Softmax |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.225 | Class-balanced LDAM |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.211 | Reweighted Softmax |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.191 | Class-balanced Focal Loss |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.176 | MixUp |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.172 | Focal Loss |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.169 | Softmax |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.168 | Balanced-MixUp |
| Generalized Few-Shot Learning | MIMIC-CXR-LT | Balanced Accuracy | 0.165 | LDAM |
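
The table reports Balanced Accuracy but this page does not define it. The sketch below assumes the usual definition, the unweighted mean of per-class recall, and contrasts it with plain accuracy on a toy long-tailed label set. The class names, counts, and the helper names `balanced_accuracy` and `plain_accuracy` are hypothetical and chosen only for illustration; they are not taken from the benchmark or its code release.

```python
from collections import defaultdict


def plain_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true.

    Assumption: this is the standard definition; the evaluation code behind
    the table may differ in details.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy long-tailed test set (hypothetical counts, not benchmark data).
y_true = ["head"] * 90 + ["medium"] * 8 + ["tail"] * 2
# A degenerate classifier that always predicts the most frequent class.
y_pred = ["head"] * len(y_true)

acc = plain_accuracy(y_true, y_pred)      # dominated by the head class
bacc = balanced_accuracy(y_true, y_pred)  # every class counts equally
print(f"accuracy={acc:.3f} balanced_accuracy={bacc:.3f}")
```

Under this definition, a model that ignores the tail scores well on plain accuracy but poorly on balanced accuracy, which is why a class-averaged metric suits the "head" versus "tail" comparison described in the abstract.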