Global and Local Mixture Consistency Cumulative Learning for Long-tailed Visual Recognitions

Fei Du, Peng Yang, Qi Jia, Fengtao Nan, Xiaoting Chen, Yun Yang

2023-05-15, CVPR 2023, Long-tail Learning

Abstract

In this paper, our goal is to design a simple learning paradigm for long-tailed visual recognition that not only improves the robustness of the feature extractor but also alleviates the classifier's bias towards head classes, while reducing training tricks and overhead. We propose an efficient one-stage training strategy for long-tailed visual recognition called Global and Local Mixture Consistency cumulative learning (GLMC). Our core ideas are twofold: (1) A global and local mixture consistency loss improves the robustness of the feature extractor. Specifically, we generate two augmented batches from the same batch of data, one by global MixUp and one by local CutMix, and then use cosine similarity to minimize the difference between them. (2) A cumulative head-tail soft label reweighted loss mitigates the head class bias problem. We use empirical class frequencies to reweight the mixed labels of head and tail classes for long-tailed data, and then balance the conventional loss and the rebalanced loss with a coefficient accumulated over epochs. Our approach achieves state-of-the-art accuracy on the CIFAR10-LT, CIFAR100-LT, and ImageNet-LT datasets. Additional experiments on balanced ImageNet and CIFAR demonstrate that GLMC can significantly improve the generalization of backbones. Code is publicly available at https://github.com/ynu-yangpeng/GLMC.
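The two ideas in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names are hypothetical, the quadratic schedule for the cumulative coefficient and the log-prior (logit-adjustment-style) form of the rebalanced loss are assumptions, and the cosine term is applied directly to two output vectors for simplicity. The linked repository is the place to check the actual formulation.

```python
import math

def mix_labels(y_a, y_b, lam):
    """Soft label shared by MixUp and CutMix: lam * y_a + (1 - lam) * y_b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def soft_cross_entropy(logits, soft_label):
    """Cross-entropy against a soft (mixed) label."""
    return -sum(t * lp for t, lp in zip(soft_label, log_softmax(logits)))

def rebalanced_cross_entropy(logits, soft_label, class_counts):
    """ASSUMPTION: reweight by adding the log empirical class prior to the logits,
    so rare (tail) classes are penalized more when they are under-predicted."""
    total = sum(class_counts)
    adjusted = [v + math.log(n / total) for v, n in zip(logits, class_counts)]
    return soft_cross_entropy(adjusted, soft_label)

def negative_cosine(p, q):
    """Consistency term: minimizing it maximizes the cosine similarity of p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return -dot / norm

def cumulative_alpha(epoch, total_epochs):
    """ASSUMPTION: weight on the conventional loss decays quadratically from 1 to 0."""
    return 1.0 - (epoch / total_epochs) ** 2

def glmc_style_loss(out_global, out_local, label_global, label_local,
                    class_counts, epoch, total_epochs):
    """Combine the conventional, rebalanced, and consistency terms for one sample.
    out_global / out_local are the model outputs for the MixUp and CutMix views."""
    alpha = cumulative_alpha(epoch, total_epochs)
    plain = (soft_cross_entropy(out_global, label_global)
             + soft_cross_entropy(out_local, label_local))
    balanced = (rebalanced_cross_entropy(out_global, label_global, class_counts)
                + rebalanced_cross_entropy(out_local, label_local, class_counts))
    consistency = negative_cosine(out_global, out_local)
    return alpha * plain + (1.0 - alpha) * balanced + consistency
```

Early in training `cumulative_alpha` is close to 1, so the conventional loss dominates and the feature extractor learns from the natural data distribution; as epochs accumulate, the weight shifts to the rebalanced loss, which counteracts the classifier's bias towards head classes.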

Results

Task	Dataset	Metric	Value (%)	Model
Image Classification	CIFAR-10-LT (ρ=10)	Error Rate	5	GLMC+MaxNorm (ResNet-34, channel x4)
Image Classification	CIFAR-10-LT (ρ=10)	Error Rate	5.15	GLMC (ResNet-34, channel x4)
Image Classification	CIFAR-100-LT (ρ=50)	Error Rate	36.15	GLMC (ResNet-34, channel x4)
Image Classification	CIFAR-100-LT (ρ=10)	Error Rate	25.72	GLMC+MaxNorm (ResNet-32, channel x4)
Image Classification	CIFAR-100-LT (ρ=10)	Error Rate	26.53	GLMC (ResNet-34, channel x4)
Image Classification	ImageNet-LT	Top-1 Accuracy	56.3	GLMC (ResNeXt-50)
Image Classification	CIFAR-100-LT (ρ=100)	Error Rate	41.59	GLMC+MaxNorm (ResNet-34, channel x4)
Image Classification	CIFAR-100-LT (ρ=100)	Error Rate	42.01	GLMC (ResNet-34, channel x4)
Image Classification	CIFAR-10-LT (ρ=100)	Error Rate	10.42	GLMC+MaxNorm (ResNet-34, channel x4)
Image Classification	CIFAR-10-LT (ρ=100)	Error Rate	11.5	GLMC (ResNet-34, channel x4)
Few-Shot Image Classification	CIFAR-10-LT (ρ=10)	Error Rate	5	GLMC+MaxNorm (ResNet-34, channel x4)
Few-Shot Image Classification	CIFAR-10-LT (ρ=10)	Error Rate	5.15	GLMC (ResNet-34, channel x4)
Few-Shot Image Classification	CIFAR-100-LT (ρ=50)	Error Rate	36.15	GLMC (ResNet-34, channel x4)
Few-Shot Image Classification	CIFAR-100-LT (ρ=10)	Error Rate	25.72	GLMC+MaxNorm (ResNet-32, channel x4)
Few-Shot Image Classification	CIFAR-100-LT (ρ=10)	Error Rate	26.53	GLMC (ResNet-34, channel x4)
Few-Shot Image Classification	ImageNet-LT	Top-1 Accuracy	56.3	GLMC (ResNeXt-50)
Few-Shot Image Classification	CIFAR-100-LT (ρ=100)	Error Rate	41.59	GLMC+MaxNorm (ResNet-34, channel x4)
Few-Shot Image Classification	CIFAR-100-LT (ρ=100)	Error Rate	42.01	GLMC (ResNet-34, channel x4)
Few-Shot Image Classification	CIFAR-10-LT (ρ=100)	Error Rate	10.42	GLMC+MaxNorm (ResNet-34, channel x4)
Few-Shot Image Classification	CIFAR-10-LT (ρ=100)	Error Rate	11.5	GLMC (ResNet-34, channel x4)
Generalized Few-Shot Classification	CIFAR-10-LT (ρ=10)	Error Rate	5	GLMC+MaxNorm (ResNet-34, channel x4)
Generalized Few-Shot Classification	CIFAR-10-LT (ρ=10)	Error Rate	5.15	GLMC (ResNet-34, channel x4)
Generalized Few-Shot Classification	CIFAR-100-LT (ρ=50)	Error Rate	36.15	GLMC (ResNet-34, channel x4)
Generalized Few-Shot Classification	CIFAR-100-LT (ρ=10)	Error Rate	25.72	GLMC+MaxNorm (ResNet-32, channel x4)
Generalized Few-Shot Classification	CIFAR-100-LT (ρ=10)	Error Rate	26.53	GLMC (ResNet-34, channel x4)
Generalized Few-Shot Classification	ImageNet-LT	Top-1 Accuracy	56.3	GLMC (ResNeXt-50)
Generalized Few-Shot Classification	CIFAR-100-LT (ρ=100)	Error Rate	41.59	GLMC+MaxNorm (ResNet-34, channel x4)
Generalized Few-Shot Classification	CIFAR-100-LT (ρ=100)	Error Rate	42.01	GLMC (ResNet-34, channel x4)
Generalized Few-Shot Classification	CIFAR-10-LT (ρ=100)	Error Rate	10.42	GLMC+MaxNorm (ResNet-34, channel x4)
Generalized Few-Shot Classification	CIFAR-10-LT (ρ=100)	Error Rate	11.5	GLMC (ResNet-34, channel x4)
Long-tail Learning	CIFAR-10-LT (ρ=10)	Error Rate	5	GLMC+MaxNorm (ResNet-34, channel x4)
Long-tail Learning	CIFAR-10-LT (ρ=10)	Error Rate	5.15	GLMC (ResNet-34, channel x4)
Long-tail Learning	CIFAR-100-LT (ρ=50)	Error Rate	36.15	GLMC (ResNet-34, channel x4)
Long-tail Learning	CIFAR-100-LT (ρ=10)	Error Rate	25.72	GLMC+MaxNorm (ResNet-32, channel x4)
Long-tail Learning	CIFAR-100-LT (ρ=10)	Error Rate	26.53	GLMC (ResNet-34, channel x4)
Long-tail Learning	ImageNet-LT	Top-1 Accuracy	56.3	GLMC (ResNeXt-50)
Long-tail Learning	CIFAR-100-LT (ρ=100)	Error Rate	41.59	GLMC+MaxNorm (ResNet-34, channel x4)
Long-tail Learning	CIFAR-100-LT (ρ=100)	Error Rate	42.01	GLMC (ResNet-34, channel x4)
Long-tail Learning	CIFAR-10-LT (ρ=100)	Error Rate	10.42	GLMC+MaxNorm (ResNet-34, channel x4)
Long-tail Learning	CIFAR-10-LT (ρ=100)	Error Rate	11.5	GLMC (ResNet-34, channel x4)
Generalized Few-Shot Learning	CIFAR-10-LT (ρ=10)	Error Rate	5	GLMC+MaxNorm (ResNet-34, channel x4)
Generalized Few-Shot Learning	CIFAR-10-LT (ρ=10)	Error Rate	5.15	GLMC (ResNet-34, channel x4)
Generalized Few-Shot Learning	CIFAR-100-LT (ρ=50)	Error Rate	36.15	GLMC (ResNet-34, channel x4)
Generalized Few-Shot Learning	CIFAR-100-LT (ρ=10)	Error Rate	25.72	GLMC+MaxNorm (ResNet-32, channel x4)
Generalized Few-Shot Learning	CIFAR-100-LT (ρ=10)	Error Rate	26.53	GLMC (ResNet-34, channel x4)
Generalized Few-Shot Learning	ImageNet-LT	Top-1 Accuracy	56.3	GLMC (ResNeXt-50)
Generalized Few-Shot Learning	CIFAR-100-LT (ρ=100)	Error Rate	41.59	GLMC+MaxNorm (ResNet-34, channel x4)
Generalized Few-Shot Learning	CIFAR-100-LT (ρ=100)	Error Rate	42.01	GLMC (ResNet-34, channel x4)
Generalized Few-Shot Learning	CIFAR-10-LT (ρ=100)	Error Rate	10.42	GLMC+MaxNorm (ResNet-34, channel x4)
Generalized Few-Shot Learning	CIFAR-10-LT (ρ=100)	Error Rate	11.5	GLMC (ResNet-34, channel x4)
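The ρ in the dataset names is the imbalance ratio, the size of the largest class divided by the size of the smallest. As a rough illustration of what ρ means for the training set, the sketch below uses the exponential-decay construction that is common for CIFAR-LT benchmarks. The function name is hypothetical, and it is an assumption that the results above were produced with exactly this recipe.

```python
def long_tailed_counts(n_max, num_classes, rho):
    """Per-class training set sizes that decay exponentially from n_max
    (head class) down to roughly n_max / rho (tail class)."""
    return [
        int(n_max * rho ** (-i / (num_classes - 1)))
        for i in range(num_classes)
    ]

# CIFAR-10 has 5000 training images per class before subsampling.
counts = long_tailed_counts(n_max=5000, num_classes=10, rho=100)
print(counts)
```

With ρ=10 the tail class keeps about a tenth of the head class's images, while ρ=100 leaves it with about a hundredth, which is consistent with the error rates in the table rising as ρ grows.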

