Revisiting Weakly Supervised Pre-Training of Visual Perception Models

Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis, Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr Dollár, Laurens van der Maaten

2022-01-20 | CVPR 2022

Tasks: Image Classification, Self-Supervised Learning, Transfer Learning, Out-of-Distribution Generalization, Fine-Grained Image Classification


Abstract

Model pre-training is a cornerstone of modern visual recognition systems. Although fully supervised pre-training on datasets like ImageNet is still the de-facto standard, recent studies suggest that large-scale weakly supervised pre-training can outperform fully supervised approaches. This paper revisits weakly-supervised pre-training of models using hashtag supervision with modern versions of residual networks and the largest-ever dataset of images and corresponding hashtags. We study the performance of the resulting models in various transfer-learning settings including zero-shot transfer. We also compare our models with those obtained via large-scale self-supervised learning. We find our weakly-supervised models to be very competitive across all settings, and find they substantially outperform their self-supervised counterparts. We also include an investigation into whether our models learned potentially troubling associations or stereotypes. Overall, our results provide a compelling argument for the use of weakly supervised learning in the development of visual recognition systems. Our models, Supervised Weakly through hashtAGs (SWAG), are available publicly.
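The abstract does not say how hashtags are turned into a training signal. As a rough illustration only, the sketch below shows one common way to do it: spread the target probability uniformly over an image's in-vocabulary hashtags and score the prediction with softmax cross-entropy. The function names, the toy vocabulary, and the uniform-target choice are all assumptions made for this example, not details taken from the text above.

```python
import math

def hashtag_target(hashtags, vocab):
    """Build a target distribution that is uniform over the image's known hashtags.

    Hypothetical helper: hashtags missing from `vocab` are ignored.
    """
    indices = sorted({vocab[h] for h in hashtags if h in vocab})
    target = [0.0] * len(vocab)
    for i in indices:
        target[i] = 1.0 / len(indices)
    return target

def softmax_cross_entropy(logits, target):
    """Cross-entropy between a soft target and softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, target) if t > 0)

# Toy vocabulary and image; both are made up for illustration.
vocab = {"#dog": 0, "#beach": 1, "#sunset": 2}
target = hashtag_target(["#dog", "#beach", "#unknown"], vocab)
loss = softmax_cross_entropy([2.0, 1.0, -1.0], target)
print(target, round(loss, 4))
```

The soft target lets a single image contribute to several hashtag classes at once, which is the sense in which the supervision is "weak": the labels are noisy, incomplete, and not mutually exclusive.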

Results

Task	Dataset	Metric	Value	Model
Image Classification	ImageNet V2	Top-1 Accuracy	81.1	SWAG (ViT H/14)
Image Classification	Places365-Standard	Top-1 Accuracy	60.7	SWAG (ViT H/14)
Image Classification	ObjectNet	Top-1 Accuracy	69.5	SWAG (ViT H/14)
Image Classification	ObjectNet	Top-1 Accuracy	64.3	RegNetY 128GF (Platt)
Image Classification	ObjectNet	Top-1 Accuracy	60	ViT H/14 (Platt)
Image Classification	ObjectNet	Top-1 Accuracy	57.3	ViT L/16 (Platt)
Image Classification	ObjectNet	Top-1 Accuracy	48.9	ViT B/16
Image Classification	ImageNet	GFLOPs	1018.8	SWAG (ViT H/14)
Image Classification	CUB-200-2011	Accuracy	91.7	SWAG (ViT H/14)
Fine-Grained Image Classification	CUB-200-2011	Accuracy	91.7	SWAG (ViT H/14)
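
Most rows in the table report top-1 accuracy: the percentage of test images whose highest-scoring class matches the ground-truth label. A minimal sketch of that computation follows; the function name and the toy scores are illustrative assumptions, not part of any released evaluation code.

```python
def top1_accuracy(scores, labels):
    """Return top-1 accuracy in percent.

    scores: one list of per-class scores per example.
    labels: the ground-truth class index for each example.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest-scoring class for this example.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)

# Toy example: four images, two classes; three predictions are correct.
scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 0, 0, 0]
print(top1_accuracy(scores, labels))  # 75.0
```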
