Towards Sustainable Self-supervised Learning

ShangHua Gao, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan

2022-10-20 | Tasks: Self-Supervised Image Classification, Self-Supervised Learning, Semantic Segmentation, Object Detection

Abstract

Although increasingly expensive to train, most self-supervised learning (SSL) models have repeatedly been trained from scratch yet remain underutilized, since only a few state-of-the-art (SOTA) models are employed for downstream tasks. In this work, we explore a sustainable SSL framework that addresses two major challenges: i) learning a stronger new SSL model from an existing pretrained SSL model, called the "base" model, in a cost-friendly manner; and ii) making the training of the new model compatible with various base models. We propose a Target-Enhanced Conditional (TEC) scheme that introduces two components to existing mask-reconstruction-based SSL. First, we propose patch-relation enhanced targets, which enhance the targets given by the base model and encourage the new model to learn semantic-relation knowledge from the base model using incomplete (masked) inputs. This input hardening and target enhancement help the new model surpass the base model, since they enforce additional patch-relation modeling to handle incomplete inputs. Second, we introduce a conditional adapter that adaptively adjusts the new model's predictions to align with the targets of different base models. Extensive experimental results show that our TEC scheme accelerates learning and also improves SOTA SSL base models, e.g., MAE and iBOT, taking an exploratory step towards sustainable SSL.
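The abstract describes the two TEC components only at a high level. The NumPy sketch below is a hypothetical illustration of how such a training target and loss could fit together, not the authors' implementation. It assumes that the frozen base model yields one feature vector per patch, models "patch-relation enhancement" as adding a cosine-similarity-weighted average of all base-model patches to each patch target, models the conditional adapter as one linear projection per base model selected by an index, and computes the reconstruction loss on masked patches only. All names (`relation_enhanced_target`, `ConditionalAdapter`, `temperature`, `base_id`) are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def relation_enhanced_target(base_feats, temperature=0.1):
    """HYPOTHETICAL patch-relation enhancement of base-model targets.

    base_feats: (N, D) patch features from the frozen base model (e.g. MAE or iBOT).
    Each patch target is augmented with a similarity-weighted average of all
    patches, so the target also encodes patch-to-patch relations.
    """
    normed = base_feats / np.linalg.norm(base_feats, axis=-1, keepdims=True)
    relation = softmax(normed @ normed.T / temperature, axis=-1)  # (N, N), rows sum to 1
    return base_feats + relation @ base_feats


class ConditionalAdapter:
    """HYPOTHETICAL adapter: one linear projection per base model, chosen by base_id."""

    def __init__(self, dim, num_base_models, rng):
        # Near-identity initialisation so the adapter starts close to a no-op.
        self.weights = [
            np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))
            for _ in range(num_base_models)
        ]

    def __call__(self, preds, base_id):
        return preds @ self.weights[base_id]


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error over masked patches only (mask: (N,) bool, True = masked)."""
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
num_patches, dim = 16, 8
base_feats = rng.standard_normal((num_patches, dim))  # stand-in for base-model patch features
target = relation_enhanced_target(base_feats)

mask = np.zeros(num_patches, dtype=bool)
mask[rng.permutation(num_patches)[: int(0.75 * num_patches)]] = True  # mask 75% of patches

new_preds = rng.standard_normal((num_patches, dim))  # stand-in for the new model's outputs on the masked input
adapter = ConditionalAdapter(dim, num_base_models=2, rng=rng)
loss = masked_reconstruction_loss(adapter(new_preds, base_id=0), target, mask)
print(f"masked reconstruction loss: {loss:.4f}")
```

Selecting the adapter by `base_id` is one simple way to let a single new model be trained against targets from different base models; the paper's actual conditioning mechanism may differ.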

Results

Task	Dataset	Metric	Value	Model
Semantic Segmentation	ImageNet-S	mIoU (test)	62.5	TEC (ViT-B/16, 224x224, SSL+FT, mmseg)
Semantic Segmentation	ImageNet-S	mIoU (val)	63.2	TEC (ViT-B/16, 224x224, SSL+FT, mmseg)
Semantic Segmentation	ImageNet-S	mIoU (val)	62	TEC (ViT-B/16, 224x224, SSL+FT)
Semantic Segmentation	ImageNet-S	mIoU (test)	46	TEC (ViT-B/16, 224x224, SSL, mmseg)
Semantic Segmentation	ImageNet-S	mIoU (val)	46.1	TEC (ViT-B/16, 224x224, SSL, mmseg)
Semantic Segmentation	ImageNet-S	mIoU (val)	42.9	TEC (ViT-B/16, 224x224, SSL)
Semantic Segmentation	ADE20K	mIoU (val)	51	TEC (ViT-B, UperNet)
Object Detection	COCO minival	box AP	54.6	TEC (ViT-B, Mask R-CNN)
