Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation

Yixuan Wei, Han Hu, Zhenda Xie, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo

2022-05-27

Tasks: Image Classification · Self-Supervised Learning · Semantic Segmentation · Contrastive Learning · Instance Segmentation · Object Detection

Paper · PDF · Code (official)

Abstract

Masked image modeling (MIM) learns representations with remarkably good fine-tuning performance, overshadowing previously prevalent pre-training approaches such as image classification, instance contrastive learning, and image-text alignment. In this paper, we show that the inferior fine-tuning performance of these pre-training approaches can be significantly improved by a simple post-processing step in the form of feature distillation (FD). Feature distillation converts the old representations into new representations that have a few desirable properties, just like the representations produced by MIM. These properties, which we collectively refer to as optimization friendliness, are identified and analyzed with a set of attention- and optimization-related diagnostic tools. With these properties, the new representations show strong fine-tuning performance. Specifically, contrastive self-supervised learning methods become as competitive in fine-tuning as state-of-the-art masked image modeling (MIM) algorithms. The fine-tuning performance of CLIP models is also significantly improved, with a CLIP ViT-L model reaching 89.0% top-1 accuracy on ImageNet-1K classification. On the 3-billion-parameter SwinV2-G model, the fine-tuning accuracy is improved by +1.5 mIoU / +1.1 mAP to 61.4 mIoU / 64.2 mAP on ADE20K semantic segmentation and COCO object detection, respectively, setting new records on both benchmarks. More importantly, our work provides a way for future research to focus more effort on the generality and scalability of the learned representations without being preoccupied with optimization friendliness, since it can be enhanced rather easily. The code will be available at https://github.com/SwinTransformer/Feature-Distillation.
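The feature-distillation recipe described in the abstract (a frozen pre-trained teacher provides target features; a student is trained to reproduce them after a normalization of the teacher's outputs) can be sketched roughly as follows. This is a toy illustration with random data and a linear student standing in for real encoders: the paper distills full vision backbones and uses a smooth-L1 loss, whereas this sketch uses plain MSE and gradient descent, so treat every name and constant here as an assumption, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 16, 256

# Stand-in frozen "teacher": a fixed random linear map whose outputs play the
# role of pre-trained features (e.g. from a contrastive or CLIP encoder).
W_teacher = rng.standard_normal((dim, dim))
x = rng.standard_normal((n, dim))
targets = x @ W_teacher

# Whiten the teacher features per sample (LayerNorm-style standardization);
# normalizing the distillation targets is part of the paper's recipe for
# making the new representations optimization friendly.
targets = (targets - targets.mean(axis=1, keepdims=True)) \
          / (targets.std(axis=1, keepdims=True) + 1e-6)

# Linear "student" distilled onto the whitened teacher features by plain
# gradient descent on a mean-squared error (the paper uses smooth-L1).
W_student = np.zeros((dim, dim))
lr = 0.5

def fd_loss(W):
    return np.mean((x @ W - targets) ** 2)

loss0 = fd_loss(W_student)
for _ in range(200):
    grad = 2.0 * x.T @ (x @ W_student - targets) / x.size  # dL/dW of the MSE
    W_student -= lr * grad
loss1 = fd_loss(W_student)
```

After distillation, it is the student's representations, not the teacher's, that are fine-tuned on the downstream task, which is where the improved fine-tuning performance is measured.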

Results

Task                  | Dataset       | Metric          | Value | Model
Semantic Segmentation | ADE20K val    | mIoU            | 61.4  | FD-SwinV2-G
Semantic Segmentation | ADE20K        | Params (M)      | 3000  | FD-SwinV2-G
Semantic Segmentation | ADE20K        | Validation mIoU | 61.4  | FD-SwinV2-G
Object Detection      | COCO test-dev | box mAP         | 64.2  | FD-SwinV2-G
Instance Segmentation | COCO test-dev | mask AP         | 55.4  | FD-SwinV2-G

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)