Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Rethinking Pre-training and Self-training

Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

2020-06-11 | NeurIPS 2020 12 | Data Augmentation, Segmentation, Semantic Segmentation, Object Detection

Abstract

Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet pre-training is commonly used to initialize the backbones of object detection and segmentation models. He et al., however, show a surprising result that ImageNet pre-training has limited impact on COCO object detection. Here we investigate self-training as another method to utilize additional data on the same setup and contrast it against ImageNet pre-training. Our study reveals the generality and flexibility of self-training with three additional insights: 1) stronger data augmentation and more labeled data further diminish the value of pre-training, 2) unlike pre-training, self-training is always helpful when using stronger data augmentation, in both low-data and high-data regimes, and 3) in the case that pre-training is helpful, self-training improves upon pre-training. For example, on the COCO object detection dataset, pre-training benefits when we use one fifth of the labeled data, and hurts accuracy when we use all labeled data. Self-training, on the other hand, shows positive improvements from +1.3 to +3.4AP across all dataset sizes. In other words, self-training works well exactly on the same setup that pre-training does not work (using ImageNet to help COCO). On the PASCAL segmentation dataset, which is a much smaller dataset than COCO, though pre-training does help significantly, self-training improves upon the pre-trained model. On COCO object detection, we achieve 54.3AP, an improvement of +1.5AP over the strongest SpineNet model. On PASCAL segmentation, we achieve 90.5 mIOU, an improvement of +1.5% mIOU over the previous state-of-the-art result by DeepLabv3+.
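The abstract contrasts self-training with pre-training but does not spell out the procedure. Self-training is commonly implemented as a teacher-student pseudo-labeling loop: train a teacher on the labeled data, use it to label extra unlabeled data, then train a student on both. The sketch below illustrates that loop on a toy 1-D problem. Everything in it is an assumption made for illustration and none of it comes from the paper: the nearest-centroid "model", the margin-based confidence score, the 0.8 threshold, and the function names `fit_centroids`, `predict` and `self_train`. The paper itself works with detection and segmentation models on COCO, PASCAL and OpenImages.

```python
def fit_centroids(points, labels):
    """Toy 'model' (hypothetical): the mean of the 1-D points in each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return (label, confidence) for x; assumes at least two classes.

    Confidence is a margin score in [0.5, 1]: 1.0 when x sits on the
    nearest centroid, 0.5 when it is equidistant from the two nearest.
    """
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    (d0, y0), (d1, _) = dists[0], dists[1]
    conf = 1.0 if d0 + d1 == 0 else d1 / (d0 + d1)
    return y0, conf


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.8):
    """One round of teacher-student self-training with pseudo labels."""
    # 1) Train the teacher on human-labeled data only.
    teacher = fit_centroids(labeled_x, labeled_y)

    # 2) Pseudo-label the unlabeled data, keeping confident predictions.
    pseudo_x, pseudo_y = [], []
    for x in unlabeled_x:
        y, conf = predict(teacher, x)
        if conf >= threshold:
            pseudo_x.append(x)
            pseudo_y.append(y)

    # 3) Train the student on human labels plus pseudo labels.
    student = fit_centroids(labeled_x + pseudo_x, labeled_y + pseudo_y)
    return teacher, student, len(pseudo_x)


if __name__ == "__main__":
    teacher, student, n_pseudo = self_train(
        labeled_x=[0.0, 1.0, 9.0, 10.0],
        labeled_y=[0, 0, 1, 1],
        unlabeled_x=[0.2, 0.4, 5.0, 9.8, 10.4],
    )
    print("teacher:", teacher)
    print("student:", student)
    print("pseudo-labeled points kept:", n_pseudo)
```

In this toy run the ambiguous point at 5.0 falls below the confidence threshold and is discarded, so the student is fit on the four labeled points plus four pseudo-labeled ones. The pre-training alternative discussed in the abstract would instead initialize the model from weights learned on a separate labeled dataset such as ImageNet.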

Results

Task | Dataset | Metric | Value | Model
Object Detection | COCO test-dev | box mAP | 54.3 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale)
Object Detection | COCO minival | box AP | 54.2 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale)
3D | COCO test-dev | box mAP | 54.3 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale)
3D | COCO minival | box AP | 54.2 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale)
2D Classification | COCO test-dev | box mAP | 54.3 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale)
2D Classification | COCO minival | box AP | 54.2 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale)
2D Object Detection | COCO test-dev | box mAP | 54.3 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale)
2D Object Detection | COCO minival | box AP | 54.2 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale)
16k | COCO test-dev | box mAP | 54.3 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale)
16k | COCO minival | box AP | 54.2 | SpineNet-190 (1280, with Self-training on OpenImages, single-scale)

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)