A Simple Semi-Supervised Learning Framework for Object Detection

Kihyuk Sohn, Zizhao Zhang, Chun-Liang Li, Han Zhang, Chen-Yu Lee, Tomas Pfister

2020-05-10

Image Classification, Data Augmentation, Object Detection, Semi-Supervised Object Detection


Abstract

Semi-supervised learning (SSL) has the potential to improve the predictive performance of machine learning models using unlabeled data. Although there has been remarkable recent progress, SSL has mainly been demonstrated on image classification tasks. In this paper, we propose STAC, a simple yet effective SSL framework for visual object detection, along with a data augmentation strategy. STAC deploys highly confident pseudo labels of localized objects from an unlabeled image and updates the model by enforcing consistency via strong augmentations. We propose experimental protocols to evaluate the performance of semi-supervised object detection using MS-COCO and show the efficacy of STAC on both MS-COCO and VOC07. On VOC07, STAC improves the AP$^{0.5}$ from $76.30$ to $79.08$; on MS-COCO, STAC demonstrates $2{\times}$ higher data efficiency by achieving 24.38 mAP using only 5\% labeled data, compared with a supervised baseline that reaches 23.86 mAP using 10\% labeled data. The code is available at https://github.com/google-research/ssl_detection/.
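The two ingredients named in the abstract, confidence-filtered pseudo labels and consistency under augmentation, can be illustrated with a minimal sketch. This is not the official implementation. The names `Detection`, `select_pseudo_labels`, `hflip_boxes`, and `total_loss`, the threshold of 0.9, and the loss weight of 2.0 are all hypothetical choices made for illustration; the abstract does not specify them. A horizontal flip stands in for the strong augmentations, only to show that boxes must be transformed together with the image.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Box format assumed here: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box
    label: int
    score: float  # confidence of the detector that produced it, in [0, 1]


def select_pseudo_labels(detections: List[Detection], threshold: float) -> List[Detection]:
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d.score >= threshold]


def hflip_boxes(boxes: List[Box], image_width: float) -> List[Box]:
    """Mirror boxes horizontally so pseudo labels follow a flipped image.

    Stand-in for a strong geometric augmentation (an assumption of this sketch).
    """
    return [(image_width - x2, y1, image_width - x1, y2) for (x1, y1, x2, y2) in boxes]


def total_loss(supervised_loss: float, unsupervised_loss: float, unsup_weight: float) -> float:
    """Combine the labeled-data loss with a weighted pseudo-label loss."""
    return supervised_loss + unsup_weight * unsupervised_loss


if __name__ == "__main__":
    # Hypothetical predictions on one unlabeled image of width 100.
    predictions = [
        Detection(box=(10, 20, 50, 80), label=1, score=0.95),
        Detection(box=(0, 0, 5, 5), label=2, score=0.40),
    ]
    kept = select_pseudo_labels(predictions, threshold=0.9)  # hypothetical threshold
    flipped = hflip_boxes([d.box for d in kept], image_width=100)
    print(kept, flipped)
    print(total_loss(1.0, 0.5, unsup_weight=2.0))  # hypothetical weight
```

In a real training loop the unsupervised term would be a detection loss computed on the augmented image against the transformed pseudo boxes; here it is reduced to a scalar so the combination step stays visible.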

Results

Task	Dataset	Metric	Value	Model
Semi-Supervised Object Detection	COCO 100% labeled data	mAP	39.2	STAC
2D Object Detection	COCO 100% labeled data	mAP	39.2	STAC
