Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks

Weihua Chen, Xianzhe Xu, Jian Jia, Hao Luo, Yaohua Wang, Fan Wang, Rong Jin, Xiuyu Sun

Date: 2023-03-30
Venue: CVPR 2023
Tasks: Pedestrian Attribute Recognition, Person Search, Self-Supervised Learning, Human Parsing, Semantic Segmentation, Pose Estimation, Person Re-Identification, Pedestrian Detection


Abstract

Human-centric visual tasks have attracted increasing research attention due to their widespread applications. In this paper, we aim to learn a general human representation from massive unlabeled human images that benefits downstream human-centric tasks as much as possible. We call this method SOLIDER, a Semantic cOntrollable seLf-supervIseD lEaRning framework. Unlike existing self-supervised learning methods, SOLIDER uses prior knowledge from human images to build pseudo semantic labels and import more semantic information into the learned representation. Meanwhile, we note that different downstream tasks require different ratios of semantic information to appearance information. For example, human parsing requires more semantic information, while person re-identification needs more appearance information for identification purposes. A single learned representation therefore cannot meet all requirements. To solve this problem, SOLIDER introduces a conditional network with a semantic controller. After the model is trained, users can send values to the controller to produce representations with different ratios of semantic information, which can fit the different needs of downstream tasks. Finally, SOLIDER is verified on six downstream human-centric visual tasks. It outperforms the state of the art and builds new baselines for these tasks. The code is released at https://github.com/tinyvision/SOLIDER.
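The idea of a controller that takes a user-supplied value and changes the produced representation can be illustrated with a small sketch. This is not the authors' implementation: the class name `SemanticController`, the per-channel scale-and-shift form, the softplus on the scale, the random stand-in weights, and the assumption that the control value `lam` lies in [0, 1] are all hypothetical choices made only for illustration.

```python
import math
import random


def softplus(x: float) -> float:
    """Smooth, strictly positive activation: log(1 + e^x)."""
    return math.log1p(math.exp(x))


class SemanticController:
    """Hypothetical controller (illustration only, not the SOLIDER code).

    Maps a scalar control value `lam` to a per-channel scale and shift,
    then applies them to a feature vector.
    """

    def __init__(self, channels: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random stand-ins for parameters that would be learned in training.
        self.w_scale = [rng.uniform(-1.0, 1.0) for _ in range(channels)]
        self.b_scale = [rng.uniform(-1.0, 1.0) for _ in range(channels)]
        self.w_shift = [rng.uniform(-1.0, 1.0) for _ in range(channels)]
        self.b_shift = [rng.uniform(-1.0, 1.0) for _ in range(channels)]

    def __call__(self, features: list[float], lam: float) -> list[float]:
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lam is assumed to lie in [0, 1]")
        if len(features) != len(self.w_scale):
            raise ValueError("feature length must match the channel count")
        out = []
        for f, ws, bs, wt, bt in zip(
            features, self.w_scale, self.b_scale, self.w_shift, self.b_shift
        ):
            scale = softplus(ws * lam + bs)  # positive per-channel scale
            shift = wt * lam + bt            # per-channel shift
            out.append(scale * f + shift)
        return out


# The same input features yield different representations for different lam.
controller = SemanticController(channels=4)
features = [0.5, -1.2, 0.3, 2.0]
low_semantic = controller(features, lam=0.0)
high_semantic = controller(features, lam=1.0)
```

In this sketch the backbone features stay fixed and only the control value changes, which mirrors the abstract's point that one trained model can serve tasks with different needs. Which value suits which task is not stated in the abstract and would have to be chosen per downstream task.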

Results

Task	Dataset	Metric	Value	Model
Person Search	CUHK-SYSU	mAP	95.5	SOLIDER
Person Search	CUHK-SYSU	Top-1	95.8	SOLIDER
Person Search	PRW	Top-1	86.7	SOLIDER
Person Search	PRW	mAP	59.8	SOLIDER
Person Re-Identification	MSMT17	Rank-1	91.7	SOLIDER (with re-ranking)
Person Re-Identification	MSMT17	mAP	86.5	SOLIDER (with re-ranking)
Person Re-Identification	MSMT17	Rank-1	90.7	SOLIDER (without re-ranking)
Person Re-Identification	MSMT17	mAP	77.1	SOLIDER (without re-ranking)
Person Re-Identification	Market-1501	Rank-1	96.9	SOLIDER
Person Re-Identification	Market-1501	mAP	93.9	SOLIDER
Person Re-Identification	Market-1501	Rank-1	96.7	SOLIDER (with re-ranking)
Person Re-Identification	Market-1501	mAP	95.6	SOLIDER (with re-ranking)
Person Re-Identification	Occluded-DukeMTMC	Rank-1	71.2	SOLIDER
Person Re-Identification	Occluded-DukeMTMC	mAP	61.9	SOLIDER
Pose Estimation	COCO (Common Objects in Context)	AP	76.6	SOLIDER (swin-B)
Pose Estimation	COCO (Common Objects in Context)	AR	81.5	SOLIDER (swin-B)
Pedestrian Attribute Recognition	PA-100K	Accuracy	86.38	SOLIDER
Pedestrian Detection	CityPersons	Heavy MR^-2	39.4	SOLIDER
Pedestrian Detection	CityPersons	Reasonable MR^-2	9.7	SOLIDER
