Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks

Weihua Chen, Xianzhe Xu, Jian Jia, Hao Luo, Yaohua Wang, Fan Wang, Rong Jin, Xiuyu Sun

Date: 2023-03-30 · Venue: CVPR 2023
Tasks: Pedestrian Attribute Recognition, Person Search, Self-Supervised Learning, Human Parsing, Semantic Segmentation, Pose Estimation, Person Re-Identification, Pedestrian Detection

Abstract

Human-centric visual tasks have attracted increasing research attention due to their widespread applications. In this paper, we aim to learn a general human representation from massive unlabeled human images that benefits downstream human-centric tasks to the greatest possible extent. We call this method SOLIDER, a Semantic cOntrollable seLf-supervIseD lEaRning framework. Unlike existing self-supervised learning methods, SOLIDER uses prior knowledge from human images to build pseudo semantic labels and inject more semantic information into the learned representation. Meanwhile, we note that different downstream tasks require different ratios of semantic information and appearance information. For example, human parsing requires more semantic information, while person re-identification needs more appearance information to distinguish identities. A single learned representation therefore cannot satisfy all of these requirements. To solve this problem, SOLIDER introduces a conditional network with a semantic controller. After the model is trained, users can feed values to the controller to produce representations with different ratios of semantic information, matching the needs of different downstream tasks. Finally, SOLIDER is verified on six downstream human-centric visual tasks, where it outperforms the state of the art and establishes new baselines. The code is released at https://github.com/tinyvision/SOLIDER.
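
The abstract describes the semantic controller only at a high level. As a rough illustration of how a single scalar can steer a network's output, the sketch below uses FiLM-style feature modulation, which is a swapped-in technique and not necessarily how SOLIDER conditions its backbone (the paper and the official repository define the real mechanism). The class `SemanticController`, the function `modulate`, the assumption that the controller value is a scalar λ in [0, 1], and the affine mapping from λ to per-channel scale and shift are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SemanticController:
    """Hypothetical FiLM-style controller (an assumption, not SOLIDER's code).

    Maps a scalar lam in [0, 1] to a per-channel scale (gamma) and shift
    (beta), each an affine function of lam.
    """
    w_gamma: List[float]
    b_gamma: List[float]
    w_beta: List[float]
    b_beta: List[float]

    def __call__(self, lam: float) -> Tuple[List[float], List[float]]:
        # The [0, 1] range is an illustrative assumption.
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lam must lie in [0, 1]")
        gamma = [w * lam + b for w, b in zip(self.w_gamma, self.b_gamma)]
        beta = [w * lam + b for w, b in zip(self.w_beta, self.b_beta)]
        return gamma, beta


def modulate(features: List[float], gamma: List[float], beta: List[float]) -> List[float]:
    """Apply the controller output channel-wise: gamma * x + beta."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("channel counts must match")
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


# Illustrative weights: with these values, lam = 0 leaves the features unchanged.
controller = SemanticController(
    w_gamma=[0.5, -0.5], b_gamma=[1.0, 1.0],
    w_beta=[0.2, 0.0], b_beta=[0.0, 0.0],
)
feats = [2.0, 4.0]
out_low = modulate(feats, *controller(0.0))   # [2.0, 4.0]
out_high = modulate(feats, *controller(1.0))  # [3.2, 2.0]
```

The point of the design is that one trained network yields a family of representations: changing only λ at inference time changes the per-channel scaling, so a downstream task can pick the setting that suits it without retraining.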

Results

Task | Dataset | Metric | Value | Model
Autonomous Vehicles | CityPersons | Heavy MR^-2 | 39.4 | SOLIDER
Autonomous Vehicles | CityPersons | Reasonable MR^-2 | 9.7 | SOLIDER
Autonomous Vehicles | PA-100K | Accuracy | 86.38 | SOLIDER
Person Search | CUHK-SYSU | mAP | 95.5 | SOLIDER
Person Search | CUHK-SYSU | Top-1 | 95.8 | SOLIDER
Person Search | PRW | Top-1 | 86.7 | SOLIDER
Person Search | PRW | mAP | 59.8 | SOLIDER
Person Re-Identification | MSMT17 | Rank-1 | 91.7 | SOLIDER (with re-ranking)
Person Re-Identification | MSMT17 | mAP | 86.5 | SOLIDER (with re-ranking)
Person Re-Identification | MSMT17 | Rank-1 | 90.7 | SOLIDER (without re-ranking)
Person Re-Identification | MSMT17 | mAP | 77.1 | SOLIDER (without re-ranking)
Person Re-Identification | Market-1501 | Rank-1 | 96.9 | SOLIDER
Person Re-Identification | Market-1501 | mAP | 93.9 | SOLIDER
Person Re-Identification | Market-1501 | Rank-1 | 96.7 | SOLIDER (RK)
Person Re-Identification | Market-1501 | mAP | 95.6 | SOLIDER (RK)
Person Re-Identification | Occluded-DukeMTMC | Rank-1 | 71.2 | SOLIDER
Person Re-Identification | Occluded-DukeMTMC | mAP | 61.9 | SOLIDER
Pose Estimation | COCO (Common Objects in Context) | AP | 76.6 | SOLIDER (swin-B)
Pose Estimation | COCO (Common Objects in Context) | AR | 81.5 | SOLIDER (swin-B)
Pedestrian Attribute Recognition | PA-100K | Accuracy | 86.38 | SOLIDER
3D | COCO (Common Objects in Context) | AP | 76.6 | SOLIDER (swin-B)
3D | COCO (Common Objects in Context) | AR | 81.5 | SOLIDER (swin-B)
Pedestrian Detection | CityPersons | Heavy MR^-2 | 39.4 | SOLIDER
Pedestrian Detection | CityPersons | Reasonable MR^-2 | 9.7 | SOLIDER
1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | AP | 76.6 | SOLIDER (swin-B)
1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | AR | 81.5 | SOLIDER (swin-B)
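
As a rough guide to the Rank-1 and mAP columns reported for retrieval-style tasks such as person re-identification, the sketch below computes both from per-query match lists. It is a simplified, hypothetical helper (`average_precision` and `rank1_and_map` are illustrative names): it assumes each query's gallery has already been sorted by descending similarity and reduced to booleans (true match or not), and it omits protocol details such as discarding same-camera matches on Market-1501 or the detection step in person search, so it will not reproduce the official evaluation scripts exactly.

```python
from typing import Sequence, Tuple


def average_precision(matches: Sequence[bool]) -> float:
    """AP for one query, given gallery matches sorted by descending similarity.

    AP is the mean of the precision values measured at each true match.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def rank1_and_map(all_matches: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Rank-1 accuracy and mAP (both in percent) over queries that have
    at least one true match in the gallery."""
    valid = [m for m in all_matches if any(m)]
    if not valid:
        raise ValueError("no query has a true match in the gallery")
    # Rank-1: fraction of queries whose top-ranked gallery item is correct.
    rank1 = sum(1 for m in valid if m[0]) / len(valid)
    # mAP: mean of the per-query average precision.
    mean_ap = sum(average_precision(m) for m in valid) / len(valid)
    return 100.0 * rank1, 100.0 * mean_ap


# Two toy queries: the first has correct matches at ranks 1 and 3,
# the second only at rank 2.
r1, m_ap = rank1_and_map([[True, False, True], [False, True, False]])
```

For the toy input, Rank-1 is 50.0 (only the first query is correct at rank 1) and mAP is about 66.67, the mean of the per-query APs (1 + 2/3) / 2 and 1/2. The two metrics diverge because Rank-1 looks only at the top result, while mAP rewards ranking every correct gallery image highly.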
