Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


UniHCP: A Unified Model for Human-Centric Perceptions

Yuanzheng Ci, Yizhou Wang, Meilin Chen, Shixiang Tang, Lei Bai, Feng Zhu, Rui Zhao, Fengwei Yu, Donglian Qi, Wanli Ouyang

2023-03-06 · CVPR 2023

Tasks: Pedestrian Attribute Recognition, Attribute, Human Part Segmentation, Human Parsing, Semantic Segmentation, Pose Estimation, Person Re-Identification, Pedestrian Detection, 2D Pose Estimation, Object Detection

Paper · PDF · Code (official)

Abstract

Human-centric perceptions (e.g., pose estimation, human parsing, pedestrian detection, person re-identification, etc.) play a key role in industrial applications of visual models. While specific human-centric tasks have their own relevant semantic aspects to focus on, they also share the same underlying semantic structure of the human body. However, few works have attempted to exploit such homogeneity and design a general-purpose model for human-centric tasks. In this work, we revisit a broad range of human-centric tasks and unify them in a minimalist manner. We propose UniHCP, a Unified Model for Human-Centric Perceptions, which unifies a wide range of human-centric tasks in a simplified end-to-end manner with the plain vision transformer architecture. With large-scale joint training on 33 human-centric datasets, UniHCP can outperform strong baselines on several in-domain and downstream tasks by direct evaluation. When adapted to a specific task, UniHCP achieves new state-of-the-art results on a wide range of human-centric tasks, e.g., 69.8 mIoU on CIHP for human parsing, 86.18 mA on PA-100K for attribute prediction, 90.3 mAP on Market1501 for ReID, and 85.8 JI on CrowdHuman for pedestrian detection, performing better than specialized models tailored for each task.
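The core idea in the abstract — one shared plain-ViT-style encoder serving many human-centric tasks, with each task distinguished only by its own set of queries decoded through a shared decoder — can be sketched minimally. This is an illustrative sketch in plain NumPy under assumed names (`shared_encode`, `shared_decode`, `TASK_QUERIES` are all hypothetical), not the authors' implementation.

```python
import numpy as np

# Sketch of a unified multi-task model: one shared encoder produces
# token features, and each human-centric task supplies its own learned
# queries that attend to those features through a single shared decoder.

rng = np.random.default_rng(0)
EMBED_DIM = 64

# Task-specific learned queries (the number of queries varies per task).
TASK_QUERIES = {
    "pose_estimation": rng.standard_normal((17, EMBED_DIM)),  # 17 keypoints
    "human_parsing":   rng.standard_normal((20, EMBED_DIM)),  # 20 part classes
    "reid":            rng.standard_normal((1, EMBED_DIM)),   # 1 identity embedding
}

def shared_encode(image_tokens):
    """Stand-in for the shared plain-ViT encoder (here: a fixed linear map)."""
    w = np.eye(EMBED_DIM)  # placeholder weights
    return image_tokens @ w

def shared_decode(queries, features):
    """One cross-attention step: task queries attend over encoder features."""
    scores = queries @ features.T / np.sqrt(EMBED_DIM)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ features

# A fake image as 196 patch tokens (a 14x14 grid).
tokens = rng.standard_normal((196, EMBED_DIM))
features = shared_encode(tokens)

# Every task reuses the same encoder and decoder; only the queries differ.
outputs = {task: shared_decode(q, features) for task, q in TASK_QUERIES.items()}
for task, out in outputs.items():
    print(task, out.shape)
```

The point of the sketch is the parameter sharing: adding a task costs only a new query set, while the encoder and decoder weights are trained jointly across all 33 datasets.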

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Pedestrian Detection | Caltech | Heavy MR^-2 | 27.2 | UniHCP (finetune) |
| Pedestrian Attribute Recognition | PA-100K | Accuracy | 86.18 | UniHCP (finetune) |
| Pedestrian Attribute Recognition | RAPv2 | Accuracy | 82.34 | UniHCP (finetune) |
| Person Re-Identification | MSMT17 | mAP | 67.3 | UniHCP (finetune) |
| Person Re-Identification | Market-1501 | mAP | 90.3 | UniHCP (finetune) |
| Person Re-Identification | SenseReID | Top-1 | 46 | UniHCP (direct eval) |
| Person Re-Identification | CUHK03 | mAP | 83.1 | UniHCP (finetune) |
| Pose Estimation | MS-COCO | AP | 76.5 | UniHCP (finetune) |
| Pose Estimation | OCHuman | Test AP | 87.4 | UniHCP (direct eval) |
| Pose Estimation | AIC | AP | 33.6 | UniHCP (finetune) |
| Pose Estimation | MPII Human Pose | PCKh-0.5 | 93.2 | UniHCP (finetune) |
| 2D Pose Estimation | Human3.6M | EPE | 6.6 | UniHCP (finetune) |
| Human Part Segmentation | ATR | pACC | 97.74 | UniHCP (finetune) |
| Human Part Segmentation | Human3.6M | mIoU | 65.95 | UniHCP (finetune) |
| Human Part Segmentation | CIHP | Mean IoU | 69.8 | UniHCP (finetune) |
| Object Detection | CrowdHuman (full body) | AP | 92.5 | UniHCP (finetune) |
| Object Detection | CrowdHuman (full body) | mMR | 41.6 | UniHCP (finetune) |
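The human-parsing rows above (CIHP, Human3.6M, ATR) report mean Intersection-over-Union. As an illustration of what that number measures, here is a small self-contained mIoU implementation on toy label maps; this is a standard-metric sketch, not the benchmarks' official evaluation code.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes, skipping classes absent from both maps.

    Illustrative implementation of the standard mIoU metric used by
    human-parsing benchmarks such as CIHP.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 4x4 segmentation maps with 3 part classes.
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1],
               [2, 2, 2, 2],
               [2, 2, 2, 2]])
pred = np.array([[0, 0, 1, 0],
                 [0, 0, 1, 1],
                 [2, 2, 2, 2],
                 [2, 1, 2, 2]])
print(round(mean_iou(pred, gt, 3), 4))  # per-class IoUs: 0.8, 0.6, 0.875
```

A score such as UniHCP's 69.8 mIoU on CIHP means that, averaged over the 20 part classes, predicted and ground-truth part masks overlap at roughly a 0.7 IoU ratio.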

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
- Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
- A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
- Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
- DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)