
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


X-Pose: Detecting Any Keypoints

Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang

Date: 2023-10-12
Tasks: 2D Human Pose Estimation, Multi-Person Pose Estimation, Keypoint Detection, Contrastive Learning, Animal Pose Estimation, 2D Pose Estimation

Abstract

This work aims to address an advanced keypoint detection problem: how to accurately detect any keypoints in complex real-world scenarios, which involves massive, messy, and open-ended objects as well as their associated keypoint definitions. Current high-performance keypoint detectors often fail to tackle this problem due to their two-stage schemes, under-explored prompt designs, and limited training data. To bridge the gap, we propose X-Pose, a novel end-to-end framework with multi-modal (i.e., visual, textual, or their combinations) prompts to detect multi-object keypoints for any articulated (e.g., human and animal), rigid, and soft objects within a given image. Moreover, we introduce a large-scale dataset called UniKPT, which unifies 13 keypoint detection datasets with 338 keypoints across 1,237 categories over 400K instances. Training with UniKPT, X-Pose effectively aligns text-to-keypoint and image-to-keypoint due to the mutual enhancement of multi-modal prompts based on cross-modality contrastive learning. Our experimental results demonstrate that X-Pose achieves notable improvements of 27.7 AP, 6.44 PCK, and 7.0 AP compared to state-of-the-art non-promptable, visual prompt-based, and textual prompt-based methods, respectively, each in a fair setting. More importantly, the in-the-wild test demonstrates X-Pose's strong fine-grained keypoint localization and generalization abilities across image styles, object categories, and poses, paving a new path to multi-object keypoint detection in real applications. Our code and dataset are available at https://github.com/IDEA-Research/X-Pose.
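The abstract attributes X-Pose's text-to-keypoint and image-to-keypoint alignment to cross-modality contrastive learning. As a rough illustration only, the sketch below implements a generic symmetric InfoNCE (CLIP-style) loss between paired keypoint features and prompt embeddings. It is not the paper's actual loss or architecture: the function names, the temperature of 0.07, the assumption that row i of each batch forms a positive pair, and the toy vectors are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_row(logits: Vector, target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def symmetric_info_nce(keypoint_feats: List[Vector],
                       prompt_embeds: List[Vector],
                       temperature: float = 0.07) -> float:
    """Generic CLIP-style contrastive loss (hypothetical, not X-Pose's exact loss).

    Row i of keypoint_feats is assumed to match row i of prompt_embeds;
    every other row in the batch acts as a negative.
    """
    assert len(keypoint_feats) == len(prompt_embeds)
    k = [l2_normalize(v) for v in keypoint_feats]
    p = [l2_normalize(v) for v in prompt_embeds]
    n = len(k)
    # sim[i][j] = cos(k_i, p_j) / temperature
    sim = [[sum(a * b for a, b in zip(k[i], p[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Keypoint -> prompt direction: each row should pick its own column.
    loss_k2p = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Prompt -> keypoint direction: each column should pick its own row.
    loss_p2k = sum(cross_entropy_row([sim[i][j] for i in range(n)], j)
                   for j in range(n)) / n
    return 0.5 * (loss_k2p + loss_p2k)


if __name__ == "__main__":
    # Toy 3-D features for three keypoints and embeddings of their text prompts.
    feats = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
    aligned = [[0.9, 0.0, 0.1], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
    shuffled = [aligned[1], aligned[2], aligned[0]]
    print(symmetric_info_nce(feats, aligned))   # small loss: pairs already agree
    print(symmetric_info_nce(feats, shuffled))  # much larger loss: pairs mismatched
```

Averaging both directions means keypoint features are pulled toward their own prompt and each prompt toward its own keypoint, which is one common way to realize "mutual" alignment across modalities.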

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Pose Estimation | COCO (Common Objects in Context) | AP | 0.768 | UniPose
Pose Estimation | AP-10K | AP | 79.2 | UniPose
2D Pose Estimation | Vinegar Fly | Mean PCK@0.2 | 99.9 | UniPose
2D Pose Estimation | 300W | Mean PCK@0.2 | 99.4 | UniPose
2D Pose Estimation | MacaquePose | AP | 79.4 | UniPose
2D Pose Estimation | Desert Locust | Mean PCK@0.2 | 99.9 | UniPose
2D Pose Estimation | Animal Kingdom | Mean PCK@0.2 | 96.1 | UniPose
2D Pose Estimation | Animal Kingdom | PCK@0.05 | 71.5 | UniPose
3D | COCO (Common Objects in Context) | AP | 0.768 | UniPose
3D | AP-10K | AP | 79.2 | UniPose
Animal Pose Estimation | AP-10K | AP | 79.2 | UniPose
2D Human Pose Estimation | Human-Art | AP | 0.759 | UniPose
2D Classification | Vinegar Fly | Mean PCK@0.2 | 99.9 | UniPose
2D Classification | 300W | Mean PCK@0.2 | 99.4 | UniPose
2D Classification | MacaquePose | AP | 79.4 | UniPose
2D Classification | Desert Locust | Mean PCK@0.2 | 99.9 | UniPose
2D Classification | Animal Kingdom | Mean PCK@0.2 | 96.1 | UniPose
2D Classification | Animal Kingdom | PCK@0.05 | 71.5 | UniPose
Multi-Person Pose Estimation | COCO (Common Objects in Context) | AP | 0.768 | UniPose
1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | AP | 0.768 | UniPose
1 Image, 2*2 Stitchi | AP-10K | AP | 79.2 | UniPose
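The table reports PCK@0.2, PCK@0.05, and AP without defining them. The sketch below shows the standard per-instance definitions: PCK@α counts a keypoint as correct when its prediction lies within α times a normalizing length of the ground truth, and COCO-style Object Keypoint Similarity (OKS) is the score on which COCO keypoint AP is built (AP is then averaged over OKS thresholds from 0.50 to 0.95 in steps of 0.05 after matching predictions to ground truth, which is omitted here). The function names are hypothetical, and so is the choice of normalizing length: benchmarks differ (bounding-box size, torso length, and so on), and per-keypoint sigmas are dataset constants (COCO publishes values for its 17 keypoints). This is not the evaluation code used to produce the numbers above.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point], gt: Sequence[Point], visible: Sequence[bool],
        norm_size: float, alpha: float = 0.2) -> float:
    """PCK@alpha for one instance (hypothetical helper).

    Fraction of labelled keypoints whose prediction lies within
    alpha * norm_size of the ground truth. norm_size is an assumed
    normalizing length; each benchmark defines its own.
    """
    hits, total = 0, 0
    for (px, py), (gx, gy), vis in zip(pred, gt, visible):
        if not vis:
            continue  # unlabelled keypoints are ignored
        total += 1
        if math.hypot(px - gx, py - gy) <= alpha * norm_size:
            hits += 1
    return hits / total if total else 0.0


def oks(pred: Sequence[Point], gt: Sequence[Point], visible: Sequence[bool],
        area: float, sigmas: Sequence[float]) -> float:
    """COCO-style Object Keypoint Similarity for one instance (hypothetical helper).

    OKS = mean over labelled keypoints of exp(-d_i^2 / (2 * s^2 * k_i^2)),
    where s^2 is the object area and k_i = 2 * sigma_i.
    """
    scores = []
    for (px, py), (gx, gy), vis, sigma in zip(pred, gt, visible, sigmas):
        if not vis:
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        # Small epsilon guards against a zero area, as the COCO evaluator does.
        scores.append(math.exp(-d2 / (2.0 * (area + 1e-9) * k * k)))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    gt = [(10.0, 10.0), (20.0, 10.0), (15.0, 30.0)]
    pred = [(11.0, 10.0), (20.0, 14.0), (40.0, 40.0)]
    vis = [True, True, True]
    # Threshold is 0.2 * 20 = 4 px; errors are 1, 4, and about 26.9 px.
    print(pck(pred, gt, vis, norm_size=20.0, alpha=0.2))  # 0.6666666666666666
    print(oks(pred, gt, vis, area=400.0, sigmas=[0.025] * 3))
```

A tighter threshold such as PCK@0.05 only accepts predictions very close to the ground truth, which is why the Animal Kingdom score drops from 96.1 at α = 0.2 to 71.5 at α = 0.05.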
