
View-Invariant Probabilistic Embedding for Human Pose

Jennifer J. Sun, Jiaping Zhao, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Ting Liu

Published: 2019-12-02 · ECCV 2020
Tasks: Pose Retrieval · Skeleton Based Action Recognition · Video Alignment · Action Recognition
Links: Paper · PDF · Code (official) · Code

Abstract

Depictions of similar human body configurations can vary with changing viewpoints. Using only 2D information, we would like to enable vision algorithms to recognize similarity in human body poses across multiple views. This ability is useful for analyzing body movements and human behaviors in images and videos. In this paper, we propose an approach for learning a compact view-invariant embedding space from 2D joint keypoints alone, without explicitly predicting 3D poses. Since 2D poses are projected from 3D space, they have an inherent ambiguity, which is difficult to represent through a deterministic mapping. Hence, we use probabilistic embeddings to model this input uncertainty. Experimental results show that our embedding model achieves higher accuracy when retrieving similar poses across different camera views, in comparison with 2D-to-3D pose lifting models. We also demonstrate the effectiveness of applying our embeddings to view-invariant action recognition and video alignment. Our code is available at https://github.com/google-research/google-research/tree/master/poem.
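The central idea of the abstract — mapping 2D keypoints to a distribution in embedding space rather than a single point, so that the 2D-to-3D ambiguity is captured as variance — can be illustrated with a minimal numpy sketch. The linear "encoder" weights, softplus parameterization, and sigmoid-of-distance matching score below are hypothetical stand-ins for the paper's learned network, not its actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(keypoints_2d, W_mu, W_sigma):
    """Map flattened 2D joint keypoints to a Gaussian embedding (mean, stddev).
    A hypothetical linear encoder standing in for the paper's network."""
    x = keypoints_2d.ravel()
    mu = W_mu @ x
    sigma = np.log1p(np.exp(W_sigma @ x))  # softplus keeps stddev positive
    return mu, sigma

def matching_probability(mu_a, sigma_a, mu_b, sigma_b, n_samples=200, a=1.0, b=0.0):
    """Monte-Carlo estimate of the probability that two probabilistic
    embeddings depict the same underlying pose: sample from each Gaussian,
    measure pairwise distances, and squash them through a sigmoid.
    This is the general sampling-based matching idea, with made-up a, b."""
    za = rng.normal(mu_a, sigma_a, size=(n_samples, mu_a.size))
    zb = rng.normal(mu_b, sigma_b, size=(n_samples, mu_b.size))
    d = np.linalg.norm(za - zb, axis=1)
    return float(np.mean(1.0 / (1.0 + np.exp(a * d + b))))
```

Two embeddings of the same pose (identical mean, small variance) should score a higher matching probability than embeddings of clearly different poses, which is the property the retrieval results below rely on.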

Results

Task | Dataset | Metric | Value | Model
Video Understanding | UPenn Action | Kendall's Tau | 0.7476 | Pr-VIPE
Video | UPenn Action | Kendall's Tau | 0.7476 | Pr-VIPE
Video | UPenn Action | Accuracy | 97.5 | Pr-VIPE
Temporal Action Localization | UPenn Action | Accuracy | 97.5 | Pr-VIPE
Zero-Shot Learning | UPenn Action | Accuracy | 97.5 | Pr-VIPE
Activity Recognition | UPenn Action | Accuracy | 97.5 | Pr-VIPE
Action Localization | UPenn Action | Accuracy | 97.5 | Pr-VIPE
Action Detection | UPenn Action | Accuracy | 97.5 | Pr-VIPE
3D Action Recognition | UPenn Action | Accuracy | 97.5 | Pr-VIPE
Action Recognition | UPenn Action | Accuracy | 97.5 | Pr-VIPE
Pose Retrieval | MPI-INF-3DHP | Hit@1 | 26.4 | Pr-VIPE
Pose Retrieval | MPI-INF-3DHP | Hit@10 | 58.6 | Pr-VIPE
Pose Retrieval | Human3.6M | Hit@1 | 76.2 | Pr-VIPE
Pose Retrieval | Human3.6M | Hit@10 | 95.6 | Pr-VIPE
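The Hit@k values reported for pose retrieval measure the fraction of query poses whose true cross-view match appears among the k nearest neighbours in embedding space. A minimal sketch of the metric, assuming point embeddings and Euclidean distance (Pr-VIPE itself ranks by matching probability; the array names here are illustrative):

```python
import numpy as np

def hit_at_k(query_emb, gallery_emb, correct_idx, k):
    """Fraction of queries whose ground-truth gallery index appears
    among the k nearest gallery embeddings under Euclidean distance."""
    # Pairwise distances: (n_queries, n_gallery)
    dists = np.linalg.norm(query_emb[:, None, :] - gallery_emb[None, :, :], axis=-1)
    # Indices of the k closest gallery items for each query
    topk = np.argsort(dists, axis=1)[:, :k]
    hits = [correct_idx[i] in topk[i] for i in range(len(query_emb))]
    return float(np.mean(hits))
```

With this definition, Hit@10 is always at least Hit@1 for the same retrieval system, consistent with the table above.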

Related Papers

- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
- EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
- Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
- CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
- Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
- Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
- Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)