
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers

Zhongwei Qiu, Yang Qiansheng, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Chang Xu, Dongmei Fu, Jingdong Wang

Published: 2023-03-16
Venue: CVPR 2023
Tasks: 3D Human Pose Estimation, 3D human pose and shape estimation

Abstract

Existing methods for multi-person video 3D human Pose and Shape Estimation (PSE) typically adopt a two-stage strategy, which first detects human instances in each frame and then performs single-person PSE with a temporal model. However, this strategy cannot capture the global spatio-temporal context among spatial instances. In this paper, we propose a new end-to-end multi-person 3D Pose and Shape estimation framework with a progressive Video Transformer, termed PSVT. In PSVT, a spatio-temporal encoder (STE) captures the global feature dependencies among spatial objects. Then, a spatio-temporal pose decoder (STPD) and a spatio-temporal shape decoder (STSD) capture the global dependencies between pose queries and feature tokens, and between shape queries and feature tokens, respectively. To handle variations in objects over time, a novel progressive decoding scheme updates the pose and shape queries at each frame. In addition, we propose a novel pose-guided attention (PGA) for the shape decoder to better predict shape parameters. These two components strengthen the decoder of PSVT and improve performance. Extensive experiments on four datasets show that PSVT achieves state-of-the-art results.
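The idea of updating queries frame by frame can be illustrated with a toy sketch. This is not the PSVT implementation: the abstract does not specify the update rule, so the sketch assumes a plain residual cross-attention update with no learned projections, no normalization, and no pose-guided attention. The names `attend` and `progressive_decode` are hypothetical.

```python
import math


def attend(query, tokens):
    """Single-head scaled dot-product attention of one query over a list of tokens."""
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * tok[i] for w, tok in zip(weights, tokens)) / total
        for i in range(len(query))
    ]


def progressive_decode(initial_queries, frames):
    """Carry queries across frames, refreshing them on each frame's tokens.

    Assumption for illustration: the queries decoded at frame t-1 are the
    starting queries at frame t, updated by a residual attention step.
    """
    queries = [list(q) for q in initial_queries]
    per_frame = []
    for tokens in frames:
        queries = [
            [q_i + a_i for q_i, a_i in zip(q, attend(q, tokens))]
            for q in queries
        ]
        per_frame.append([list(q) for q in queries])
    return per_frame


# Two frames, each with a single 2-D feature token, and one query.
frames = [[[1.0, 0.0]], [[1.0, 0.0]]]
outputs = progressive_decode([[0.0, 0.0]], frames)
```

The point of the sketch is only the control flow: the queries are not reset for every frame, so information decoded at earlier frames influences later ones.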

Results

Task                      Dataset  Metric    Value  Model
3D Human Pose Estimation  3DPW     MPJPE     73.1   PSVT
3D Human Pose Estimation  3DPW     MPVPE     84     PSVT
3D Human Pose Estimation  3DPW     PA-MPJPE  43.5   PSVT
Pose Estimation           3DPW     MPJPE     73.1   PSVT
Pose Estimation           3DPW     MPVPE     84     PSVT
Pose Estimation           3DPW     PA-MPJPE  43.5   PSVT
3D                        3DPW     MPJPE     73.1   PSVT
3D                        3DPW     MPVPE     84     PSVT
3D                        3DPW     PA-MPJPE  43.5   PSVT
1 Image, 2*2 Stitchi      3DPW     MPJPE     73.1   PSVT
1 Image, 2*2 Stitchi      3DPW     MPVPE     84     PSVT
1 Image, 2*2 Stitchi      3DPW     PA-MPJPE  43.5   PSVT
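
For readers unfamiliar with the metrics, MPJPE (mean per-joint position error) and MPVPE (mean per-vertex position error) are both mean Euclidean distances between predicted and ground-truth 3D points, taken over joints and mesh vertices respectively, and are conventionally reported in millimetres. The following is a minimal sketch of that definition, not the evaluation code used for the numbers above; the function names are hypothetical, and aligning on joint 0 as the root is an assumption. PA-MPJPE additionally applies a Procrustes (similarity) alignment before measuring the error, which is omitted here.

```python
import math


def mean_per_point_error(pred, gt):
    """Mean Euclidean distance between corresponding 3D points.

    Passing joints gives MPJPE; passing mesh vertices gives MPVPE.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def root_align(points, root_index=0):
    """Translate points so the chosen root joint sits at the origin."""
    root = points[root_index]
    return [tuple(c - r for c, r in zip(p, root)) for p in points]


# Toy example with two joints: per-joint errors are 0 and 5, so the mean is 2.5.
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error = mean_per_point_error(root_align(pred), root_align(gt))
```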
