PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers

Zhongwei Qiu, Yang Qiansheng, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Chang Xu, Dongmei Fu, Jingdong Wang

2023-03-16 · CVPR 2023 · 3D Human Pose Estimation, 3D human pose and shape estimation

Abstract

Existing methods for multi-person video 3D human Pose and Shape Estimation (PSE) typically adopt a two-stage strategy: they first detect human instances in each frame and then perform single-person PSE with a temporal model. However, such methods cannot capture the global spatio-temporal context among spatial instances. In this paper, we propose a new end-to-end multi-person 3D Pose and Shape estimation framework with a progressive Video Transformer, termed PSVT. In PSVT, a spatio-temporal encoder (STE) captures the global feature dependencies among spatial objects. Then, a spatio-temporal pose decoder (STPD) and a spatio-temporal shape decoder (STSD) capture the global dependencies between pose queries and feature tokens, and between shape queries and feature tokens, respectively. To handle the variations of objects as time proceeds, a novel progressive decoding scheme updates the pose and shape queries at each frame. Besides, we propose a novel pose-guided attention (PGA) for the shape decoder to better predict shape parameters. These two components strengthen the decoder of PSVT and improve its performance. Extensive experiments on four datasets show that PSVT achieves state-of-the-art results.
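The progressive decoding idea in the abstract, where queries decoded at one frame are carried forward and refined against the next frame's feature tokens, can be illustrated with a toy sketch. This is not the PSVT implementation: it assumes a single attention head, no learned query/key/value projections, and a residual update rule, and the names cross_attend and progressive_decode are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product cross-attention (no learned projections)."""
    if not tokens:
        raise ValueError("each frame needs at least one feature token")
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in tokens])
        # Weighted sum of the tokens (tokens serve as both keys and values here).
        out.append([sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)])
    return out


def progressive_decode(init_queries: List[Vector], frames: List[List[Vector]]) -> List[List[Vector]]:
    """Decode frame by frame, carrying the refined queries over to the next frame."""
    queries = [list(q) for q in init_queries]
    per_frame = []
    for tokens in frames:
        attended = cross_attend(queries, tokens)
        # Residual update (an assumption): refine the carried-over queries.
        queries = [[qi + ai for qi, ai in zip(q, a)] for q, a in zip(queries, attended)]
        per_frame.append([list(q) for q in queries])
    return per_frame


out = progressive_decode([[0.0, 0.0]], [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0]]])
print(out)  # [[[1.0, 0.0]], [[1.0, 2.0]]]
```

Because each frame starts from the previous frame's queries rather than from fresh ones, a query keeps tracking the same instance as it moves or changes over time. The abstract's pose-guided attention for the shape decoder is not modeled in this sketch.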

Results

Dataset	Metric	Value (mm)	Model
3DPW	MPJPE	73.1	PSVT
3DPW	MPVPE	84	PSVT
3DPW	PA-MPJPE	43.5	PSVT

These results are listed under the tasks: 3D Human Pose Estimation; Pose Estimation; 3D; 1 Image, 2*2 Stitchi
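The three metrics in the table share one building block: the mean Euclidean distance between corresponding predicted and ground-truth 3D points (body joints for MPJPE, mesh vertices for MPVPE). The minimal pure-Python sketch below computes that quantity; the function name mean_point_error and its root-alignment option are illustrative assumptions, since evaluation protocols differ in joint set and alignment. PA-MPJPE additionally applies a Procrustes (similarity) alignment of the prediction to the ground truth before measuring the error, which this sketch does not implement.

```python
import math
from typing import Optional, Sequence

Point = Sequence[float]


def mean_point_error(
    pred: Sequence[Point], gt: Sequence[Point], root: Optional[int] = None
) -> float:
    """Mean Euclidean distance between corresponding predicted and ground-truth points.

    With joints this is MPJPE; with mesh vertices it is MPVPE. If `root` is given
    (a hypothetical option here), both point sets are first translated so that
    the point at that index sits at the origin.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    if root is not None:
        pred_root, gt_root = pred[root], gt[root]
        pred = [[p - r for p, r in zip(pt, pred_root)] for pt in pred]
        gt = [[g - r for g, r in zip(pt, gt_root)] for pt in gt]
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Units follow the inputs: multiply by 1000 if coordinates are in meters
# and the error should be reported in millimeters, as in the table.
print(mean_point_error([[1, 1, 1], [4, 5, 1]], [[0, 0, 0], [0, 0, 0]], root=0))  # 2.5
```

Root alignment removes global translation error, so the metric reflects the relative 3D pose; Procrustes alignment for PA-MPJPE goes further and also removes rotation and scale, which is why PA-MPJPE is lower than MPJPE in the table.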
