P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation

Wenkang Shan, Zhenhua Liu, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao

2022-03-15
Denoising, 3D Human Pose Estimation, Monocular 3D Human Pose Estimation, Pose Estimation

Abstract

This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for the 2D-to-3D human pose estimation task. To reduce the difficulty of capturing spatial and temporal information, we divide this task into two stages: pre-training (Stage I) and fine-tuning (Stage II). In Stage I, a self-supervised pre-training sub-task, termed masked pose modeling, is proposed. The human joints in the input sequence are randomly masked in both the spatial and temporal domains. A general form of denoising auto-encoder is then used to recover the original 2D poses, which trains the encoder to capture spatial and temporal dependencies. In Stage II, the pre-trained encoder is loaded into the STMO model and fine-tuned. The encoder is followed by a many-to-one frame aggregator that predicts the 3D pose in the current frame. In particular, an MLP block is used as the spatial feature extractor in STMO, which yields better performance than other methods. In addition, a temporal downsampling strategy is proposed to reduce data redundancy. Extensive experiments on two benchmarks show that our method outperforms state-of-the-art methods with fewer parameters and less computational overhead. For example, our P-STMO model achieves 42.1 mm MPJPE on the Human3.6M dataset when using 2D poses from CPN as inputs. Meanwhile, it achieves a 1.5-7.1 times speedup over state-of-the-art methods. Code is available at https://github.com/paTRICK-swk/P-STMO.
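
To make the masking step of Stage I concrete, the following is a minimal sketch of random spatial and temporal masking over a 2D pose sequence. It is an illustration under stated assumptions, not the authors' implementation: the function name `mask_pose_sequence`, the parameters `frame_ratio` and `joints_per_frame`, and the `(0.0, 0.0)` placeholder are all hypothetical, and the abstract does not specify masking ratios or how masked entries are represented.

```python
import random

# Hypothetical stand-in for a masked joint; the abstract does not say
# how masked entries are represented in the real model.
PLACEHOLDER = (0.0, 0.0)


def mask_pose_sequence(poses, frame_ratio=0.5, joints_per_frame=2, seed=None):
    """Randomly mask a 2D pose sequence in time and space.

    poses: list of frames, each frame a list of (x, y) joint tuples.
    Returns (masked_poses, mask), where mask[t][j] is True if joint j
    of frame t was hidden.
    """
    rng = random.Random(seed)
    num_frames = len(poses)
    num_joints = len(poses[0])

    # Temporal masking: hide every joint of a random subset of frames.
    num_masked_frames = int(num_frames * frame_ratio)
    masked_frames = set(rng.sample(range(num_frames), num_masked_frames))

    masked_poses, mask = [], []
    for t, frame in enumerate(poses):
        if t in masked_frames:
            hidden = set(range(num_joints))
        else:
            # Spatial masking: hide a few random joints in the remaining frames.
            hidden = set(rng.sample(range(num_joints), joints_per_frame))
        mask.append([j in hidden for j in range(num_joints)])
        masked_poses.append(
            [PLACEHOLDER if j in hidden else frame[j] for j in range(num_joints)]
        )
    return masked_poses, mask


# Example: 8 frames of 17 joints with synthetic coordinates.
poses = [[(float(t), float(j)) for j in range(17)] for t in range(8)]
masked_poses, mask = mask_pose_sequence(poses, frame_ratio=0.5, joints_per_frame=2, seed=0)
```

In a pre-training setup of this kind, a denoising auto-encoder would take `masked_poses` as input and be trained to reconstruct `poses`; that training loop is omitted here.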

Results

Task	Dataset	Metric	Value	Model
3D Human Pose Estimation	MPI-INF-3DHP	AUC	75.8	P-STMO (N=81)
3D Human Pose Estimation	MPI-INF-3DHP	MPJPE	32.2	P-STMO (N=81)
3D Human Pose Estimation	MPI-INF-3DHP	PCK	97.9	P-STMO (N=81)
3D Human Pose Estimation	Human3.6M	Average MPJPE (mm)	42.1	P-STMO (N=243)
3D Human Pose Estimation	Human3.6M	PA-MPJPE	34.4	P-STMO (N=243)
3D Human Pose Estimation	Human3.6M	Average MPJPE (mm)	44.1	P-STMO-S (N=81)
3D Human Pose Estimation	Human3.6M	Frames Needed	243	P-STMO (N=243)
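
For readers unfamiliar with the MPJPE metric reported in the table above, the following is a minimal sketch of how it is commonly computed: the mean Euclidean distance between predicted and ground-truth 3D joint positions. The function name `mpjpe` and the sample coordinates are illustrative assumptions. This is not the authors' evaluation code, which may apply further steps such as root-joint alignment and averaging over frames and actions. PA-MPJPE additionally applies a rigid (Procrustes) alignment before measuring the error, which is not shown here.

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error for a single pose.

    predicted, target: equal-length lists of (x, y, z) joint positions,
    in the same units (millimetres for the Human3.6M numbers above).
    """
    if len(predicted) != len(target):
        raise ValueError("pose lengths differ")
    # Euclidean distance per joint, then the average over joints.
    distances = [math.dist(p, g) for p, g in zip(predicted, target)]
    return sum(distances) / len(distances)


# Toy example with two joints: errors of 0 and 5 average to 2.5.
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error = mpjpe(predicted, target)
```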
