3D Human Pose Estimation with Spatial and Temporal Transformers

Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding

2021-03-18 | ICCV 2021
Tasks: 3D Human Pose Estimation, Image Classification, Monocular 3D Human Pose Estimation, Semantic Segmentation, Pose Estimation, Object Detection

Abstract

Transformer architectures have become the model of choice in natural language processing and are now being introduced into computer vision tasks such as image classification, object detection, and semantic segmentation. However, in the field of human pose estimation, convolutional architectures still remain dominant. In this work, we present PoseFormer, a purely transformer-based approach for 3D human pose estimation in videos without convolutional architectures involved. Inspired by recent developments in vision transformers, we design a spatial-temporal transformer structure to comprehensively model the human joint relations within each frame as well as the temporal correlations across frames, then output an accurate 3D human pose of the center frame. We quantitatively and qualitatively evaluate our method on two popular and standard benchmark datasets: Human3.6M and MPI-INF-3DHP. Extensive experiments show that PoseFormer achieves state-of-the-art performance on both datasets. Code is available at https://github.com/zczcwh/PoseFormer.
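
The two-stage pattern described in the abstract (attention across joints within each frame, then attention across frames, then a readout at the center frame) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes single-head attention with identity projections, no learned embeddings, no positional encodings, and no regression head, so it returns a mixed 2D feature vector for the center frame rather than a 3D pose. The function names `self_attention` and `spatial_temporal_sketch` are hypothetical. The official code is at the repository linked in the abstract.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    # Toy single-head scaled dot-product self-attention.
    # Assumption: queries, keys and values are the tokens themselves
    # (identity projections); a real model learns these projections.
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def spatial_temporal_sketch(seq_2d):
    # seq_2d: f frames, each a list of J joints, each joint an [x, y] pair.
    # Spatial stage: joints within one frame attend to each other.
    per_frame = [self_attention(frame) for frame in seq_2d]
    # Flatten each frame's joint features into one token per frame.
    frame_tokens = [[x for joint in frame for x in joint] for frame in per_frame]
    # Temporal stage: frame tokens attend to each other across time.
    mixed = self_attention(frame_tokens)
    # Read out the center frame (a real model would regress J x 3 coordinates here).
    return mixed[len(mixed) // 2]


if __name__ == "__main__":
    # Three frames, two joints per frame (made-up coordinates).
    clip = [
        [[0.0, 0.1], [0.5, 0.6]],
        [[0.1, 0.1], [0.5, 0.7]],
        [[0.2, 0.1], [0.5, 0.8]],
    ]
    print(len(spatial_temporal_sketch(clip)))  # J * 2 features for the center frame
```

The point of the sketch is the data flow: the spatial stage treats joints as tokens, the temporal stage treats whole frames as tokens, and only the center frame is read out.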

Results

Task	Dataset	Metric	Value	Model
3D Human Pose Estimation	HumanEva-I	Mean Reconstruction Error (mm)	21.6	PoseFormer
3D Human Pose Estimation	MPI-INF-3DHP	AUC	56.4	PoseFormer (9 frames)
3D Human Pose Estimation	MPI-INF-3DHP	MPJPE	77.1	PoseFormer (9 frames)
3D Human Pose Estimation	MPI-INF-3DHP	PCK	88.6	PoseFormer (9 frames)
3D Human Pose Estimation	Human3.6M	Average MPJPE (mm)	44.3	PoseFormer (f=81)
3D Human Pose Estimation	Human3.6M	Average MPJPE (mm)	44.3	PoseFormer (T=81)
3D Human Pose Estimation	Human3.6M	Frames Needed	81	PoseFormer (T=81)
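
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of MPJPE and PCK for a single pose, using only the Python standard library. It assumes poses are given as lists of (x, y, z) joint positions in millimetres that are already expressed in the same root-relative frame. The 150 mm PCK threshold is an assumption based on common practice for MPI-INF-3DHP and is not stated on this page. The sample coordinates are made up for illustration.

```python
import math


def mpjpe(pred, gt):
    # Mean per-joint position error: average Euclidean distance over joints (mm).
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def pck(pred, gt, threshold_mm=150.0):
    # Percentage of joints whose error is at or below the threshold.
    # threshold_mm=150.0 is an assumed default, not taken from this page.
    hits = sum(math.dist(p, g) <= threshold_mm for p, g in zip(pred, gt))
    return 100.0 * hits / len(gt)


if __name__ == "__main__":
    # Made-up three-joint pose; per-joint errors are 0, 50 and 300 mm.
    gt = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 200.0, 0.0)]
    pred = [(0.0, 0.0, 0.0), (100.0, 30.0, 40.0), (0.0, 200.0, 300.0)]
    print(mpjpe(pred, gt))
    print(pck(pred, gt))
```

Lower MPJPE is better, while higher PCK is better. Benchmark numbers such as those in the table are averages of these per-pose values over a full test set.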
