Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video

Wen-Li Wei, Jen-Chun Lin, Tyng-Luh Liu, Hong-Yuan Mark Liao

2022-03-16 | CVPR 2022 | 3D Human Pose Estimation | 3D human pose and shape estimation

Abstract

Learning to capture human motion is essential to 3D human pose and shape estimation from monocular video. However, existing methods mainly rely on recurrent or convolutional operations to model such temporal information, which limits their ability to capture non-local context relations of human motion. To address this problem, we propose a motion pose and shape network (MPS-Net) that effectively captures humans in motion to estimate accurate and temporally coherent 3D human pose and shape from a video. Specifically, we first propose a motion continuity attention (MoCA) module that leverages visual cues observed from human motion to adaptively recalibrate the range that needs attention in the sequence, so as to better capture motion continuity dependencies. Then, we develop a hierarchical attentive feature integration (HAFI) module that combines adjacent past and future feature representations to strengthen temporal correlation and refine the feature representation of the current frame. By coupling the MoCA and HAFI modules, the proposed MPS-Net excels at estimating 3D human pose and shape from video. Though conceptually simple, our MPS-Net not only outperforms the state-of-the-art methods on the 3DPW, MPI-INF-3DHP, and Human3.6M benchmark datasets, but also uses fewer network parameters. The video demos can be found at https://mps-net.github.io/MPS-Net/.
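To make the idea of non-local temporal context concrete, the following is a minimal sketch of plain scaled dot-product self-attention over a sequence of per-frame features. It is not the MoCA or HAFI module described above: the function name `temporal_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the feature size `D = 8`, and the random inputs are all assumptions chosen for illustration. Only the sequence length T = 16 mirrors the setting named in the results.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(feats, w_q, w_k, w_v):
    """Generic self-attention over time (illustrative, not the paper's module).

    feats: (T, D) array, one feature vector per video frame.
    w_q, w_k, w_v: (D, D) projection matrices (hypothetical parameters).
    Returns the refined (T, D) features and the (T, T) attention map.
    """
    q = feats @ w_q
    k = feats @ w_k
    v = feats @ w_v
    # Every frame scores every other frame, so distant frames can interact
    # directly instead of through a chain of recurrent or convolutional steps.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    # Residual connection keeps the original per-frame information.
    refined = feats + attn @ v
    return refined, attn

rng = np.random.default_rng(0)
T, D = 16, 8  # T matches the T=16 setting; D is an arbitrary toy size
feats = rng.standard_normal((T, D))
w_q, w_k, w_v = (0.1 * rng.standard_normal((D, D)) for _ in range(3))
refined, attn = temporal_self_attention(feats, w_q, w_k, w_v)
print(refined.shape, attn.shape)
```

Each row of `attn` is a distribution over all T frames, which is what lets a frame draw on context far from its immediate neighbors.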

Results

Task	Dataset	Metric	Value	Model
3D Human Pose Estimation	MPI-INF-3DHP	Acceleration Error	9.6	MPS-Net (T=16)
3D Human Pose Estimation	MPI-INF-3DHP	MPJPE	96.7	MPS-Net (T=16)
3D Human Pose Estimation	MPI-INF-3DHP	PA-MPJPE	62.8	MPS-Net (T=16)
3D Human Pose Estimation	3DPW	Acceleration Error	7.4	MPS-Net (T=16)
3D Human Pose Estimation	3DPW	FLOPs (G)	4.45	MPS-Net (T=16)
3D Human Pose Estimation	3DPW	MPJPE	84.3	MPS-Net (T=16)
3D Human Pose Estimation	3DPW	MPVPE	99.7	MPS-Net (T=16)
3D Human Pose Estimation	3DPW	Number of parameters (M)	39.63	MPS-Net (T=16)
3D Human Pose Estimation	3DPW	PA-MPJPE	52.1	MPS-Net (T=16)
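The metrics in the table can be illustrated with a small sketch. The definitions below follow common usage: MPJPE as the mean Euclidean joint distance, PA-MPJPE as the same error after a per-frame similarity (Procrustes) alignment, and acceleration error as the mean difference of second-order finite differences. The function names, the toy data, and the per-frame units of the acceleration term are assumptions; the exact evaluation protocol behind the reported numbers (joint sets, root alignment, units) is not given here and may differ.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error. pred, gt: (T, J, 3)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def procrustes_align(pred, gt):
    """Align one pose pred (J, 3) to gt (J, 3) with scale, rotation, translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    diag = np.array([1.0, 1.0, d])
    rot = vt.T @ np.diag(diag) @ u.T
    scale = (s * diag).sum() / (p ** 2).sum()
    return scale * p @ rot.T + mu_g

def pa_mpjpe(pred, gt):
    """MPJPE after aligning every predicted frame to its ground truth."""
    aligned = np.stack([procrustes_align(p, g) for p, g in zip(pred, gt)])
    return mpjpe(aligned, gt)

def accel_error(pred, gt):
    """Mean difference of joint accelerations (second finite differences, per frame^2)."""
    acc_pred = pred[:-2] - 2 * pred[1:-1] + pred[2:]
    acc_gt = gt[:-2] - 2 * gt[1:-1] + gt[2:]
    return np.linalg.norm(acc_pred - acc_gt, axis=-1).mean()

# Toy data (hypothetical): the prediction is the ground truth rotated about z,
# scaled, and shifted, so alignment should remove essentially all of the error.
rng = np.random.default_rng(0)
gt = rng.standard_normal((16, 14, 3))
theta = 0.3
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
pred = 2.0 * gt @ rot_z.T + np.array([0.5, -1.0, 2.0])
print(mpjpe(pred, gt), pa_mpjpe(pred, gt), accel_error(pred, gt))
```

The gap between the two position metrics in this toy case shows why both are reported: MPJPE penalizes global scale and orientation errors, while PA-MPJPE isolates errors in the articulated pose itself.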

