Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

2023-03-26 | CVPR 2023 | 3D Human Pose Estimation | 3D human pose and shape estimation

Abstract

Video-based 3D human pose and shape estimation is evaluated by intra-frame accuracy and inter-frame smoothness. Although these two metrics reflect different ranges of temporal consistency, existing state-of-the-art methods treat them as a unified problem and build their networks from a single type of modeling structure (e.g., an RNN or an attention-based block). However, a single kind of modeling structure struggles to balance the learning of short-term and long-term temporal correlations and may bias the network toward one of them, leading to undesirable predictions such as global location shift, temporal inconsistency, and insufficient local details. To solve these problems, we propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, the Global-to-Local Transformer (GLoT). First, a global transformer is introduced with a Masked Pose and Shape Estimation strategy for long-term modeling. The strategy encourages the global transformer to learn more inter-frame correlations by randomly masking the features of several frames. Second, a local transformer exploits local details on the human mesh and interacts with the global transformer through cross-attention. Moreover, a Hierarchical Spatial Correlation Regressor refines intra-frame estimations using the decoupled global-local representation and implicit kinematic constraints. GLoT surpasses previous state-of-the-art methods with the fewest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M. Code is available at https://github.com/sxl142/GLoT.
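The random frame masking mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `mask_frame_features`, the 50% mask ratio, the zero-valued mask token, and the (16, 2048) feature shape are all assumptions chosen for illustration, and the released code may differ (for example, by using a learned mask token).

```python
import numpy as np


def mask_frame_features(features, mask_ratio, mask_token, rng):
    """Replace the features of a random subset of frames with a mask token.

    features:   (T, D) array, one feature vector per frame (assumed layout).
    mask_ratio: fraction of frames to mask (hypothetical hyperparameter).
    mask_token: (D,) vector written into masked frames (assumed; could be learned).
    Returns the masked copy and a boolean (T,) array marking masked frames.
    """
    num_frames = features.shape[0]
    num_masked = int(round(mask_ratio * num_frames))
    # Sample distinct frame indices to mask.
    masked_idx = rng.choice(num_frames, size=num_masked, replace=False)
    is_masked = np.zeros(num_frames, dtype=bool)
    is_masked[masked_idx] = True
    masked = features.copy()
    masked[is_masked] = mask_token
    return masked, is_masked


rng = np.random.default_rng(0)
features = rng.normal(size=(16, 2048))   # 16 frames, 2048-d features (assumed sizes)
mask_token = np.zeros(2048)              # placeholder token for this sketch
masked, is_masked = mask_frame_features(features, 0.5, mask_token, rng)
```

A temporal model trained on `masked` has to infer the hidden frames from the visible ones, which is the intuition the abstract gives for why masking encourages learning inter-frame correlations.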

Results

Task	Dataset	Metric	Value	Model
3D Human Pose Estimation	MPI-INF-3DHP	Acceleration Error	7.9	GLoT
3D Human Pose Estimation	MPI-INF-3DHP	MPJPE	93.9	GLoT
3D Human Pose Estimation	MPI-INF-3DHP	PA-MPJPE	61.5	GLoT
3D Human Pose Estimation	3DPW	Acceleration Error	6.6	GLoT
3D Human Pose Estimation	3DPW	MPJPE	80.7	GLoT
3D Human Pose Estimation	3DPW	MPVPE	96.3	GLoT
3D Human Pose Estimation	3DPW	PA-MPJPE	50.6	GLoT
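For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute MPJPE, PA-MPJPE, and acceleration error from joint positions. It assumes predictions and ground truth are arrays of shape (T, J, 3); the function names are hypothetical, and the exact evaluation protocol behind the reported numbers (joint sets, units, frame-rate scaling) is not specified here and may differ.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints and frames."""
    return np.linalg.norm(pred - gt, axis=-1).mean()


def procrustes_align(pred, gt):
    """Align one frame pred (J, 3) to gt (J, 3) with a similarity transform
    (rotation, uniform scale, translation) found in closed form via SVD."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after per-frame Procrustes alignment of the prediction."""
    aligned = np.stack([procrustes_align(p, g) for p, g in zip(pred, gt)])
    return mpjpe(aligned, gt)


def accel_error(pred, gt):
    """Mean difference between predicted and ground-truth joint acceleration,
    using a second-order finite difference over consecutive frames.
    The result is in input units per frame squared (no frame-rate scaling)."""
    accel_pred = pred[:-2] - 2.0 * pred[1:-1] + pred[2:]
    accel_gt = gt[:-2] - 2.0 * gt[1:-1] + gt[2:]
    return np.linalg.norm(accel_pred - accel_gt, axis=-1).mean()


# Toy data: 10 frames, 14 joints (sizes chosen arbitrarily for this sketch).
rng = np.random.default_rng(0)
gt = rng.normal(size=(10, 14, 3))
offset_pred = gt + np.array([3.0, 4.0, 0.0])   # constant global shift
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
similarity_pred = 2.0 * gt @ Rz.T + 1.0        # rotated, scaled, translated copy
```

In this toy setup a constant shift produces a nonzero MPJPE but zero acceleration error, while a rigid rotation with scaling is removed entirely by the Procrustes step. That is why the three metrics are reported separately: they are sensitive to different kinds of error.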
