Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos

Junhao Zhang, Yali Wang, Zhipeng Zhou, Tianyu Luan, Zhe Wang, Yu Qiao

2021-09-15
Tasks: 3D Human Pose Estimation, Pose Estimation, 3D Pose Estimation

Abstract

Graph Convolution Networks (GCNs) have been successfully used for 3D human pose estimation in videos. However, they are often built on a fixed human-joint affinity defined by the human skeleton. This may limit the ability of a GCN to handle complex spatio-temporal pose variations in videos. To alleviate this problem, we propose a novel Dynamical Graph Network (DG-Net), which dynamically identifies human-joint affinity and estimates 3D pose by adaptively learning spatial/temporal joint relations from videos. Unlike traditional graph convolution, we introduce Dynamical Spatial/Temporal Graph convolution (DSG/DTG) to discover the spatial/temporal human-joint affinity of each video exemplar, based on the spatial distance/temporal movement similarity between human joints in that video. Hence, DSG/DTG can effectively identify which joints are spatially close and/or move consistently, reducing depth ambiguity and/or motion uncertainty when lifting 2D pose to 3D pose. We conduct extensive experiments on three popular benchmarks, i.e., Human3.6M, HumanEva-I, and MPI-INF-3DHP, where DG-Net outperforms a number of recent state-of-the-art (SOTA) approaches while using fewer input frames and a smaller model.
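The abstract describes DSG/DTG only at a high level. The NumPy sketch below illustrates the general idea of a per-example (dynamic) joint affinity followed by a graph convolution; it is not the authors' implementation. Assumptions, all hypothetical: joint features are raw 2D coordinates, each joint keeps its `k` most related joints (`k=4` is arbitrary), spatial weights are a softmax over negative Euclidean distance, temporal weights are a softmax over the cosine similarity of each joint's frame-to-frame displacements, and every function name is invented for illustration.

```python
import numpy as np


def masked_softmax(logits, keep):
    """Row-wise softmax over entries where `keep` is True; other entries get 0."""
    z = np.where(keep, logits, -np.inf)
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def top_k_mask(scores, k):
    """Boolean (J, J) mask keeping the k highest scores in each row."""
    idx = np.argsort(-scores, axis=1)[:, :k]
    mask = np.zeros_like(scores, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask


def spatial_affinity(frame_2d, k=4):
    """Hypothetical DSG-style affinity for one frame, frame_2d: (J, 2).

    Neighbours are the k joints closest in the image plane (a joint has
    distance 0 to itself, so it is normally one of them); weights are a
    softmax over negative Euclidean distance.
    """
    dist = np.linalg.norm(frame_2d[:, None, :] - frame_2d[None, :, :], axis=-1)
    return masked_softmax(-dist, top_k_mask(-dist, k))


def temporal_affinity(clip_2d, k=4):
    """Hypothetical DTG-style affinity for a clip, clip_2d: (T, J, 2).

    Each joint's frame-to-frame displacements are flattened into one motion
    vector; joints whose motion vectors have high cosine similarity are
    treated as temporally related.
    """
    motion = np.diff(clip_2d, axis=0)                          # (T-1, J, 2)
    traj = motion.transpose(1, 0, 2).reshape(clip_2d.shape[1], -1)
    unit = traj / (np.linalg.norm(traj, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T                                        # cosine, (J, J)
    return masked_softmax(sim, top_k_mask(sim, k))


def graph_conv(features, affinity, weight):
    """One graph-convolution step: aggregate neighbours, project, ReLU."""
    return np.maximum(affinity @ features @ weight, 0.0)


# Usage on random data: 5 frames, 17 joints, 2D input, 8 output channels.
rng = np.random.default_rng(0)
clip = rng.normal(size=(5, 17, 2))
A_s = spatial_affinity(clip[-1])
A_t = temporal_affinity(clip)
W = rng.normal(size=(2, 8))
h_s = graph_conv(clip[-1], A_s, W)                             # (17, 8)
h_t = graph_conv(clip[-1], A_t, W)                             # (17, 8)
```

Because `A_s` and `A_t` are recomputed from each input clip, two different videos generally produce different graphs, whereas a skeleton-based GCN reuses one fixed adjacency matrix for every input.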

Results

Task	Dataset	Metric	Value	Model
3D Human Pose Estimation	HumanEva-I	Mean Reconstruction Error (mm)	19.5	DG-Net (T=4)
3D Human Pose Estimation	MPI-INF-3DHP	AUC (%)	53.8	DG-Net (T=4)
3D Human Pose Estimation	MPI-INF-3DHP	MPJPE (mm)	76	DG-Net (T=4)
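The table reports three pose-error metrics. The sketch below shows how they are commonly computed for a single pose. It assumes that the Mean Reconstruction Error is MPJPE after a similarity (Procrustes) alignment, as in the usual HumanEva-I protocol, and that the MPI-INF-3DHP AUC averages PCK over thresholds from 0 to 150 mm. The function names are hypothetical, and official evaluation scripts may differ in details such as joint sets and per-frame averaging.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance (mm)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best scale, rotation and
    translation (orthogonal Procrustes / Umeyama via SVD)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(P.T @ G)                 # cross-covariance SVD
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # -1 would mean a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                                # rotation mapping P onto G
    scale = float((S * np.diag(D)).sum() / (P ** 2).sum())
    return scale * P @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment ("reconstruction error")."""
    return mpjpe(procrustes_align(pred, gt), gt)


def pck_auc(pred, gt, max_mm=150.0, steps=31):
    """Area under the PCK curve, in %, for thresholds 0..max_mm."""
    err = np.linalg.norm(pred - gt, axis=-1)
    thresholds = np.linspace(0.0, max_mm, steps)
    return 100.0 * float(np.mean([(err <= t).mean() for t in thresholds]))


rng = np.random.default_rng(0)
gt = rng.normal(scale=300.0, size=(17, 3))            # one pose, in mm
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
pred = 1.1 * gt @ Rz.T + np.array([50.0, -20.0, 10.0])
print(mpjpe(pred, gt))     # large: the pose is rotated, scaled and shifted
print(pa_mpjpe(pred, gt))  # close to 0: alignment undoes the similarity transform
```

The contrast between the two printed values shows why the reconstruction error is lower than raw MPJPE: it ignores global scale, rotation and translation and measures only the error in the pose itself.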
