Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan K. Asari
The attention mechanism provides a sequential prediction framework for learning spatial models with enhanced implicit temporal consistency. In this work, we present a systematic design (from 2D to 3D) showing how conventional networks and other forms of constraints can be incorporated into the attention framework to learn long-range dependencies for pose estimation. The contribution of this paper is a systematic approach to designing and training attention-based models for end-to-end pose estimation, with the flexibility and scalability to accept video sequences of arbitrary length as input. We achieve this by adapting the temporal receptive field via a multi-scale structure of dilated convolutions. In addition, the proposed architecture can be easily adapted into a causal model, enabling real-time performance. Any off-the-shelf 2D pose estimation system, e.g. Mocap libraries, can be easily integrated in an ad-hoc fashion. Our method achieves state-of-the-art performance, outperforming existing methods by reducing the mean per joint position error to 33.4 mm on the Human3.6M dataset.
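The temporal window sizes reported below (T=27 and T=243) are consistent with stacking dilated 1D temporal convolutions whose dilation rates grow geometrically. As a minimal sketch (assuming kernel size 3 and a base-3 dilation schedule, which are illustrative choices, not confirmed hyperparameters of this paper), the receptive field of such a stack can be computed as:

```python
def receptive_field(kernel_size, dilations):
    """Temporal receptive field (in frames) of stacked dilated 1D
    convolutions: each layer adds (kernel_size - 1) * dilation frames."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Assumed base-3 dilation schedules with kernel size 3:
print(receptive_field(3, [1, 3, 9]))          # 27 frames  (cf. T=27)
print(receptive_field(3, [1, 3, 9, 27, 81]))  # 243 frames (cf. T=243)
```

With this schedule, each additional layer triples the span of input frames a single output frame can attend to, so covering 243 frames requires only five convolutional layers rather than 121 layers of ordinary (dilation-1) convolutions.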
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Human Pose Estimation | HumanEva-I | Mean Reconstruction Error (mm) | 15.4 | Attention (T=27 MA) |
| 3D Human Pose Estimation | Human3.6M | Average MPJPE (mm) | 44.8 | Attention (T=243 CPN) |