Yun-Chun Chen, Marco Piccirilli, Robinson Piramuthu, Ming-Hsuan Yang
We consider the task of estimating 3D human pose and shape from videos. Existing frame-based approaches have made significant progress, but they process each image independently, which often leads to temporally inconsistent predictions. In this work, we present a video-based learning algorithm for 3D human pose and shape estimation. Our method rests on two key insights. First, to address temporally inconsistent predictions, we exploit temporal information in videos and propose a self-attention module that jointly considers short-range and long-range dependencies across frames, resulting in temporally coherent estimates. Second, we model human motion with a forecasting module that smooths the transition between adjacent frames. We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets. Extensive experimental results show that our algorithm performs favorably against state-of-the-art methods.
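To make the idea of attending across frames concrete, the following is a minimal sketch of scaled dot-product self-attention applied over a sequence of per-frame feature vectors. It is not the authors' module: the function name `temporal_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the feature sizes are all assumptions chosen for illustration, and the weights are random rather than learned.

```python
import numpy as np


def temporal_self_attention(features, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the time axis.

    features: (T, D) array, one feature vector per video frame.
    w_q, w_k, w_v: (D, D) projection matrices (hypothetical, random here).
    Returns the attended features (T, D) and the attention weights (T, T).
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v

    # Every frame scores every other frame, so both nearby and distant
    # frames can contribute to the output for a given time step.
    scores = q @ k.T / np.sqrt(k.shape[-1])

    # Numerically stable softmax over the key (frame) axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights


# Illustrative sizes only: 16 frames, 32-dimensional features.
rng = np.random.default_rng(0)
num_frames, feat_dim = 16, 32
frames = rng.standard_normal((num_frames, feat_dim))
w_q, w_k, w_v = (
    rng.standard_normal((feat_dim, feat_dim)) / np.sqrt(feat_dim)
    for _ in range(3)
)
attended, weights = temporal_self_attention(frames, w_q, w_k, w_v)
```

Each row of `weights` sums to one and describes how much every frame in the clip contributes to the output at that time step, which is the mechanism that lets a single layer mix short-range and long-range context.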
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Human Pose Estimation | MPI-INF-3DHP | MPJPE | 94.3 | Self-Attentive |
| 3D Human Pose Estimation | MPI-INF-3DHP | PA-MPJPE | 60.7 | Self-Attentive |
| 3D Human Pose Estimation | MPI-INF-3DHP | PCK | 90.1 | Self-Attentive |
| 3D Human Pose Estimation | 3DPW | Acceleration Error | 77.9 | Self-Attentive |
| 3D Human Pose Estimation | 3DPW | MPJPE | 85.8 | Self-Attentive |
| 3D Human Pose Estimation | 3DPW | MPVPE | 100.6 | Self-Attentive |
| 3D Human Pose Estimation | 3DPW | PA-MPJPE | 50.4 | Self-Attentive |
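
The metrics in the table above can be sketched as follows. This is an illustrative implementation of the commonly used definitions, not the evaluation code behind the reported numbers. The function names, the 150 mm PCK threshold default, and the unit time step in the acceleration error are assumptions; inputs are arrays of 3D joint positions, assumed to be in millimetres.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance per joint."""
    return np.linalg.norm(pred - gt, axis=-1).mean()


def procrustes_align(pred, gt):
    """Align one pose pred (J, 3) to gt (J, 3) with a similarity transform
    (rotation, uniform scale, translation) found by orthogonal Procrustes."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last singular direction if needed to avoid a reflection.
    d = np.array([1.0, 1.0, np.sign(np.linalg.det(u @ vt))])
    rot = u @ np.diag(d) @ vt
    scale = (s * d).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after per-frame Procrustes alignment. pred, gt: (T, J, 3)."""
    aligned = np.stack([procrustes_align(p, g) for p, g in zip(pred, gt)])
    return mpjpe(aligned, gt)


def pck(pred, gt, threshold=150.0):
    """Percentage of joints whose error is below threshold
    (150 mm is an assumed default)."""
    return 100.0 * (np.linalg.norm(pred - gt, axis=-1) < threshold).mean()


def acceleration_error(pred, gt):
    """Mean difference between predicted and ground-truth joint acceleration,
    using second-order finite differences with an assumed unit time step."""
    acc_pred = pred[:-2] - 2.0 * pred[1:-1] + pred[2:]
    acc_gt = gt[:-2] - 2.0 * gt[1:-1] + gt[2:]
    return np.linalg.norm(acc_pred - acc_gt, axis=-1).mean()


# Synthetic check: a rotated, scaled, and shifted copy of the ground truth
# has a large MPJPE but a PA-MPJPE close to zero.
rng = np.random.default_rng(0)
gt_poses = rng.standard_normal((8, 14, 3)) * 100.0
angle = np.pi / 6
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
pred_poses = 1.2 * gt_poses @ rot_z + np.array([50.0, -20.0, 10.0])
raw_error = mpjpe(pred_poses, gt_poses)
aligned_error = pa_mpjpe(pred_poses, gt_poses)
```

The synthetic check shows why both MPJPE and PA-MPJPE are reported: the second removes global rotation, scale, and translation, so it isolates errors in the articulated pose itself. The acceleration error is the only one of these metrics that looks across frames, which is why it is used to measure temporal smoothness.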