Jingbo Wang, Sijie Yan, Yuanjun Xiong, Dahua Lin
We propose a new loss function, called motion loss, for monocular 3D human pose estimation from 2D pose. To compute the motion loss, we introduce a simple yet effective representation of keypoint motion, called pairwise motion encoding. We also design a new graph convolutional network architecture, U-shaped GCN (UGCN), which captures both short-term and long-term motion information to fully leverage the additional supervision from the motion loss. We train UGCN with the motion loss on two large-scale benchmarks: Human3.6M and MPI-INF-3DHP. Our model surpasses other state-of-the-art models by a large margin. It also demonstrates a strong capacity for producing smooth 3D sequences and recovering keypoint motion.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Human Pose Estimation | MPI-INF-3DHP | AUC | 62.1 | UGCN |
| 3D Human Pose Estimation | MPI-INF-3DHP | MPJPE (mm) | 68.1 | UGCN |
| 3D Human Pose Estimation | MPI-INF-3DHP | PCK | 86.9 | UGCN |
| 3D Human Pose Estimation | Human3.6M | Average MPJPE (mm) | 42.6 | UGCN (HR-Net) |
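
The MPJPE values in the table are mean per-joint position errors: the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints and frames. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `mpjpe` and the nested-list input layout are assumptions made for illustration, and any alignment step a benchmark protocol may apply before measuring the error (such as root-joint alignment) is omitted.

```python
import math


def mpjpe(pred, target):
    """Mean per-joint position error.

    pred, target: sequences of frames; each frame is a sequence of
    (x, y, z) joint coordinates in the same unit (e.g. millimetres).
    This layout is an assumption for the sketch.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")

    total = 0.0
    count = 0
    for frame_pred, frame_target in zip(pred, target):
        if len(frame_pred) != len(frame_target):
            raise ValueError("frames must have the same number of joints")
        for joint_pred, joint_target in zip(frame_pred, frame_target):
            # Euclidean distance between predicted and true joint position.
            total += math.dist(joint_pred, joint_target)
            count += 1

    if count == 0:
        raise ValueError("no joints to evaluate")
    return total / count


# One frame with two joints: the first is exact, the second is off by 5 units.
pred = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
target = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
error = mpjpe(pred, target)  # (0 + 5) / 2 = 2.5
```

Because the result is an average of distances, it carries the unit of the input coordinates, which is why the Human3.6M row reports it in millimetres.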