Soroush Mehraban, Vida Adeli, Babak Taati
Recent transformer-based approaches have demonstrated excellent performance in 3D human pose estimation. However, they take a holistic view: by encoding global relationships between all joints, they do not capture local dependencies precisely. In this paper, we present a novel Attention-GCNFormer (AGFormer) block that divides its channels between two parallel streams, a transformer stream and a GCNFormer stream. Our proposed GCNFormer module exploits the local relationships between adjacent joints, producing a new representation that is complementary to the transformer output. By fusing these two representations adaptively, AGFormer is better able to learn the underlying 3D structure. By stacking multiple AGFormer blocks, we propose MotionAGFormer in four variants, which can be chosen according to the desired speed-accuracy trade-off. We evaluate our model on two popular benchmark datasets: Human3.6M and MPI-INF-3DHP. MotionAGFormer achieves state-of-the-art results, with P1 errors of 38.4mm on Human3.6M (MotionAGFormer-B) and 16.2mm on MPI-INF-3DHP (MotionAGFormer-L). Remarkably, MotionAGFormer-B uses a quarter of the parameters and is three times more computationally efficient than the previous leading model on the Human3.6M dataset. Code and models are available at https://github.com/TaatiTeam/MotionAGFormer.
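The two-stream idea can be illustrated with a toy, dependency-free sketch. A global stream applies self-attention across all joints, and a local stream averages each joint with its skeleton neighbours (a row-normalized graph convolution with self-loops). The two outputs are then mixed per joint with softmax weights computed from their concatenation and added back to the input as a residual. Everything here is an assumption for illustration: the function names, the 3-joint chain skeleton, the fixed fusion weights `W_fuse`, and the absence of learned projections, multi-head attention, channel division, temporal modelling and MLP layers. It is not the MotionAGFormer implementation.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def attention_stream(X):
    """Global stream: single-head scaled dot-product self-attention over all joints.
    Queries, keys and values are the raw features (no learned projections)."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        out.append([sum(w[j] * X[j][c] for j in range(len(X))) for c in range(d)])
    return out

def gcn_stream(X, edges):
    """Local stream: each joint averages itself and its skeleton neighbours,
    i.e. a row-normalized adjacency with self-loops and no learned weights."""
    n, d = len(X), len(X[0])
    nbrs = [{i} for i in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return [[sum(X[j][c] for j in nbrs[i]) / len(nbrs[i]) for c in range(d)]
            for i in range(n)]

def adaptive_fusion(A, G, W_fuse):
    """Mix the two streams per joint; W_fuse maps the concatenated 2d features to 2 logits."""
    out = []
    for a, g in zip(A, G):
        alpha = softmax(matvec(W_fuse, a + g))  # a + g concatenates; alpha sums to 1
        out.append([alpha[0] * x + alpha[1] * y for x, y in zip(a, g)])
    return out

def agformer_like_block(X, edges, W_fuse):
    """Hypothetical toy block: two parallel streams, adaptive fusion, residual add."""
    fused = adaptive_fusion(attention_stream(X), gcn_stream(X, edges), W_fuse)
    return [[x + f for x, f in zip(xr, fr)] for xr, fr in zip(X, fused)]

# Usage on a hypothetical 3-joint chain (0-1-2) with 2-dim features per joint.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
W_fuse = [[0.1, 0.0, 0.0, 0.1],   # logit for the attention stream
          [0.0, 0.1, 0.1, 0.0]]   # logit for the graph stream
Y = agformer_like_block(X, edges, W_fuse)
```

Because the fusion weights are computed from the stream outputs themselves, each joint can weight global and local context differently; in this toy version `W_fuse` is fixed rather than learned.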
3D Human Pose Estimation, MPI-INF-3DHP:

| Model | Frames (T) | PCK (%) ↑ | AUC (%) ↑ | MPJPE (mm) ↓ |
|---|---|---|---|---|
| MotionAGFormer-XS | 27 | 98.2 | 83.5 | 19.2 |
| MotionAGFormer-S | 81 | 98.3 | 84.5 | 17.1 |
| MotionAGFormer-B | 81 | 98.3 | 84.2 | 18.2 |
| MotionAGFormer-L | 81 | 98.2 | 85.3 | 16.2 |

3D Human Pose Estimation, Human3.6M:

| Model | Frames needed (T) | Average MPJPE (mm) ↓ |
|---|---|---|
| MotionAGFormer-XS | 27 | 45.1 |
| MotionAGFormer-S | 81 | 42.5 |
| MotionAGFormer-B | 243 | 38.4 |
| MotionAGFormer-L | 243 | 38.4 |

Classification, Full-body Parkinson’s disease dataset:

| Model | F1-score (weighted) |
|---|---|
| MotionAGFormer | 0.42 |
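The MPJPE (the P1 error quoted in the abstract), PCK and AUC figures above can be made concrete with a small, self-contained sketch for a single pose. It assumes coordinates in millimetres and the convention commonly used for MPI-INF-3DHP (PCK at a 150 mm threshold, AUC as the mean PCK over thresholds from 0 to 150 mm in 5 mm steps). It skips root alignment and averaging over frames and subjects, and the function names are hypothetical, not taken from the MotionAGFormer repository.

```python
import math

def joint_errors(pred, gt):
    """Per-joint Euclidean distance between predicted and ground-truth 3D joints."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]

def mpjpe(pred, gt):
    """Mean per-joint position error, in the input units (assumed mm)."""
    errs = joint_errors(pred, gt)
    return sum(errs) / len(errs)

def pck(pred, gt, threshold=150.0):
    """Percentage of joints whose error is within `threshold` (assumed mm)."""
    errs = joint_errors(pred, gt)
    return 100.0 * sum(e <= threshold for e in errs) / len(errs)

def auc(pred, gt, max_threshold=150.0, step=5.0):
    """Mean PCK over thresholds 0, step, ..., max_threshold."""
    n = int(round(max_threshold / step)) + 1
    thresholds = [i * step for i in range(n)]
    return sum(pck(pred, gt, t) for t in thresholds) / n

# Hypothetical 3-joint pose: errors of 10, 100 and 200 mm along the z-axis.
gt = [(0.0, 0.0, 0.0)] * 3
pred = [(0.0, 0.0, 10.0), (0.0, 0.0, 100.0), (0.0, 0.0, 200.0)]
scores = (mpjpe(pred, gt), pck(pred, gt), auc(pred, gt))
```

Note that MPJPE is lower-is-better while PCK and AUC are higher-is-better, which is why a single model can rank differently across the columns.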