Wen Guo, Yuming Du, Xi Shen, Vincent Lepetit, Xavier Alameda-Pineda, Francesc Moreno-Noguer
This paper tackles the problem of human motion prediction, which consists of forecasting future body poses from historically observed sequences. State-of-the-art approaches provide good results; however, they rely on deep learning architectures of arbitrary complexity, such as Recurrent Neural Networks (RNNs), Transformers, or Graph Convolutional Networks (GCNs), typically requiring multiple training stages and more than 2 million parameters. In this paper, we show that, when combined with a series of standard practices, such as applying the Discrete Cosine Transform (DCT), predicting the residual displacement of joints, and optimizing velocity as an auxiliary loss, a lightweight network based on multi-layer perceptrons (MLPs) with only 0.14 million parameters can surpass state-of-the-art performance. An exhaustive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that our method, named siMLPe, consistently outperforms all other approaches. We hope that our simple method can serve as a strong baseline for the community and encourage rethinking of the human motion prediction problem. The code is publicly available at https://github.com/dulucas/siMLPe.
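The three standard practices named in the abstract (DCT along time, residual displacement prediction, and a velocity auxiliary loss) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the single `temporal_weights` matrix standing in for the MLP, and the exact form of the loss are assumptions made for illustration only. Poses are represented as plain lists of coordinates, one list per frame.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (one basis vector per row)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n)])
    return rows

def matmul(a, b):
    """Matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def predict(observed, temporal_weights):
    """Hypothetical predictor: DCT -> linear mixing over time -> inverse DCT,
    with the result added to the last observed pose (residual displacement)."""
    d = dct_matrix(len(observed))
    coeffs = matmul(d, observed)               # DCT along the time axis
    mixed = matmul(temporal_weights, coeffs)   # stand-in for the MLP (assumption)
    residual = matmul(transpose(d), mixed)     # inverse DCT (d is orthonormal)
    last = observed[-1]
    return [[r + l for r, l in zip(frame, last)] for frame in residual]

def l2_loss(pred, target):
    """Mean over frames of the Euclidean distance between pose vectors."""
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)

def velocity(seq):
    """Frame-to-frame differences of a pose sequence."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(seq, seq[1:])]

def total_loss(pred, target):
    """Position loss plus velocity loss used as an auxiliary term (assumed unweighted)."""
    return l2_loss(pred, target) + l2_loss(velocity(pred), velocity(target))

# Toy data: 4 frames, 2 coordinates per frame.
observed = [[0.0, 1.0], [1.0, 1.5], [2.0, 2.0], [3.0, 2.5]]
zero_weights = [[0.0] * 4 for _ in range(4)]
# With all-zero weights the residual vanishes, so every predicted frame
# equals the last observed pose.
print(predict(observed, zero_weights))
```

Predicting a residual means an untrained network already reproduces the "repeat the last pose" baseline, and training only has to learn the displacement from it.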
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | HARPER | Average MPJPE (mm) @ 1000 ms | 141 | siMLPe |
| Pose Estimation | HARPER | Average MPJPE (mm) @ 400 ms | 60 | siMLPe |
| Pose Estimation | HARPER | Last Frame MPJPE (mm) @ 1000 ms | 264 | siMLPe |
| Pose Estimation | HARPER | Last Frame MPJPE (mm) @ 400 ms | 98 | siMLPe |
| Pose Estimation | AMASS | Average MPJPE (mm) @ 1000 ms | 65.7 | siMLPe |
| Pose Estimation | Human3.6M | Average MPJPE (mm) @ 1000 ms | 109.4 | siMLPe |
| Pose Estimation | Human3.6M | Average MPJPE (mm) @ 400 ms | 57.3 | siMLPe |
| Pose Estimation | 3DPW | Average MPJPE (mm) @ 1000 ms | 72.2 | siMLPe |
| Pose Estimation | Expi - common actions split | Average MPJPE (mm) @ 200 ms | 80 | siMLPe |
| Motion Forecasting | Expi - common actions split | Average MPJPE (mm) @ 1000 ms | 250 | siMLPe |
| Motion Forecasting | Expi - common actions split | Average MPJPE (mm) @ 400 ms | 128 | siMLPe |
| Motion Forecasting | Expi - common actions split | Average MPJPE (mm) @ 600 ms | 178 | siMLPe |
| Motion Forecasting | Expi - unseen actions split | Average MPJPE (mm) @ 400 ms | 131 | siMLPe |
| Motion Forecasting | Expi - unseen actions split | Average MPJPE (mm) @ 600 ms | 183 | siMLPe |
| Motion Forecasting | Expi - unseen actions split | Average MPJPE (mm) @ 800 ms | 225 | siMLPe |
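
The table reports MPJPE (mean per-joint position error) in millimetres at several horizons. The sketch below shows one way such numbers could be computed. It assumes that "Average" means the mean of per-frame MPJPE over all frames up to the horizon and that "Last Frame" means the MPJPE at the horizon frame itself; the table does not define these terms, and the helper names and the 25 fps frame rate are hypothetical.

```python
import math

def mpjpe(pred, target):
    """Mean Euclidean distance over joints for one frame.
    Each argument is a list of (x, y, z) joint positions in millimetres."""
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)

def horizon_errors(pred_seq, target_seq, fps, horizon_ms):
    """Return (average MPJPE up to the horizon, MPJPE at the horizon frame)."""
    n_frames = round(horizon_ms * fps / 1000)  # number of frames inside the horizon
    per_frame = [mpjpe(p, t)
                 for p, t in zip(pred_seq[:n_frames], target_seq[:n_frames])]
    return sum(per_frame) / len(per_frame), per_frame[-1]

# Toy data: 25 future frames (1 s at an assumed 25 fps), 2 joints per frame.
# The prediction drifts away from the ground truth a little more each frame.
target_seq = [[(0.0, 0.0, 0.0)] * 2 for _ in range(25)]
pred_seq = [[(3.0 * (i + 1), 4.0 * (i + 1), 0.0)] * 2 for i in range(25)]

average, last = horizon_errors(pred_seq, target_seq, fps=25, horizon_ms=400)
print(f"average MPJPE @ 400 ms: {average:.1f} mm, last frame: {last:.1f} mm")
```

Under these assumptions the last-frame error is never smaller than the average when the error grows with the horizon, which is consistent with the HARPER rows in the table.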