Apratim Bhattacharyya, Bernt Schiele, Mario Fritz
For autonomous agents to operate successfully in the real world, the ability to anticipate future events and states of their environment is a key competence. This problem has been formalized as sequence extrapolation, where a number of observations are used to predict the sequence into the future. Real-world scenarios demand a model of the uncertainty of such predictions, as predictions become increasingly uncertain, particularly over long time horizons. While impressive results have been shown for point estimates, scenarios that induce multi-modal distributions over future sequences remain challenging. Our work addresses these challenges with a Gaussian latent variable model for sequence prediction. Our core contribution is a "Best of Many" sample objective that leads to more accurate and more diverse predictions, which better capture the true variations in real-world sequence data. Beyond our analysis of improved model fit, our models also empirically outperform prior work on three diverse tasks ranging from traffic scenes to weather data.
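As a rough illustration of the "Best of Many" idea, the sketch below scores several sampled future sequences against the ground truth and keeps only the error of the closest one, instead of averaging the error over all samples. The function name `best_of_many_loss`, the array shapes, the mean squared error, and the `kl_term` argument are assumptions made for this example; the abstract does not specify the exact loss terms or their weighting.

```python
import numpy as np

def best_of_many_loss(samples, target, kl_term=0.0):
    """Illustrative best-of-many reconstruction loss (hypothetical helper).

    samples: array of shape (K, T, D), K predicted future sequences,
             e.g. decoded from K draws of a Gaussian latent variable.
    target:  array of shape (T, D), the observed future sequence.
    kl_term: optional regulariser value added to the result (assumed here).

    Returns the smallest per-sample mean squared error plus kl_term,
    together with the index of the best sample.
    """
    samples = np.asarray(samples, dtype=float)
    target = np.asarray(target, dtype=float)
    # Mean squared error of each sample against the target, shape (K,)
    per_sample = ((samples - target[None]) ** 2).mean(axis=(1, 2))
    best = int(per_sample.argmin())
    return per_sample[best] + kl_term, best

# Usage: three candidate futures, the second one matches the target exactly.
target = np.zeros((4, 2))
samples = np.stack([target + 1.0, target, target + 2.0])
loss, best = best_of_many_loss(samples, target)
```

Penalising only the best sample leaves the remaining samples free to cover other plausible futures, which is one intuition for why such an objective can encourage diversity.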
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | Human3.6M | ADE | 448 | BoM |
| Pose Estimation | Human3.6M | APD | 6265 | BoM |
| Pose Estimation | Human3.6M | FDE | 533 | BoM |
| Pose Estimation | Human3.6M | MMADE | 514 | BoM |
| Pose Estimation | Human3.6M | MMFDE | 544 | BoM |
| Pose Estimation | HumanEva-I | ADE@2000ms | 271 | BoM |
| Pose Estimation | HumanEva-I | APD@2000ms | 2846 | BoM |
| Pose Estimation | HumanEva-I | FDE@2000ms | 279 | BoM |
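The table does not define its metrics. As a minimal sketch, assuming ADE and FDE denote average and final displacement error under the common best-of-K convention (the error of the closest of K sampled futures), they could be computed as follows. The function name `displacement_errors` and the array shapes are hypothetical, and the evaluation protocol behind the reported values may differ.

```python
import numpy as np

def displacement_errors(samples, target):
    """Best-of-K ADE and FDE (assumed convention, hypothetical helper).

    samples: array of shape (K, T, D), K predicted future sequences.
    target:  array of shape (T, D), the ground-truth future sequence.
    """
    samples = np.asarray(samples, dtype=float)
    target = np.asarray(target, dtype=float)
    # Euclidean distance to the ground truth at every time step, shape (K, T)
    dist = np.linalg.norm(samples - target[None], axis=-1)
    ade = dist.mean(axis=1).min()  # average over time, best sample
    fde = dist[:, -1].min()        # final time step only, best sample
    return ade, fde

# Usage: two candidate futures over three time steps in 2D.
target = np.zeros((3, 2))
far = np.tile([3.0, 4.0], (3, 1))   # distance 5 at every step
near = np.tile([0.0, 1.0], (3, 1))  # distance 1 at every step
ade, fde = displacement_errors(np.stack([far, near]), target)
```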