Shanyan Guan, Jingwei Xu, Yunbo Wang, Bingbing Ni, Xiaokang Yang
This paper considers a new problem: adapting a pre-trained model of human mesh reconstruction to out-of-domain streaming videos. Most previous methods based on the parametric SMPL model \cite{loper2015smpl} underperform in new domains with unexpected, domain-specific attributes, such as camera parameters, bone lengths, backgrounds, and occlusions. Our general idea is to dynamically fine-tune the source model on test video streams with additional temporal constraints, so that it mitigates the domain gaps without over-fitting to the 2D information of individual test frames. A subsequent challenge is how to avoid conflicts between the 2D and temporal constraints. We tackle this problem with a new training algorithm named Bilevel Online Adaptation (BOA), which divides the overall multi-objective optimization into two steps, weight probe and weight update, within each training iteration. We demonstrate that BOA leads to state-of-the-art results on two human mesh reconstruction benchmarks.
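The two-step structure of a BOA iteration can be illustrated with a toy sketch. This is not the paper's implementation: the quadratic `loss_2d` and `loss_temp` functions below are hypothetical stand-ins for the per-frame 2D reprojection loss and the temporal-consistency loss, and the update uses a first-order approximation (the temporal gradient is evaluated at the probed weights but applied to the original weights).

```python
import numpy as np

# Hypothetical stand-ins for the paper's constraints:
# loss_2d   - per-frame 2D loss (fits the current frame)
# loss_temp - temporal-consistency loss across adjacent frames
def loss_2d(w, frame):
    return float(np.sum((w - frame) ** 2))

def grad_2d(w, frame):
    return 2.0 * (w - frame)

def loss_temp(w, prev_frame, frame):
    # Encourage predictions to stay close across adjacent frames.
    return float(np.sum((w - 0.5 * (prev_frame + frame)) ** 2))

def grad_temp(w, prev_frame, frame):
    return 2.0 * (w - 0.5 * (prev_frame + frame))

def boa_step(w, prev_frame, frame, lr_probe=0.1, lr_update=0.1):
    # Step 1 (weight probe): a trial gradient step on the 2D loss only.
    w_probe = w - lr_probe * grad_2d(w, frame)
    # Step 2 (weight update): update the ORIGINAL weights using the
    # temporal loss evaluated at the probed weights, so the two
    # objectives are reconciled rather than naively summed.
    return w - lr_update * grad_temp(w_probe, prev_frame, frame)
```

Running `boa_step` sequentially over a drifting stream of frames moves the weights toward the stream without ever optimizing a joint 2D-plus-temporal objective directly, which is the intuition behind separating probe and update.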
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Human Pose Estimation | 3DPW | MPJPE | 77.2 | BOA (w/ 2D GT) |
| 3D Human Pose Estimation | 3DPW | MPVPE | 91.2 | BOA (w/ 2D GT) |
| 3D Human Pose Estimation | 3DPW | PA-MPJPE | 49.5 | BOA (w/ 2D GT) |