Yan Xu, Kwan-Yee Lin, Guofeng Zhang, Xiaogang Wang, Hongsheng Li
6-DoF object pose estimation from a monocular image is challenging, and a post-refinement procedure is generally needed for high-precision estimation. In this paper, we propose a framework based on a recurrent neural network (RNN) for object pose refinement, which is robust to erroneous initial poses and occlusions. During the recurrent iterations, object pose refinement is formulated as a non-linear least squares problem based on the estimated correspondence field (between a rendered image and the observed image). The problem is then solved by a differentiable Levenberg-Marquardt (LM) algorithm, enabling end-to-end training. The correspondence field estimation and pose refinement are conducted alternately in each iteration to recover the object poses. Furthermore, to improve the robustness to occlusion, we introduce a consistency-check mechanism based on the learned descriptors of the 3D model and observed 2D images, which downweights the unreliable correspondences during pose optimization. Extensive experiments on the LINEMOD, Occlusion-LINEMOD, and YCB-Video datasets validate the effectiveness of our method and demonstrate state-of-the-art performance.
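
The refinement step described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a pinhole camera with intrinsics `K`, treats the correspondence field as target pixel locations `uv_obs` for known 3D model points `X`, and takes the per-correspondence weights `w` as given (the abstract attributes such downweighting to the consistency check). All function names are hypothetical, and unlike the solver in the paper this version is not differentiable.

```python
import numpy as np

def skew(v):
    """Cross-product matrix, so that skew(v) @ a == np.cross(v, a)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + skew(w)  # first-order approximation near zero
    k = skew(w / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)

def residuals_and_jacobian(K, R, t, X, uv_obs):
    """Reprojection residuals (2N,) and their Jacobian (2N, 6).

    The 6 parameters are a rotation increment (applied on the left of R)
    followed by a translation increment.
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    RX = X @ R.T                      # rotated model points
    x, y, z = (RX + t).T              # points in the camera frame
    uv = np.stack([fx * x / z + cx, fy * y / z + cy], axis=1)
    r = (uv - uv_obs).ravel()
    J = np.zeros((2 * len(X), 6))
    for i in range(len(X)):
        # Derivative of the pinhole projection w.r.t. the camera-frame point.
        dproj = np.array([[fx / z[i], 0.0, -fx * x[i] / z[i] ** 2],
                          [0.0, fy / z[i], -fy * y[i] / z[i] ** 2]])
        # Derivative of the camera-frame point w.r.t. the pose increment.
        dpoint = np.hstack([-skew(RX[i]), np.eye(3)])
        J[2 * i:2 * i + 2] = dproj @ dpoint
    return r, J

def refine_pose_lm(K, R, t, X, uv_obs, w, iters=50, lam=1e-3):
    """Weighted Levenberg-Marquardt refinement of the pose (R, t)."""
    W = np.repeat(w, 2)               # one weight per residual component
    r, J = residuals_and_jacobian(K, R, t, X, uv_obs)
    cost = np.sum(W * r ** 2)
    for _ in range(iters):
        H = J.T @ (W[:, None] * J)    # Gauss-Newton approximation of the Hessian
        g = J.T @ (W * r)
        damped = H + lam * np.diag(np.diag(H)) + 1e-9 * np.eye(6)
        delta = np.linalg.solve(damped, -g)
        R_new = exp_so3(delta[:3]) @ R
        t_new = t + delta[3:]
        r_new, J_new = residuals_and_jacobian(K, R_new, t_new, X, uv_obs)
        cost_new = np.sum(W * r_new ** 2)
        if cost_new < cost:           # accept the step and trust the model more
            R, t, r, J, cost = R_new, t_new, r_new, J_new, cost_new
            lam = max(lam / 10.0, 1e-9)
        else:                         # reject the step and increase damping
            lam *= 10.0
    return R, t
```

A correspondence with weight zero drops out of the normal equations entirely, which is how occluded or mismatched pixels stop influencing the pose in this sketch. In the framework described above, such a solve would alternate with re-estimating the correspondence field from a new rendering at the updated pose.
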
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | LineMOD | Mean ADD | 97.37 | RNNPose |
| Pose Estimation | Occlusion LineMOD | Mean ADD | 60.65 | RNNPose (Trained with synthetic data and LINEMOD training set, w/o pbr data) |
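
The table reports "Mean ADD" without defining it. The sketch below assumes the commonly used ADD metric (average distance between model points under the ground-truth and estimated poses), with a pose counted as correct when ADD falls below 10% of the object diameter. The threshold, the function names, and the omission of the symmetric-object variant are assumptions, not details stated on this page.

```python
import numpy as np

def add_distance(R_gt, t_gt, R_est, t_est, model_points):
    """Average distance between model points under the two poses."""
    p_gt = model_points @ R_gt.T + t_gt
    p_est = model_points @ R_est.T + t_est
    return np.linalg.norm(p_gt - p_est, axis=1).mean()

def add_accuracy(add_values, diameter, threshold=0.1):
    """Fraction of poses whose ADD is below threshold * object diameter."""
    add_values = np.asarray(add_values, dtype=float)
    return float(np.mean(add_values < threshold * diameter))
```

Under this reading, a per-object accuracy is computed first and the reported value is its mean over the objects in the dataset, expressed as a percentage.
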