Yue Wang, Justin M. Solomon
Point cloud registration is a key problem in computer vision, with applications in robotics, medical imaging, and other fields. The problem involves finding a rigid transformation that maps one point cloud onto another so that the two align. Iterative Closest Point (ICP) and its variants provide simple, easily implemented iterative methods for this task, but these algorithms can converge to spurious local optima. To address local optima and other difficulties in the ICP pipeline, we propose a learning-based method, titled Deep Closest Point (DCP), inspired by recent techniques in computer vision and natural language processing. Our model consists of three parts: a point cloud embedding network; an attention-based module combined with a pointer generation layer to approximate combinatorial matching; and a differentiable singular value decomposition (SVD) layer that extracts the final rigid transformation. We train our model end-to-end on the ModelNet40 dataset and show in several settings that it performs better than ICP, its variants (e.g., Go-ICP, FGR), and the recently proposed learning-based method PointNetLK. Beyond providing a state-of-the-art registration technique, we evaluate how well our learned features transfer to unseen objects. We also provide a preliminary analysis of our learned model to help understand whether domain-specific and/or global features facilitate rigid registration.
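The last two stages described above (a soft pointer that approximates matching, followed by an SVD layer that recovers the rigid motion) can be illustrated with a minimal NumPy sketch. It rests on several assumptions not taken from the abstract: the pointer is modeled as a row-wise softmax over embedding dot products with a hypothetical `temperature` parameter; the embedding and attention networks are replaced by hand-made stand-in features (one-hot vectors that encode the true correspondence); and the SVD step is the standard Kabsch/Procrustes solution with the usual reflection correction. NumPy is not differentiable, so this shows only the forward computation, not end-to-end training. The helper names `soft_pointer` and `svd_rigid_transform` are hypothetical.

```python
import numpy as np


def soft_pointer(feat_src, feat_tgt, tgt_points, temperature=1.0):
    """Soft correspondences (assumed form): each source point attends over all target points.

    feat_src: (N, D) source embeddings; feat_tgt: (M, D) target embeddings;
    tgt_points: (M, 3). Returns (N, 3) softly matched target points.
    """
    scores = feat_src @ feat_tgt.T / temperature      # (N, M) similarity scores
    scores -= scores.max(axis=1, keepdims=True)       # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ tgt_points                       # weighted average of target points


def svd_rigid_transform(src, matched):
    """Least-squares R, t such that matched ~= src @ R.T + t (Kabsch via SVD)."""
    src_mean = src.mean(axis=0)
    tgt_mean = matched.mean(axis=0)
    H = (src - src_mean).T @ (matched - tgt_mean)     # (3, 3) cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                          # reflection: flip the last singular axis
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = tgt_mean - R @ src_mean
    return R, t


# Usage: a shuffled, rigidly transformed copy of a random cloud.
rng = np.random.default_rng(0)
n = 20
src = rng.normal(size=(n, 3))
angle = np.pi / 5
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
perm = rng.permutation(n)
tgt = (src @ R_true.T + t_true)[perm]                # target order is shuffled

# Hypothetical "ideal" embeddings standing in for the learned network:
# target point j is source point perm[j], so it receives feature e_{perm[j]}.
feat_src = np.eye(n)
feat_tgt = np.eye(n)[perm]

matched = soft_pointer(feat_src, feat_tgt, tgt, temperature=0.01)
R_est, t_est = svd_rigid_transform(src, matched)
```

With sharp, correct features the softmax weights are essentially one-hot, so the SVD step recovers `R_true` and `t_true`. With real learned features the weights are soft, and each matched point is a blend of target points; keeping the whole chain differentiable is what allows the embeddings to be trained end-to-end.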
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Visual Localization | Oxford Radar RobotCar (Full-6) | Mean Translation Error | 18.45 | DCP |
| Point Cloud Registration | 3DMatch (at least 30% overlapped - FCGF setting) | Recall (0.3 m, 15°) | 3.22 | DCP |
| 3D Point Cloud Interpolation | 3DMatch (at least 30% overlapped - FCGF setting) | Recall (0.3 m, 15°) | 3.22 | DCP |