Lei Li, Siyu Zhu, Hongbo Fu, Ping Tan, Chiew-Lan Tai
In this work, we propose an end-to-end framework to learn local multi-view descriptors for 3D point clouds. Existing studies that adopt a similar multi-view representation rely on hand-crafted viewpoints for rendering in a preprocessing stage, which is detached from the subsequent descriptor learning stage. In our framework, we instead integrate multi-view rendering into the neural network via a differentiable renderer, which allows the viewpoints to be optimizable parameters that capture more informative local context around interest points. To obtain discriminative descriptors, we further design a soft-view pooling module that attentively fuses convolutional features across views. Extensive experiments on existing 3D registration benchmarks show that our method outperforms existing local descriptors both quantitatively and qualitatively.
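The soft-view pooling idea can be illustrated with a short sketch: each rendered view contributes a feature vector, a small scoring network assigns each view an attention weight, and the descriptor is the attention-weighted sum. The following is a minimal PyTorch sketch; the module name, MLP scoring head, and dimensions are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of attentive cross-view fusion in the spirit of
# soft-view pooling; names and the MLP design are assumptions.
import torch
import torch.nn as nn


class SoftViewPooling(nn.Module):
    """Fuse per-view features into one descriptor via learned attention."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Scores each view's feature vector with a small MLP (assumed design).
        self.score = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim // 2, 1),
        )

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, feat_dim), one feature vector per
        # rendered view of an interest point.
        weights = torch.softmax(self.score(view_feats), dim=1)  # (B, V, 1)
        # Attention-weighted sum over views -> (batch, feat_dim) descriptor.
        return (weights * view_feats).sum(dim=1)


if __name__ == "__main__":
    pool = SoftViewPooling(feat_dim=128)
    feats = torch.randn(4, 6, 128)  # 4 interest points, 6 views each
    print(pool(feats).shape)        # torch.Size([4, 128])
```

Because the weights are produced by a differentiable softmax rather than a hard max over views, gradients flow to every view branch, which is what lets upstream parameters such as the rendering viewpoints be optimized end to end.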
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Point Cloud Registration | 3DMatch Benchmark | Feature Matching Recall (%) | 97.5 | LMVD |
| Point Cloud Registration | ETH (trained on 3DMatch) | Feature Matching Recall (%) | 61.6 | LMVD |
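For reference, feature matching recall on benchmarks such as 3DMatch is commonly computed as the fraction of fragment pairs whose descriptor-based nearest-neighbor matches contain more than a threshold ratio of inliers under the ground-truth alignment. Below is a hedged sketch, assuming the widely used thresholds (inlier distance tau1 = 0.1 m, inlier ratio tau2 = 0.05); these values are conventions from the literature, not taken from this page.

```python
# Hedged sketch of feature matching recall (FMR); thresholds are the
# values commonly used in the literature, assumed here for illustration.
import numpy as np


def feature_matching_recall(pairs, tau1=0.1, tau2=0.05):
    """pairs: iterable of (src_pts, dst_pts, src_desc, dst_desc, gt),
    where gt is a 4x4 ground-truth rigid transform from src to dst."""
    matched = 0
    for src_pts, dst_pts, src_desc, dst_desc, gt in pairs:
        # Nearest-neighbor matching in descriptor space (brute force).
        d = np.linalg.norm(src_desc[:, None, :] - dst_desc[None, :, :], axis=2)
        nn_idx = d.argmin(axis=1)
        # Apply the ground-truth rigid transform to the source points.
        src_h = np.c_[src_pts, np.ones(len(src_pts))]
        src_gt = (gt @ src_h.T).T[:, :3]
        # A match is an inlier if the aligned points are within tau1.
        inlier = np.linalg.norm(src_gt - dst_pts[nn_idx], axis=1) < tau1
        # The pair counts as matched if the inlier ratio exceeds tau2.
        matched += inlier.mean() > tau2
    return matched / len(pairs)
```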