Martin Engelcke, Dushyant Rao, Dominic Zeng Wang, Chi Hay Tong, Ingmar Posner
This paper proposes a computationally efficient approach to detecting objects natively in 3D point clouds using convolutional neural networks (CNNs). This is achieved by leveraging a feature-centric voting scheme to implement novel convolutional layers that explicitly exploit the sparsity of the input. We also examine the trade-off between accuracy and speed for different architectures and propose an L1 penalty on the filter activations to further encourage sparsity in the intermediate representations. To the best of our knowledge, this is the first work to propose sparse convolutional layers and L1 regularisation for efficient large-scale processing of 3D data. We demonstrate the efficacy of our approach on the KITTI object detection benchmark and show that Vote3Deep models with as few as three layers outperform the previous state of the art in both laser-based and laser-vision-based approaches by margins of up to 40%, while remaining highly competitive in terms of processing time.
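The feature-centric voting idea can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the dictionary-based sparse grid, the helper names (`kernel_offsets`, `sparse_conv3d_voting`, `dense_conv3d_reference`, `l1_activation_penalty`), the odd-sized cubic kernel, the ReLU placement and the non-positive-bias assumption are all hypothetical choices made for illustration. Each occupied cell scatters weighted copies of its feature vector ("votes") into the output cells covered by its kernel footprint, so empty space is never visited; a brute-force dense evaluation is included to check that the two agree. The L1 helper shows one way to express an activation penalty, with a hypothetical weight `lam`.

```python
# Illustrative sketch (not the authors' code): names and data layout are hypothetical.
import random
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]
Grid = Dict[Coord, List[float]]          # sparse grid: only occupied cells are stored
Kernel = Dict[Coord, List[List[float]]]  # kernel offset -> (c_out x c_in) weight matrix


def kernel_offsets(kernel_size: int) -> List[Coord]:
    """All 3D offsets of an odd-sized cubic kernel centred on the origin."""
    r = kernel_size // 2
    return list(product(range(-r, r + 1), repeat=3))


def _matvec(w: List[List[float]], f: List[float]) -> List[float]:
    return [sum(w_oi * f_i for w_oi, f_i in zip(row, f)) for row in w]


def sparse_conv3d_voting(features: Grid, weights: Kernel, bias: List[float],
                         kernel_size: int) -> Grid:
    """Voting-based 3D convolution (cross-correlation) followed by a ReLU.

    Every occupied input cell p casts the vote W[k] @ x[p] into output cell p - k
    for each kernel offset k, which reproduces y[q] = b + sum_k W[k] @ x[q + k].
    Work scales with (#occupied cells) * kernel_size**3, not with the grid volume.
    Assumes all biases are <= 0, so cells that receive no votes stay zero.
    """
    assert all(b <= 0.0 for b in bias), "this sketch assumes non-positive biases"
    votes: Grid = {}
    for (x, y, z), f in features.items():
        for (dx, dy, dz) in kernel_offsets(kernel_size):
            acc = votes.setdefault((x - dx, y - dy, z - dz), [0.0] * len(bias))
            for o, v in enumerate(_matvec(weights[(dx, dy, dz)], f)):
                acc[o] += v
    out: Grid = {}
    for q, v in votes.items():
        act = [max(0.0, v_o + b_o) for v_o, b_o in zip(v, bias)]
        if any(a > 0.0 for a in act):  # drop all-zero cells to keep the output sparse
            out[q] = act
    return out


def dense_conv3d_reference(features: Grid, weights: Kernel, bias: List[float],
                           kernel_size: int) -> Grid:
    """Brute-force check over the padded bounding box (features must be non-empty)."""
    r = kernel_size // 2
    lo = [min(p[i] for p in features) - r for i in range(3)]
    hi = [max(p[i] for p in features) + r for i in range(3)]
    out: Grid = {}
    for q in product(*(range(lo[i], hi[i] + 1) for i in range(3))):
        pre = list(bias)
        for k in kernel_offsets(kernel_size):
            f = features.get((q[0] + k[0], q[1] + k[1], q[2] + k[2]))
            if f is not None:
                for o, v in enumerate(_matvec(weights[k], f)):
                    pre[o] += v
        act = [max(0.0, p) for p in pre]
        if any(a > 0.0 for a in act):
            out[q] = act
    return out


def l1_activation_penalty(activations: Grid, lam: float) -> float:
    """lam * sum of |a| over all stored activations, to be added to the training loss."""
    return lam * sum(abs(a) for cell in activations.values() for a in cell)


# Usage: up to 20 occupied cells in a 10^3 volume, 2 -> 3 channels, 3x3x3 kernel.
rng = random.Random(0)
c_in, c_out, ksize = 2, 3, 3
feats: Grid = {(rng.randrange(10), rng.randrange(10), rng.randrange(10)):
               [rng.uniform(0.1, 1.0) for _ in range(c_in)] for _ in range(20)}
W: Kernel = {k: [[rng.uniform(-1.0, 1.0) for _ in range(c_in)] for _ in range(c_out)]
             for k in kernel_offsets(ksize)}
b = [-0.1] * c_out

sparse_out = sparse_conv3d_voting(feats, W, b, ksize)
dense_out = dense_conv3d_reference(feats, W, b, ksize)
penalty = l1_activation_penalty(sparse_out, lam=1e-3)
print(len(feats), "occupied inputs ->", len(sparse_out), "non-zero outputs; L1 penalty:", penalty)
```

Because the output of each layer is stored sparsely, fewer non-zero activations mean fewer votes to cast in the next layer, which is why encouraging sparse intermediate representations (for example with an L1 penalty) can reduce processing time. A practical implementation would vectorise the per-cell matrix products instead of looping in Python.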
Vote3Deep average precision (AP, %) on the KITTI object detection benchmark, by class and difficulty:

| Class | Easy | Moderate | Hard |
|---|---|---|---|
| Cars | 76.79 | 68.24 | 63.23 |
| Pedestrians | 68.39 | 55.37 | 52.59 |
| Cyclists | 79.92 | 67.88 | 62.98 |