Jun Wang, Shiyi Lan, Mingfei Gao, Larry S. Davis
Real-time 3D object detection is crucial for autonomous cars. Voxel-based approaches have received considerable attention because they achieve promising performance with high efficiency. However, previous methods model the input space with features extracted from equally divided sub-regions, without considering that point clouds are generally non-uniformly distributed over the space. To address this issue, we propose a novel 3D object detection framework with dynamic information modeling. The proposed framework is designed in a coarse-to-fine manner. In the first stage, a voxel-based region proposal network generates coarse predictions. We then introduce InfoFocus, which improves the coarse detections by adaptively refining features under the guidance of point cloud density information. Experiments are conducted on the large-scale nuScenes 3D detection benchmark. Results show that our framework achieves state-of-the-art performance at 31 FPS and improves significantly on our baseline, by 9.0% mAP, on the nuScenes test set.
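
To illustrate the non-uniformity the abstract refers to, the following is a minimal sketch that counts points per voxel on an equally divided grid. It is not the InfoFocus implementation: the function name `voxel_point_counts`, the synthetic point cloud, and the grid parameters are all assumptions chosen for illustration.

```python
import numpy as np

def voxel_point_counts(points, voxel_size, grid_range):
    """Count points per voxel on an equally divided grid (hypothetical helper)."""
    lo = np.asarray(grid_range[0], dtype=float)
    hi = np.asarray(grid_range[1], dtype=float)
    dims = np.ceil((hi - lo) / voxel_size).astype(int)
    # Keep only points that fall inside the grid bounds.
    inside = np.all((points >= lo) & (points < hi), axis=1)
    idx = np.floor((points[inside] - lo) / voxel_size).astype(int)
    idx = np.minimum(idx, dims - 1)  # guard against floating-point edge cases
    counts = np.zeros(dims, dtype=int)
    np.add.at(counts, tuple(idx.T), 1)  # unbuffered add handles repeated indices
    return counts

# Synthetic cloud (an assumption): a dense cluster near the origin
# plus sparse points spread over the whole range.
rng = np.random.default_rng(0)
near = rng.normal(0.0, 1.0, size=(900, 3))
far = rng.uniform(-20.0, 20.0, size=(100, 3))
points = np.vstack([near, far])

counts = voxel_point_counts(points, 4.0, ((-20, -20, -20), (20, 20, 20)))
print("occupied voxels:", int((counts > 0).sum()), "of", counts.size)
print("max points in a voxel:", int(counts.max()))
```

In a cloud like this, a handful of voxels hold most of the points while the rest are empty or nearly so, which is the imbalance that equally divided sub-regions do not account for.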
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Object Detection | nuScenes | NDS | 0.4 | InfoFocus |
| 3D Object Detection | nuScenes | mAAE | 0.4 | InfoFocus |
| 3D Object Detection | nuScenes | mAOE | 1.13 | InfoFocus |
| 3D Object Detection | nuScenes | mAP | 0.39 | InfoFocus |
| 3D Object Detection | nuScenes | mASE | 0.26 | InfoFocus |
| 3D Object Detection | nuScenes | mATE | 0.36 | InfoFocus |
| 3D Object Detection | nuScenes | mAVE | 1 | InfoFocus |
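
The NDS value can be cross-checked against the other metrics using the nuScenes detection score definition, NDS = (1/10) [5 · mAP + Σ (1 − min(1, mTP))], where the sum runs over the five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE). The sketch below plugs in the rounded values from the table, so the result only agrees with the reported NDS up to rounding; the function name `nds` is an assumption.

```python
def nds(mean_ap, tp_errors):
    """nuScenes detection score from mAP and the five TP error metrics."""
    # Each error is clipped at 1 and converted to a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0

# Rounded values as listed in the table above.
infofocus_errors = {
    "mATE": 0.36,
    "mASE": 0.26,
    "mAOE": 1.13,  # clipped to 1, so it contributes a score of 0
    "mAVE": 1.0,   # contributes a score of 0
    "mAAE": 0.4,
}
score = nds(0.39, infofocus_errors)
print(round(score, 3))
```

With these inputs the score comes to roughly 0.393, consistent with the table's NDS of 0.4 after rounding. The orientation and velocity terms contribute nothing because both errors are at or above the clipping threshold of 1.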