Haotian Hu, Fanyi Wang, Jingwen Su, Yaonong Wang, Laifeng Hu, Weiye Fang, Jingwei Xu, Zhiwang Zhang
In recent years, great progress has been made in Lift-Splat-Shot-based (LSS-based) 3D object detection methods. However, inaccurate depth estimation remains an important constraint on the accuracy of camera-only and multi-modal 3D object detection models, especially in regions where the depth changes significantly (i.e., the "depth jump" problem). In this paper, we propose a novel Edge-aware Lift-splat-shot (EA-LSS) framework. Specifically, an edge-aware depth fusion (EADF) module is proposed to alleviate the "depth jump" problem, and a fine-grained depth (FGD) module to further enforce refined supervision on depth. Our EA-LSS framework is compatible with any LSS-based 3D object detection model and effectively boosts its performance with a negligible increase in inference time. Experiments on the nuScenes benchmarks demonstrate that EA-LSS is effective for both camera-only and multi-modal models. Notably, EA-LSS achieves state-of-the-art performance on the nuScenes test benchmark, with mAP and NDS of 76.5% and 77.6%, respectively.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| 3D Object Detection | nuScenes | NDS | 0.78 | EA-LSS |
| 3D Object Detection | nuScenes | mAAE | 0.12 | EA-LSS |
| 3D Object Detection | nuScenes | mAOE | 0.28 | EA-LSS |
| 3D Object Detection | nuScenes | mAP | 0.77 | EA-LSS |
| 3D Object Detection | nuScenes | mASE | 0.21 | EA-LSS |
| 3D Object Detection | nuScenes | mATE | 0.23 | EA-LSS |
| 3D Object Detection | nuScenes | mAVE | 0.2 | EA-LSS |
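
The NDS entry in the table can be sanity-checked against the other metrics using the nuScenes detection score definition, NDS = (1/10) [5 · mAP + Σ (1 − min(1, mTP))], where the sum runs over the five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE). The sketch below is only an illustration: the function name `nds` and the dictionary layout are hypothetical, and the inputs are the rounded table values, so the result approximates rather than reproduces the reported 77.6%.

```python
def nds(m_ap, tp_errors):
    """nuScenes detection score from mAP and the five mean TP error metrics.

    Hypothetical helper for illustration; not taken from the EA-LSS code.
    """
    # Each TP error is clipped at 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    # mAP carries weight 5, each of the five TP scores carries weight 1.
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


# Rounded values copied from the table above (assumption: two-decimal rounding).
table_errors = {
    "mATE": 0.23,
    "mASE": 0.21,
    "mAOE": 0.28,
    "mAVE": 0.20,
    "mAAE": 0.12,
}
score = nds(0.77, table_errors)
print(round(score, 3))
```

With these rounded inputs the score comes out at roughly 0.78, consistent with the NDS row of the table.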