Jiaming Sun, Linghao Chen, Yiming Xie, Siyu Zhang, Qinhong Jiang, Xiaowei Zhou, Hujun Bao
In this paper, we propose a novel system named Disp R-CNN for 3D object detection from stereo images. Many recent works solve this problem by first recovering a point cloud with disparity estimation and then applying a 3D detector. In these approaches the disparity map is computed for the entire image, which is costly and fails to leverage category-specific priors. In contrast, we design an instance disparity estimation network (iDispNet) that predicts disparity only for pixels on objects of interest and learns a category-specific shape prior for more accurate disparity estimation. To address the scarcity of disparity annotations for training, we propose to use a statistical shape model to generate dense disparity pseudo-ground-truth without the need for LiDAR point clouds, which makes our system more widely applicable. Experiments on the KITTI dataset show that, even when LiDAR ground-truth is not available at training time, Disp R-CNN achieves competitive performance and outperforms previous state-of-the-art methods by 20% in terms of average precision.
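The pipeline described above turns predicted disparity into a point cloud before a 3D detector is applied. The function below is a minimal sketch of the standard rectified-stereo back-projection (depth = focal length × baseline / disparity, followed by the pinhole camera model), restricted to pixels inside an instance mask, which is roughly the kind of output an instance-level disparity network provides. It is not the authors' code: the function name, parameter names, and the nested-list input format are hypothetical choices, and it assumes each masked pixel holds a full-image disparity value in pixels for the left camera.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def instance_disparity_to_points(
    disparity: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    baseline: float,
    min_disparity: float = 1e-6,
) -> List[Point3D]:
    """Back-project masked pixels of a left-image disparity map to 3D camera coordinates.

    Hypothetical helper (not from the paper). Assumes rectified stereo and
    disparity in pixels, so depth Z = fx * baseline / d, then the pinhole model
    X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy.
    """
    points: List[Point3D] = []
    for v, (disp_row, mask_row) in enumerate(zip(disparity, mask)):
        for u, (d, inside) in enumerate(zip(disp_row, mask_row)):
            # Skip pixels outside the object instance and non-positive disparities.
            if not inside or d <= min_disparity:
                continue
            z = fx * baseline / d
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Toy 2x2 example with made-up intrinsics; only two pixels survive the mask/validity check.
    pts = instance_disparity_to_points(
        [[10.0, 0.0], [20.0, 5.0]],
        [[True, True], [False, True]],
        fx=100.0, fy=100.0, cx=0.5, cy=0.5, baseline=0.5,
    )
    print(pts)
```

Because only masked pixels are back-projected, the cost of this step scales with the object area rather than the full image, which mirrors the motivation given for predicting disparity per instance.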
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Pose Estimation | KITTI Cars Hard | Average Orientation Similarity | 67.16 | Disp-RCNN (Stereo) |
| 3D Object Detection | KITTI Cars Moderate | AP75 | 45.78 | Disp R-CNN |
| 3D Object Detection | KITTI Cyclists Moderate | AP50 | 24.4 | Disp R-CNN |
| 3D Object Detection | KITTI Pedestrians Moderate | AP50 | 25.8 | Disp R-CNN |