Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features. In the recently popular terminology of neural networks with "attention" mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on the PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO datasets with only 300 proposals per image. In the ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
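The abstract's statement that an RPN predicts object bounds and objectness scores "at each position" can be made concrete with a small sketch. The version below is illustrative only: it assumes reference boxes (anchors) are laid out on a feature-map grid, and the stride of 16 pixels, the three scales, the three aspect ratios, and the per-anchor output counts are assumed defaults that the abstract itself does not state. The function names `make_anchors` and `rpn_output_sizes` are hypothetical.

```python
from itertools import product


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) reference boxes for every feature-map cell.

    All default values are assumptions for illustration, not taken
    from the abstract.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Centre of this feature-map cell in image pixel coordinates.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio in product(scales, ratios):
            # ratio is height / width; the box area stays at scale ** 2.
            w = scale / ratio ** 0.5
            h = scale * ratio ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def rpn_output_sizes(feat_h, feat_w, k=9):
    """Count the values a sliding RPN head would emit for k anchors per cell.

    Assumes 2 objectness scores and 4 box offsets per anchor.
    """
    cells = feat_h * feat_w
    return {"objectness": cells * 2 * k, "box_deltas": cells * 4 * k}


if __name__ == "__main__":
    anchors = make_anchors(feat_h=2, feat_w=3)
    print(len(anchors), "anchors on a 2x3 feature map")
    print(rpn_output_sizes(2, 3))
```

Because the predictions are computed per cell on the shared convolutional feature map, the number of proposals scales with the feature-map size rather than requiring a separate pass over the image, which is what makes the proposals nearly cost-free in this setup.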
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Counting | CARPK | MAE | 39.88 | Faster R-CNN (2015) |
| Object Counting | CARPK | RMSE | 47.67 | Faster R-CNN (2015) |
| Object Detection | COCO-O | Average mAP | 16.4 | Faster R-CNN (ResNet-50-FPN) |
| Object Detection | COCO-O | Effective Robustness | -0.41 | Faster R-CNN (ResNet-50-FPN) |
| Object Detection | PKU-DDD17-Car | mAP50 | 80.2 | Faster-RCNN |
| Object Detection | UA-DETRAC | mAP | 58.45 | Faster R-CNN |
| Object Detection | PASCAL VOC 2007 (15+5) | FPS | 7 | Faster R-CNN |
| Object Detection | PASCAL VOC 2007 (15+5) | mAP | 73.2 | Faster R-CNN |
| Object Detection | Cityscapes | mPC [AP] | 15.4 | Baseline |
| 2D Object Detection | SARDet-100K | box mAP | 49 | F-RCNN |
| 2D Object Detection | COCO-O | Average mAP | 16.4 | Faster R-CNN (ResNet-50-FPN) |
| 2D Object Detection | COCO-O | Effective Robustness | -0.41 | Faster R-CNN (ResNet-50-FPN) |
| 2D Object Detection | PKU-DDD17-Car | mAP50 | 80.2 | Faster-RCNN |
| 2D Object Detection | UA-DETRAC | mAP | 58.45 | Faster R-CNN |
| 2D Object Detection | PASCAL VOC 2007 (15+5) | FPS | 7 | Faster R-CNN |
| 2D Object Detection | PASCAL VOC 2007 (15+5) | mAP | 73.2 | Faster R-CNN |
| 2D Object Detection | Cityscapes | mPC [AP] | 15.4 | Baseline |