Thang Vu, Hyunjun Jang, Trung X. Pham, Chang D. Yoo
This paper proposes an architecture called Cascade Region Proposal Network (Cascade RPN) that improves region-proposal quality and detection performance by \textit{systematically} addressing a limitation of the conventional RPN, which \textit{heuristically defines} the anchors and \textit{aligns} the features to the anchors. First, instead of using multiple anchors with predefined scales and aspect ratios, Cascade RPN relies on a \textit{single anchor} per location and performs multi-stage refinement. Each stage is progressively more stringent in defining positive samples, starting with an anchor-free metric and moving to anchor-based metrics in the ensuing stages. Second, to keep the features aligned with the anchors throughout the stages, \textit{adaptive convolution} is proposed; it takes the anchors as input in addition to the image features and learns the sampled features guided by the anchors. A simple implementation of a two-stage Cascade RPN achieves an AR 13.4 points higher than that of the conventional RPN, surpassing all existing region proposal methods. When adopted into Fast R-CNN and Faster R-CNN, Cascade RPN improves the detection mAP by 3.1 and 3.5 points, respectively. The code is publicly available at \url{https://github.com/thangvubk/Cascade-RPN.git}.
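The two ideas in the abstract, stage-wise positive-sample criteria and anchor-guided feature sampling, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The anchor-free rule used for the first stage (the anchor centre must fall in the central 20% of a ground-truth box), the IoU threshold of 0.7 for later stages, and the placement of the k x k sampling grid uniformly over the anchor are all assumptions chosen for illustration; the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center_region_positive(anchor: Box, gt: Box, ratio: float = 0.2) -> bool:
    """Hypothetical anchor-free metric: positive if the anchor centre lies
    inside the central `ratio` fraction of the ground-truth box."""
    cx, cy = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    half_w = (gt[2] - gt[0]) * ratio / 2
    half_h = (gt[3] - gt[1]) * ratio / 2
    return abs(cx - gcx) <= half_w and abs(cy - gcy) <= half_h


def assign_positives(anchors: List[Box], gts: List[Box], stage: int,
                     iou_thr: float = 0.7) -> List[bool]:
    """Stage 0 uses the (assumed) anchor-free rule; later stages use the
    stricter anchor-based IoU rule on the refined anchors."""
    labels = []
    for a in anchors:
        if stage == 0:
            labels.append(any(center_region_positive(a, g) for g in gts))
        else:
            labels.append(any(iou(a, g) >= iou_thr for g in gts))
    return labels


def anchor_sampling_grid(anchor: Box, stride: float, k: int = 3):
    """Assumed adaptive-convolution sampling points: centres of a k x k
    split of the anchor, in feature-map coordinates (row-major order)."""
    x1, y1, x2, y2 = (c / stride for c in anchor)
    points = []
    for i in range(k):
        for j in range(k):
            points.append((x1 + (j + 0.5) * (x2 - x1) / k,
                           y1 + (i + 0.5) * (y2 - y1) / k))
    return points


def adaptive_offsets(anchor: Box, loc: Tuple[float, float], stride: float,
                     k: int = 3):
    """Offsets from a regular k x k grid centred at feature location `loc`
    to the anchor-guided sampling points (deformable-conv style)."""
    half = k // 2
    offsets = []
    for idx, (sx, sy) in enumerate(anchor_sampling_grid(anchor, stride, k)):
        i, j = divmod(idx, k)
        rx, ry = loc[0] + (j - half), loc[1] + (i - half)
        offsets.append((sx - rx, sy - ry))
    return offsets


gt = (0.0, 0.0, 100.0, 100.0)
anchors = [(45.0, 45.0, 55.0, 55.0), (5.0, 5.0, 100.0, 100.0), (0.0, 0.0, 20.0, 20.0)]
print(assign_positives(anchors, [gt], stage=0))  # anchor-free stage
print(assign_positives(anchors, [gt], stage=1))  # stricter IoU stage
print(adaptive_offsets((0.0, 0.0, 48.0, 48.0), (3.0, 3.0), stride=8.0))
```

In this toy example the small centred anchor counts as positive under the lenient first-stage rule but not under the IoU rule, which mirrors the "progressively more stringent" criteria. For a 48-pixel anchor at stride 8, the sampling grid spreads to cover the anchor, so the offsets grow outward from the centre; an anchor that exactly matches the regular grid would yield all-zero offsets.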
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | COCO test-dev | AP50 | 58.9 | Faster R-CNN (Cascade RPN) |
| Object Detection | COCO test-dev | AP75 | 44.5 | Faster R-CNN (Cascade RPN) |
| Object Detection | COCO test-dev | APL | 52.6 | Faster R-CNN (Cascade RPN) |
| Object Detection | COCO test-dev | APM | 42.8 | Faster R-CNN (Cascade RPN) |
| Object Detection | COCO test-dev | APS | 22.0 | Faster R-CNN (Cascade RPN) |
| Object Detection | COCO test-dev | box mAP | 40.6 | Faster R-CNN (Cascade RPN) |
| Object Detection | COCO test-dev | AP50 | 59.4 | Fast R-CNN (Cascade RPN) |
| Object Detection | COCO test-dev | AP75 | 43.8 | Fast R-CNN (Cascade RPN) |
| Object Detection | COCO test-dev | APL | 51.6 | Fast R-CNN (Cascade RPN) |
| Object Detection | COCO test-dev | APM | 42.4 | Fast R-CNN (Cascade RPN) |
| Object Detection | COCO test-dev | APS | 22.1 | Fast R-CNN (Cascade RPN) |
| Object Detection | COCO test-dev | box mAP | 40.1 | Fast R-CNN (Cascade RPN) |