Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo
We present Sparse R-CNN, a purely sparse method for object detection in images. Existing works on object detection rely heavily on dense object candidates, such as $k$ anchor boxes pre-defined on every grid cell of an image feature map of size $H\times W$. In our method, however, a fixed sparse set of $N$ learned object proposals is provided to the object recognition head to perform classification and localization. By reducing $HWk$ (up to hundreds of thousands) hand-designed object candidates to $N$ (e.g., 100) learnable proposals, Sparse R-CNN completely avoids all effort related to object candidate design and many-to-one label assignment. More importantly, final predictions are output directly, without a non-maximum suppression post-processing step. Sparse R-CNN demonstrates accuracy, run-time, and training convergence performance on par with well-established detector baselines on the challenging COCO dataset, e.g., achieving 45.0 AP with the standard $3\times$ training schedule and running at 22 fps with a ResNet-50 FPN model. We hope our work will inspire a rethinking of the convention of dense priors in object detectors. The code is available at: https://github.com/PeizeSun/SparseR-CNN.
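To make the gap between $HWk$ and $N$ concrete, the following minimal sketch counts dense candidates on a single feature map and compares the result with $N = 100$. The feature-map size (an 800×1333 input at stride 8) and $k = 9$ anchors per cell are illustrative assumptions, not values taken from the paper, and `dense_candidates` is a hypothetical helper name.

```python
def dense_candidates(height: int, width: int, anchors_per_cell: int) -> int:
    """Count the dense candidates H * W * k on one feature map."""
    return height * width * anchors_per_cell


# Assumed single-level feature map: 800x1333 input at stride 8, rounded up.
H, W = 100, 167
k = 9    # assumed number of anchor boxes per grid cell
N = 100  # number of learned proposals (the example value in the abstract)

dense = dense_candidates(H, W, k)
print(dense)       # 150300 dense candidates
print(dense // N)  # 1503 times more candidates than learned proposals
```

Under these assumptions a single feature level already yields about 150 thousand candidates, consistent with the "hundreds of thousands" order of magnitude quoted in the abstract once several pyramid levels are counted.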
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | COCO minival | AP50 | 64.6 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | AP75 | 49.5 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | APL | 61.6 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | APM | 48.3 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | APS | 28.3 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | box AP | 45.6 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | AP50 | 63.4 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | AP75 | 48.2 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | APL | 59.5 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | APM | 47.2 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | APS | 26.9 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | box AP | 44.5 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| Object Detection | COCO minival | AP50 | 62.1 | Sparse R-CNN (ResNet-101, FPN) |
| Object Detection | COCO minival | AP75 | 47.2 | Sparse R-CNN (ResNet-101, FPN) |
| Object Detection | COCO minival | APL | 59.7 | Sparse R-CNN (ResNet-101, FPN) |
| Object Detection | COCO minival | APM | 46.3 | Sparse R-CNN (ResNet-101, FPN) |
| Object Detection | COCO minival | APS | 26.1 | Sparse R-CNN (ResNet-101, FPN) |
| Object Detection | COCO minival | box AP | 43.5 | Sparse R-CNN (ResNet-101, FPN) |
| Object Detection | COCO minival | AP50 | 61.2 | Sparse R-CNN (ResNet-50, FPN) |
| Object Detection | COCO minival | AP75 | 45.7 | Sparse R-CNN (ResNet-50, FPN) |
| Object Detection | COCO minival | APL | 57.6 | Sparse R-CNN (ResNet-50, FPN) |
| Object Detection | COCO minival | APM | 44.6 | Sparse R-CNN (ResNet-50, FPN) |
| Object Detection | COCO minival | APS | 26.7 | Sparse R-CNN (ResNet-50, FPN) |
| Object Detection | COCO minival | box AP | 42.3 | Sparse R-CNN (ResNet-50, FPN) |
| 2D Object Detection | SARDet-100K | box mAP | 38.1 | Sparse R-CNN |
| 2D Object Detection | CeyMo | mAP | 47.3 | Sparse R-CNN |
| 2D Object Detection | COCO minival | AP50 | 64.6 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | AP75 | 49.5 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | APL | 61.6 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | APM | 48.3 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | APS | 28.3 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | box AP | 45.6 | Sparse R-CNN (ResNet-101, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | AP50 | 63.4 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | AP75 | 48.2 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | APL | 59.5 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | APM | 47.2 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | APS | 26.9 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | box AP | 44.5 | Sparse R-CNN (ResNet-50, learnable proposals, random crop aug, FPN) |
| 2D Object Detection | COCO minival | AP50 | 62.1 | Sparse R-CNN (ResNet-101, FPN) |
| 2D Object Detection | COCO minival | AP75 | 47.2 | Sparse R-CNN (ResNet-101, FPN) |
| 2D Object Detection | COCO minival | APL | 59.7 | Sparse R-CNN (ResNet-101, FPN) |
| 2D Object Detection | COCO minival | APM | 46.3 | Sparse R-CNN (ResNet-101, FPN) |
| 2D Object Detection | COCO minival | APS | 26.1 | Sparse R-CNN (ResNet-101, FPN) |
| 2D Object Detection | COCO minival | box AP | 43.5 | Sparse R-CNN (ResNet-101, FPN) |
| 2D Object Detection | COCO minival | AP50 | 61.2 | Sparse R-CNN (ResNet-50, FPN) |
| 2D Object Detection | COCO minival | AP75 | 45.7 | Sparse R-CNN (ResNet-50, FPN) |
| 2D Object Detection | COCO minival | APL | 57.6 | Sparse R-CNN (ResNet-50, FPN) |
| 2D Object Detection | COCO minival | APM | 44.6 | Sparse R-CNN (ResNet-50, FPN) |
| 2D Object Detection | COCO minival | APS | 26.7 | Sparse R-CNN (ResNet-50, FPN) |
| 2D Object Detection | COCO minival | box AP | 42.3 | Sparse R-CNN (ResNet-50, FPN) |