Mingxing Tan, Ruoming Pang, Quoc V. Le
Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multiscale feature fusion; second, we propose a compound scaling method that uniformly scales the resolution, depth, and width of the backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and better backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieves much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with a single model and a single scale, our EfficientDet-D7 achieves state-of-the-art 55.1 AP on COCO test-dev with 77M parameters and 410B FLOPs, being 4x to 9x smaller and using 13x to 42x fewer FLOPs than previous detectors. Code is available at https://github.com/google/automl/tree/master/efficientdet.
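The two ideas named in the abstract, weighted feature fusion and compound scaling, can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption: the function names `fast_normalized_fusion` and `compound_scale` are hypothetical, and the fusion rule and the constants (64, 1.35, 3, 512, 128, and the 1e-4 epsilon) follow the scaling heuristics as commonly described for EfficientDet. This is not the released implementation.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors with learned scalar weights.

    Assumed rule: clamp each weight at zero, then divide by the sum of
    the clamped weights plus eps, so no softmax is needed.
    """
    # ReLU-style clamp keeps every normalized weight in [0, 1].
    clamped = [max(0.0, w) for w in weights]
    total = sum(clamped) + eps
    length = len(features[0])
    return [
        sum(w * f[k] for w, f in zip(clamped, features)) / total
        for k in range(length)
    ]


def compound_scale(phi):
    """Derive every component's size from one coefficient phi.

    The constants below are assumptions for illustration; published
    configurations may round or override these raw values.
    """
    return {
        "bifpn_width": 64 * 1.35 ** phi,      # channels in the feature network
        "bifpn_depth": 3 + phi,               # number of BiFPN layers
        "head_depth": 3 + phi // 3,           # layers in box/class networks
        "input_resolution": 512 + 128 * phi,  # input image side in pixels
    }


if __name__ == "__main__":
    fused = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
    print(fused)
    for phi in range(4):
        print(phi, compound_scale(phi))
```

A single coefficient drives all four quantities, which is what "uniformly scales" refers to in the abstract. The raw widths and resolutions from these formulas would normally be rounded to hardware-friendly values before use.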
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | COCO test-dev | AP50 | 71.6 | EfficientDet-D7 (1536) |
| Object Detection | COCO test-dev | AP75 | 56.9 | EfficientDet-D7 (1536) |
| Object Detection | COCO test-dev | box mAP | 52.6 | EfficientDet-D7 (1536) |
| Object Detection | COCO-O | Average mAP | 28.5 | EfficientDet-D5 (EfficientNet-B5) |
| Object Detection | COCO-O | Effective Robustness | 5.44 | EfficientDet-D5 (EfficientNet-B5) |
| Object Detection | COCO minival | box AP | 52.1 | EfficientDet-D7 (1536) |
| Object Detection | COCO minival | AP50 | 73.4 | EfficientDet-D7x (single-scale) |
| Object Detection | COCO minival | AP75 | 59 | EfficientDet-D7x (single-scale) |
| Object Detection | COCO minival | APL | 67.9 | EfficientDet-D7x (single-scale) |
| Object Detection | COCO minival | APM | 58 | EfficientDet-D7x (single-scale) |
| Object Detection | COCO minival | APS | 40 | EfficientDet-D7x (single-scale) |