Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár
The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron.
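
The abstract describes the loss only in words, so the following is a minimal sketch of the form in which focal loss is usually written, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), applied to a single binary prediction. The function names `focal_loss` and `cross_entropy` are illustrative and are not taken from the Detectron code. The defaults γ = 2 and α = 0.25 are assumed here as the commonly cited settings.

```python
import math


def cross_entropy(p, y):
    """Standard binary cross entropy for one prediction.

    p: predicted probability of the foreground class, strictly in (0, 1).
    y: ground-truth label, 1 for foreground and 0 for background.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction (illustrative sketch).

    gamma and alpha defaults are assumptions, not values stated in the abstract.
    """
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights foreground examples by alpha and background by 1 - alpha.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The modulating factor shrinks toward 0 as the example becomes well classified.
    modulating = (1.0 - p_t) ** gamma
    return -alpha_t * modulating * math.log(p_t)


# An easy negative: background scored as foreground with probability 0.01.
easy_ce, easy_fl = cross_entropy(0.01, 0), focal_loss(0.01, 0)
# A hard negative: background scored as foreground with probability 0.9.
hard_ce, hard_fl = cross_entropy(0.9, 0), focal_loss(0.9, 0)

print(f"easy negative: CE={easy_ce:.6f}  FL={easy_fl:.8f}")
print(f"hard negative: CE={hard_ce:.6f}  FL={hard_fl:.6f}")
```

With these settings the easy negative's loss falls by several orders of magnitude relative to cross entropy, while the hard negative's loss stays on the same order. This is the down-weighting of well-classified examples that the abstract refers to. Setting `gamma=0` reduces the expression to α-weighted cross entropy.
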
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Facial Recognition and Modelling | Trillion Pairs Dataset | Accuracy | 37.14 | F-Softmax |
| Facial Recognition and Modelling | Trillion Pairs Dataset | Accuracy | 39.8 | F-Softmax |
| Autonomous Vehicles | TJU-Ped-traffic | ALL (miss rate) | 41.4 | RetinaNet |
| Autonomous Vehicles | TJU-Ped-traffic | HO (miss rate) | 61.6 | RetinaNet |
| Autonomous Vehicles | TJU-Ped-traffic | R (miss rate) | 23.89 | RetinaNet |
| Autonomous Vehicles | TJU-Ped-traffic | R+HO (miss rate) | 28.45 | RetinaNet |
| Autonomous Vehicles | TJU-Ped-traffic | RS (miss rate) | 37.92 | RetinaNet |
| Autonomous Vehicles | TJU-Ped-campus | ALL (miss rate) | 44.34 | RetinaNet |
| Autonomous Vehicles | TJU-Ped-campus | HO (miss rate) | 71.31 | RetinaNet |
| Autonomous Vehicles | TJU-Ped-campus | R (miss rate) | 34.73 | RetinaNet |
| Autonomous Vehicles | TJU-Ped-campus | R+HO (miss rate) | 42.26 | RetinaNet |
| Autonomous Vehicles | TJU-Ped-campus | RS (miss rate) | 82.99 | RetinaNet |
| Object Counting | CARPK | MAE | 24.58 | RetinaNet (2018) |
| Object Detection | COCO test-dev | AP50 | 61.1 | RetinaNet (ResNeXt-101-FPN) |
| Object Detection | COCO test-dev | AP75 | 44.1 | RetinaNet (ResNeXt-101-FPN) |
| Object Detection | COCO test-dev | APL | 51.2 | RetinaNet (ResNeXt-101-FPN) |
| Object Detection | COCO test-dev | APM | 44.2 | RetinaNet (ResNeXt-101-FPN) |
| Object Detection | COCO test-dev | APS | 24.1 | RetinaNet (ResNeXt-101-FPN) |
| Object Detection | COCO test-dev | box mAP | 40.8 | RetinaNet (ResNeXt-101-FPN) |
| Object Detection | COCO test-dev | AP50 | 59.1 | RetinaNet (ResNet-101-FPN) |
| Object Detection | COCO test-dev | AP75 | 42.3 | RetinaNet (ResNet-101-FPN) |
| Object Detection | COCO test-dev | APL | 50.2 | RetinaNet (ResNet-101-FPN) |
| Object Detection | COCO test-dev | APM | 42.7 | RetinaNet (ResNet-101-FPN) |
| Object Detection | COCO test-dev | APS | 21.8 | RetinaNet (ResNet-101-FPN) |
| Object Detection | COCO test-dev | box mAP | 39.1 | RetinaNet (ResNet-101-FPN) |
| Object Detection | COCO-O | Average mAP | 16.6 | RetinaNet (ResNet-50) |
| Object Detection | COCO-O | Effective Robustness | 0.18 | RetinaNet (ResNet-50) |
| Object Detection | SKU-110K | AP | 45.5 | RetinaNet |
| Object Detection | SKU-110K | AP75 | 38.9 | RetinaNet |
| Image Classification | COCO-MLT | Average mAP | 49.46 | Focal Loss (ResNet-50) |
| Image Classification | VOC-MLT | Average mAP | 73.88 | Focal Loss (ResNet-50) |
| Image Classification | EGTEA | Average Precision | 59.09 | Focal loss (3D-ResNeXt101) |
| Image Classification | EGTEA | Average Recall | 59.17 | Focal loss (3D-ResNeXt101) |
| Face Verification | Trillion Pairs Dataset | Accuracy | 37.14 | F-Softmax |
| Face Reconstruction | Trillion Pairs Dataset | Accuracy | 37.14 | F-Softmax |
| Face Reconstruction | Trillion Pairs Dataset | Accuracy | 39.8 | F-Softmax |
| 3D | COCO test-dev | AP50 | 61.1 | RetinaNet (ResNeXt-101-FPN) |
| 3D | COCO test-dev | AP75 | 44.1 | RetinaNet (ResNeXt-101-FPN) |
| 3D | COCO test-dev | APL | 51.2 | RetinaNet (ResNeXt-101-FPN) |
| 3D | COCO test-dev | APM | 44.2 | RetinaNet (ResNeXt-101-FPN) |
| 3D | COCO test-dev | APS | 24.1 | RetinaNet (ResNeXt-101-FPN) |
| 3D | COCO test-dev | box mAP | 40.8 | RetinaNet (ResNeXt-101-FPN) |
| 3D | COCO test-dev | AP50 | 59.1 | RetinaNet (ResNet-101-FPN) |
| 3D | COCO test-dev | AP75 | 42.3 | RetinaNet (ResNet-101-FPN) |
| 3D | COCO test-dev | APL | 50.2 | RetinaNet (ResNet-101-FPN) |
| 3D | COCO test-dev | APM | 42.7 | RetinaNet (ResNet-101-FPN) |
| 3D | COCO test-dev | APS | 21.8 | RetinaNet (ResNet-101-FPN) |
| 3D | COCO test-dev | box mAP | 39.1 | RetinaNet (ResNet-101-FPN) |
| 3D | COCO-O | Average mAP | 16.6 | RetinaNet (ResNet-50) |
| 3D | COCO-O | Effective Robustness | 0.18 | RetinaNet (ResNet-50) |
| 3D | SKU-110K | AP | 45.5 | RetinaNet |
| 3D | SKU-110K | AP75 | 38.9 | RetinaNet |
| 3D | Trillion Pairs Dataset | Accuracy | 37.14 | F-Softmax |
| 3D | Trillion Pairs Dataset | Accuracy | 39.8 | F-Softmax |
| 3D Face Modelling | Trillion Pairs Dataset | Accuracy | 37.14 | F-Softmax |
| 3D Face Modelling | Trillion Pairs Dataset | Accuracy | 39.8 | F-Softmax |
| Few-Shot Image Classification | COCO-MLT | Average mAP | 49.46 | Focal Loss (ResNet-50) |
| Few-Shot Image Classification | VOC-MLT | Average mAP | 73.88 | Focal Loss (ResNet-50) |
| Few-Shot Image Classification | EGTEA | Average Precision | 59.09 | Focal loss (3D-ResNeXt101) |
| Few-Shot Image Classification | EGTEA | Average Recall | 59.17 | Focal loss (3D-ResNeXt101) |
| 3D Face Reconstruction | Trillion Pairs Dataset | Accuracy | 37.14 | F-Softmax |
| 3D Face Reconstruction | Trillion Pairs Dataset | Accuracy | 39.8 | F-Softmax |
| Generalized Few-Shot Classification | COCO-MLT | Average mAP | 49.46 | Focal Loss (ResNet-50) |
| Generalized Few-Shot Classification | VOC-MLT | Average mAP | 73.88 | Focal Loss (ResNet-50) |
| Generalized Few-Shot Classification | EGTEA | Average Precision | 59.09 | Focal loss (3D-ResNeXt101) |
| Generalized Few-Shot Classification | EGTEA | Average Recall | 59.17 | Focal loss (3D-ResNeXt101) |
| Long-tail Learning | COCO-MLT | Average mAP | 49.46 | Focal Loss (ResNet-50) |
| Long-tail Learning | VOC-MLT | Average mAP | 73.88 | Focal Loss (ResNet-50) |
| Long-tail Learning | EGTEA | Average Precision | 59.09 | Focal loss (3D-ResNeXt101) |
| Long-tail Learning | EGTEA | Average Recall | 59.17 | Focal loss (3D-ResNeXt101) |
| Generalized Few-Shot Learning | COCO-MLT | Average mAP | 49.46 | Focal Loss (ResNet-50) |
| Generalized Few-Shot Learning | VOC-MLT | Average mAP | 73.88 | Focal Loss (ResNet-50) |
| Generalized Few-Shot Learning | EGTEA | Average Precision | 59.09 | Focal loss (3D-ResNeXt101) |
| Generalized Few-Shot Learning | EGTEA | Average Recall | 59.17 | Focal loss (3D-ResNeXt101) |
| 2D Classification | COCO test-dev | AP50 | 61.1 | RetinaNet (ResNeXt-101-FPN) |
| 2D Classification | COCO test-dev | AP75 | 44.1 | RetinaNet (ResNeXt-101-FPN) |
| 2D Classification | COCO test-dev | APL | 51.2 | RetinaNet (ResNeXt-101-FPN) |
| 2D Classification | COCO test-dev | APM | 44.2 | RetinaNet (ResNeXt-101-FPN) |
| 2D Classification | COCO test-dev | APS | 24.1 | RetinaNet (ResNeXt-101-FPN) |
| 2D Classification | COCO test-dev | box mAP | 40.8 | RetinaNet (ResNeXt-101-FPN) |
| 2D Classification | COCO test-dev | AP50 | 59.1 | RetinaNet (ResNet-101-FPN) |
| 2D Classification | COCO test-dev | AP75 | 42.3 | RetinaNet (ResNet-101-FPN) |
| 2D Classification | COCO test-dev | APL | 50.2 | RetinaNet (ResNet-101-FPN) |
| 2D Classification | COCO test-dev | APM | 42.7 | RetinaNet (ResNet-101-FPN) |
| 2D Classification | COCO test-dev | APS | 21.8 | RetinaNet (ResNet-101-FPN) |
| 2D Classification | COCO test-dev | box mAP | 39.1 | RetinaNet (ResNet-101-FPN) |
| 2D Classification | COCO-O | Average mAP | 16.6 | RetinaNet (ResNet-50) |
| 2D Classification | COCO-O | Effective Robustness | 0.18 | RetinaNet (ResNet-50) |
| 2D Classification | SKU-110K | AP | 45.5 | RetinaNet |
| 2D Classification | SKU-110K | AP75 | 38.9 | RetinaNet |
| Pedestrian Detection | TJU-Ped-traffic | ALL (miss rate) | 41.4 | RetinaNet |
| Pedestrian Detection | TJU-Ped-traffic | HO (miss rate) | 61.6 | RetinaNet |
| Pedestrian Detection | TJU-Ped-traffic | R (miss rate) | 23.89 | RetinaNet |
| Pedestrian Detection | TJU-Ped-traffic | R+HO (miss rate) | 28.45 | RetinaNet |
| Pedestrian Detection | TJU-Ped-traffic | RS (miss rate) | 37.92 | RetinaNet |
| Pedestrian Detection | TJU-Ped-campus | ALL (miss rate) | 44.34 | RetinaNet |
| Pedestrian Detection | TJU-Ped-campus | HO (miss rate) | 71.31 | RetinaNet |
| Pedestrian Detection | TJU-Ped-campus | R (miss rate) | 34.73 | RetinaNet |
| Pedestrian Detection | TJU-Ped-campus | R+HO (miss rate) | 42.26 | RetinaNet |
| Pedestrian Detection | TJU-Ped-campus | RS (miss rate) | 82.99 | RetinaNet |
| 2D Object Detection | SARDet-100K | box mAP | 47.4 | RetinaNet |
| 2D Object Detection | COCO test-dev | AP50 | 61.1 | RetinaNet (ResNeXt-101-FPN) |
| 2D Object Detection | COCO test-dev | AP75 | 44.1 | RetinaNet (ResNeXt-101-FPN) |
| 2D Object Detection | COCO test-dev | APL | 51.2 | RetinaNet (ResNeXt-101-FPN) |
| 2D Object Detection | COCO test-dev | APM | 44.2 | RetinaNet (ResNeXt-101-FPN) |
| 2D Object Detection | COCO test-dev | APS | 24.1 | RetinaNet (ResNeXt-101-FPN) |
| 2D Object Detection | COCO test-dev | box mAP | 40.8 | RetinaNet (ResNeXt-101-FPN) |
| 2D Object Detection | COCO test-dev | AP50 | 59.1 | RetinaNet (ResNet-101-FPN) |
| 2D Object Detection | COCO test-dev | AP75 | 42.3 | RetinaNet (ResNet-101-FPN) |
| 2D Object Detection | COCO test-dev | APL | 50.2 | RetinaNet (ResNet-101-FPN) |
| 2D Object Detection | COCO test-dev | APM | 42.7 | RetinaNet (ResNet-101-FPN) |
| 2D Object Detection | COCO test-dev | APS | 21.8 | RetinaNet (ResNet-101-FPN) |
| 2D Object Detection | COCO test-dev | box mAP | 39.1 | RetinaNet (ResNet-101-FPN) |
| 2D Object Detection | COCO-O | Average mAP | 16.6 | RetinaNet (ResNet-50) |
| 2D Object Detection | COCO-O | Effective Robustness | 0.18 | RetinaNet (ResNet-50) |
| 2D Object Detection | SKU-110K | AP | 45.5 | RetinaNet |
| 2D Object Detection | SKU-110K | AP75 | 38.9 | RetinaNet |
| 16k | COCO test-dev | AP50 | 61.1 | RetinaNet (ResNeXt-101-FPN) |
| 16k | COCO test-dev | AP75 | 44.1 | RetinaNet (ResNeXt-101-FPN) |
| 16k | COCO test-dev | APL | 51.2 | RetinaNet (ResNeXt-101-FPN) |
| 16k | COCO test-dev | APM | 44.2 | RetinaNet (ResNeXt-101-FPN) |
| 16k | COCO test-dev | APS | 24.1 | RetinaNet (ResNeXt-101-FPN) |
| 16k | COCO test-dev | box mAP | 40.8 | RetinaNet (ResNeXt-101-FPN) |
| 16k | COCO test-dev | AP50 | 59.1 | RetinaNet (ResNet-101-FPN) |
| 16k | COCO test-dev | AP75 | 42.3 | RetinaNet (ResNet-101-FPN) |
| 16k | COCO test-dev | APL | 50.2 | RetinaNet (ResNet-101-FPN) |
| 16k | COCO test-dev | APM | 42.7 | RetinaNet (ResNet-101-FPN) |
| 16k | COCO test-dev | APS | 21.8 | RetinaNet (ResNet-101-FPN) |
| 16k | COCO test-dev | box mAP | 39.1 | RetinaNet (ResNet-101-FPN) |
| 16k | COCO-O | Average mAP | 16.6 | RetinaNet (ResNet-50) |
| 16k | COCO-O | Effective Robustness | 0.18 | RetinaNet (ResNet-50) |
| 16k | SKU-110K | AP | 45.5 | RetinaNet |
| 16k | SKU-110K | AP75 | 38.9 | RetinaNet |