Chien-Yao Wang, Alexey Bochkovskiy, Hong-Yuan Mark Liao
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS, and has the highest accuracy (56.8% AP) among all known real-time object detectors running at 30 FPS or higher on a V100 GPU. The YOLOv7-E6 object detector (56 FPS on V100, 55.9% AP) outperforms both the transformer-based detector SWIN-L Cascade-Mask R-CNN (9.2 FPS on A100, 53.9% AP) by 509% in speed and 2% AP in accuracy, and the convolution-based detector ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS on A100, 55.2% AP) by 551% in speed and 0.7% AP in accuracy. YOLOv7 also outperforms YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, DETR, Deformable DETR, DINO-5scale-R50, ViT-Adapter-B, and many other object detectors in speed and accuracy. Moreover, we train YOLOv7 only on the MS COCO dataset from scratch, without using any other datasets or pre-trained weights. Source code is released at https://github.com/WongKinYiu/yolov7.
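
The speed and accuracy margins quoted above can be reproduced from the FPS and AP figures in the abstract. The following is a minimal sketch, assuming the speed gain is the relative FPS increase over the baseline and the accuracy gain is the absolute difference in AP points; the helper names `speedup_percent` and `compare` are illustrative and do not come from the YOLOv7 repository. Note that the compared FPS values were measured on different GPUs (V100 vs. A100), as stated in the abstract.

```python
def speedup_percent(fps_new: float, fps_base: float) -> float:
    """Relative FPS gain of fps_new over fps_base, in percent."""
    return (fps_new - fps_base) / fps_base * 100.0


def compare(model: dict, baseline: dict) -> dict:
    """Speed gain (percent, rounded) and AP gain (absolute points)."""
    return {
        "speed_gain_pct": round(speedup_percent(model["fps"], baseline["fps"])),
        "ap_gain": round(model["ap"] - baseline["ap"], 1),
    }


# Figures copied from the abstract.
yolov7_e6 = {"fps": 56.0, "ap": 55.9}
baselines = {
    "SWIN-L Cascade-Mask R-CNN": {"fps": 9.2, "ap": 53.9},
    "ConvNeXt-XL Cascade-Mask R-CNN": {"fps": 8.6, "ap": 55.2},
}

comparisons = {name: compare(yolov7_e6, b) for name, b in baselines.items()}

for name, c in comparisons.items():
    print(f"{name}: +{c['speed_gain_pct']}% speed, +{c['ap_gain']} AP")
```

Under these assumptions the sketch yields 509% and 551% for speed, and 2.0 and 0.7 AP points for accuracy, which matches the abstract and indicates that the accuracy margins are absolute AP differences rather than relative percentages.
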
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Autonomous Vehicles | DVTOD | mAP | 77.8 | YOLOv7 (Thermal) |
| Autonomous Vehicles | DVTOD | mAP | 35.3 | YOLOv7 (Visible) |
| Object Detection | COCO test-dev | box mAP | 56.6 | YOLOv7-D6 (44 fps) |
| Object Detection | COCO test-dev | box mAP | 56 | YOLOv7-E6 (56 fps) |
| Object Detection | COCO test-dev | box mAP | 54.9 | YOLOv7-W6 (84 fps) |
| Object Detection | COCO test-dev | box mAP | 53.1 | YOLOv7-X (114 fps) |
| Object Detection | COCO test-dev | box mAP | 51.4 | YOLOv7 (161 fps) |
| Object Detection | COCO-O | Average mAP | 32 | YOLOv7-E6E |
| Object Detection | COCO-O | Effective Robustness | 6.42 | YOLOv7-E6E |
| Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 36 | YOLOv7-E6E(1280) |
| Object Detection | COCO (Common Objects in Context) | box AP | 56.8 | YOLOv7-E6E(1280) |
| Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 44 | YOLOv7-D6(1280) |
| Object Detection | COCO (Common Objects in Context) | box AP | 56.6 | YOLOv7-D6(1280) |
| Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 56 | YOLOv7-E6(1280) |
| Object Detection | COCO (Common Objects in Context) | box AP | 56 | YOLOv7-E6(1280) |
| Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 84 | YOLOv7-W6(1280) |
| Object Detection | COCO (Common Objects in Context) | box AP | 54.9 | YOLOv7-W6(1280) |
| Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 114 | YOLOv7-X |
| Object Detection | COCO (Common Objects in Context) | box AP | 53.1 | YOLOv7-X |
| Pedestrian Detection | DVTOD | mAP | 77.8 | YOLOv7 (Thermal) |
| Pedestrian Detection | DVTOD | mAP | 35.3 | YOLOv7 (Visible) |
| 2D Object Detection | CeyMo | mAP | 69.5 | YOLOv7 |
| 2D Object Detection | COCO test-dev | box mAP | 56.6 | YOLOv7-D6 (44 fps) |
| 2D Object Detection | COCO test-dev | box mAP | 56 | YOLOv7-E6 (56 fps) |
| 2D Object Detection | COCO test-dev | box mAP | 54.9 | YOLOv7-W6 (84 fps) |
| 2D Object Detection | COCO test-dev | box mAP | 53.1 | YOLOv7-X (114 fps) |
| 2D Object Detection | COCO test-dev | box mAP | 51.4 | YOLOv7 (161 fps) |
| 2D Object Detection | COCO-O | Average mAP | 32 | YOLOv7-E6E |
| 2D Object Detection | COCO-O | Effective Robustness | 6.42 | YOLOv7-E6E |
| 2D Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 36 | YOLOv7-E6E(1280) |
| 2D Object Detection | COCO (Common Objects in Context) | box AP | 56.8 | YOLOv7-E6E(1280) |
| 2D Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 44 | YOLOv7-D6(1280) |
| 2D Object Detection | COCO (Common Objects in Context) | box AP | 56.6 | YOLOv7-D6(1280) |
| 2D Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 56 | YOLOv7-E6(1280) |
| 2D Object Detection | COCO (Common Objects in Context) | box AP | 56 | YOLOv7-E6(1280) |
| 2D Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 84 | YOLOv7-W6(1280) |
| 2D Object Detection | COCO (Common Objects in Context) | box AP | 54.9 | YOLOv7-W6(1280) |
| 2D Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 114 | YOLOv7-X |
| 2D Object Detection | COCO (Common Objects in Context) | box AP | 53.1 | YOLOv7-X |
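
The table pairs each YOLOv7 variant with a V100 throughput and a COCO box AP, so choosing a variant for a given frame-rate budget is a small lookup. The following is a minimal sketch of that selection using only the values listed above; the function name `best_at_or_above` and the `VARIANTS` list are illustrative, and the base YOLOv7 entry takes its FPS and AP from the COCO test-dev rows.

```python
# (model, FPS on V100 at batch size 1, COCO box AP), copied from the table.
VARIANTS = [
    ("YOLOv7-E6E", 36, 56.8),
    ("YOLOv7-D6", 44, 56.6),
    ("YOLOv7-E6", 56, 56.0),
    ("YOLOv7-W6", 84, 54.9),
    ("YOLOv7-X", 114, 53.1),
    ("YOLOv7", 161, 51.4),
]


def best_at_or_above(min_fps: float):
    """Return the highest-AP variant running at min_fps or faster, or None."""
    candidates = [v for v in VARIANTS if v[1] >= min_fps]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v[2])


for budget in (30, 60, 100, 200):
    print(budget, best_at_or_above(budget))
```

With a 30 FPS budget the lookup returns YOLOv7-E6E at 56.8 AP, consistent with the real-time claim in the abstract. The sketch only ranks the variants in this table and says nothing about other detectors or other hardware.
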