Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, Jian Sun
In this report, we present several empirical improvements to the YOLO series, forming a new high-performance detector: YOLOX. We switch the YOLO detector to an anchor-free design and adopt other advanced detection techniques, namely a decoupled head and the leading label assignment strategy SimOTA, to achieve state-of-the-art results across a wide range of model sizes. For YOLO-Nano, with only 0.91M parameters and 1.08G FLOPs, we get 25.3% AP on COCO, surpassing NanoDet by 1.8% AP; for YOLOv3, one of the most widely used detectors in industry, we boost it to 47.3% AP on COCO, outperforming the current best practice by 3.0% AP; for YOLOX-L, with roughly the same number of parameters as YOLOv4-CSP and YOLOv5-L, we achieve 50.0% AP on COCO at a speed of 68.9 FPS on a Tesla V100, exceeding YOLOv5-L by 1.8% AP. Further, we won 1st place in the Streaming Perception Challenge (Workshop on Autonomous Driving at CVPR 2021) using a single YOLOX-L model. We hope this report provides useful experience for developers and researchers in practical scenarios, and we also provide deployment versions with ONNX, TensorRT, NCNN, and OpenVINO support. Source code is at https://github.com/Megvii-BaseDetection/YOLOX.
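To make the anchor-free idea concrete: instead of refining preset anchor boxes, each grid location predicts one box directly. The following is a minimal sketch of how such a prediction could be decoded into pixel coordinates. The function name `decode_anchor_free` and the parameterization (center offsets in grid units, log-scale width and height relative to the stride) are assumptions made for illustration; the abstract itself does not specify them.

```python
import math


def decode_anchor_free(raw, grid_x, grid_y, stride):
    """Map one grid cell's raw outputs (tx, ty, tw, th) to a pixel-space box.

    Assumed convention (illustrative, not taken from the report):
    the center is an offset from the cell's top-left corner in grid units,
    and width/height are predicted in log space, in units of the stride.
    """
    tx, ty, tw, th = raw
    cx = (grid_x + tx) * stride   # box center x in pixels
    cy = (grid_y + ty) * stride   # box center y in pixels
    w = math.exp(tw) * stride     # box width in pixels
    h = math.exp(th) * stride     # box height in pixels
    # Return corner format (x1, y1, x2, y2).
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A cell at column 10, row 4 on a stride-8 feature map.
box = decode_anchor_free((0.5, 0.5, 0.0, 0.0), grid_x=10, grid_y=4, stride=8)
print(box)  # (80.0, 32.0, 88.0, 40.0)
```

No anchor widths or heights appear anywhere in the decoding, which is what removes the anchor hyperparameters from the design.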
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | COCO test-dev | box mAP | 51.5 | YOLOX-X (Modified CSP v5, 640x640, single-scale) |
| Object Detection | COCO test-dev | AP50 | 69.6 | YOLOX-X (Modified CSP v5) |
| Object Detection | COCO test-dev | AP75 | 55.7 | YOLOX-X (Modified CSP v5) |
| Object Detection | COCO test-dev | APL | 66.1 | YOLOX-X (Modified CSP v5) |
| Object Detection | COCO test-dev | APM | 56.1 | YOLOX-X (Modified CSP v5) |
| Object Detection | COCO test-dev | APS | 31.2 | YOLOX-X (Modified CSP v5) |
| Object Detection | COCO test-dev | Params (M) | 99.1 | YOLOX-X (Modified CSP v5) |
| Object Detection | COCO test-dev | box mAP | 51.2 | YOLOX-X (Modified CSP v5) |
| Object Detection | COCO test-dev | box mAP | 48 | YOLOX-Darknet53 (Darknet53, 640x640, single-scale) |
| Object Detection | COCO-O | Average mAP | 30.3 | YOLOX-X |
| Object Detection | COCO-O | Effective Robustness | 7.26 | YOLOX-X |
| Object Detection | COCO-O | Average mAP | 20.6 | YOLOX-S |
| Object Detection | COCO-O | Effective Robustness | 2.48 | YOLOX-S |
| Object Detection | WaterScenes | mAP@50-95 | 57.8 | YOLOX-M |
| Object Detection | Argoverse-HD (Full-Stack, Test) | AP | 41.1 | YOLOX |
| Object Detection | Argoverse-HD (Detection-Only, Test) | AP | 41.1 | YOLOX |
| Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 62.5 | YOLOv5-X |
| Object Detection | COCO (Common Objects in Context) | box AP | 50.4 | YOLOv5-X |
| Object Detection | Argoverse-HD (Detection-Only, Val) | AP | 47.42 | YOLOX |
| 2D Object Detection | CeyMo | mAP | 57.7 | YOLOX |