Shangliang Xu, Xinxin Wang, Wenyu Lv, Qinyao Chang, Cheng Cui, Kaipeng Deng, Guanzhong Wang, Qingqing Dang, Shengyu Wei, Yuning Du, Baohua Lai
In this report, we present PP-YOLOE, an industrial state-of-the-art object detector that combines high performance with deployment-friendly design. Building on the previous PP-YOLOv2, we adopt an anchor-free paradigm, a more powerful backbone and neck built from CSPRepResStage, an ET-head, and the dynamic label assignment algorithm TAL. We provide s/m/l/x models for different practical scenarios. PP-YOLOE-l achieves 51.4 mAP on COCO test-dev and 78.1 FPS on a Tesla V100, an improvement of +1.9 AP with a 13.35% speed-up over PP-YOLOv2 and +1.3 AP with a 24.96% speed-up over YOLOX, the previous state-of-the-art industrial models. With TensorRT and FP16 precision, PP-YOLOE inference speed reaches 149.2 FPS. We also conduct extensive experiments to verify the effectiveness of our designs. Source code and pre-trained models are available at https://github.com/PaddlePaddle/PaddleDetection.
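The speed figures in the abstract can be related to each other with simple arithmetic. The sketch below assumes the quoted percentages are relative FPS gains, i.e. `new_fps = baseline_fps * (1 + p / 100)`; the helper names are illustrative, and the baseline FPS values it prints are back-derived from the abstract rather than quoted from it.

```python
def implied_baseline_fps(new_fps: float, speedup_percent: float) -> float:
    """Baseline FPS implied by a relative speed-up (assumed: new = base * (1 + p/100))."""
    return new_fps / (1.0 + speedup_percent / 100.0)


def latency_ms(fps: float) -> float:
    """Average per-image latency in milliseconds for a given throughput."""
    return 1000.0 / fps


if __name__ == "__main__":
    pp_yoloe_l_fps = 78.1  # PP-YOLOE-l on Tesla V100, from the abstract

    # Speed-up percentages quoted in the abstract.
    for baseline, percent in [("PP-YOLOv2", 13.35), ("YOLOX", 24.96)]:
        fps = implied_baseline_fps(pp_yoloe_l_fps, percent)
        print(f"{baseline}: implied baseline of about {fps:.1f} FPS")

    # Throughput expressed as per-image latency.
    for label, fps in [("V100", 78.1), ("TensorRT + FP16", 149.2)]:
        print(f"{label}: {latency_ms(fps):.2f} ms per image")
```

Expressing throughput as latency makes the deployment impact easier to compare against a per-frame time budget.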
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Multi-Object Tracking | MOT16 | MOTA | 77.7 | PP-Tracking |
| Multi-Object Tracking | CroHD | MOTA | 72.6 | PP-Tracking |
| Object Detection | COCO test-dev | AP50 | 69.9 | PP-YOLOE-x (CSPRepResNet-x, 640x640, single-scale) |
| Object Detection | COCO test-dev | AP75 | 56.5 | PP-YOLOE-x (CSPRepResNet-x, 640x640, single-scale) |
| Object Detection | COCO test-dev | APL | 66.4 | PP-YOLOE-x (CSPRepResNet-x, 640x640, single-scale) |
| Object Detection | COCO test-dev | APM | 56.3 | PP-YOLOE-x (CSPRepResNet-x, 640x640, single-scale) |
| Object Detection | COCO test-dev | APS | 33.3 | PP-YOLOE-x (CSPRepResNet-x, 640x640, single-scale) |
| Object Detection | COCO test-dev | box mAP | 52.2 | PP-YOLOE-x (CSPRepResNet-x, 640x640, single-scale) |
| Object Detection | COCO test-dev | AP50 | 68.9 | PP-YOLOE-l (CSPRepResNet-l, 640x640, single-scale) |
| Object Detection | COCO test-dev | AP75 | 55.6 | PP-YOLOE-l (CSPRepResNet-l, 640x640, single-scale) |
| Object Detection | COCO test-dev | APL | 66.1 | PP-YOLOE-l (CSPRepResNet-l, 640x640, single-scale) |
| Object Detection | COCO test-dev | APM | 55.3 | PP-YOLOE-l (CSPRepResNet-l, 640x640, single-scale) |
| Object Detection | COCO test-dev | APS | 31.4 | PP-YOLOE-l (CSPRepResNet-l, 640x640, single-scale) |
| Object Detection | COCO test-dev | box mAP | 51.4 | PP-YOLOE-l (CSPRepResNet-l, 640x640, single-scale) |
| Object Detection | COCO test-dev | AP50 | 66.5 | PP-YOLOE-m (CSPRepResNet-m, 640x640, single-scale) |
| Object Detection | COCO test-dev | AP75 | 53 | PP-YOLOE-m (CSPRepResNet-m, 640x640, single-scale) |
| Object Detection | COCO test-dev | APL | 63.8 | PP-YOLOE-m (CSPRepResNet-m, 640x640, single-scale) |
| Object Detection | COCO test-dev | APM | 52.9 | PP-YOLOE-m (CSPRepResNet-m, 640x640, single-scale) |
| Object Detection | COCO test-dev | APS | 28.6 | PP-YOLOE-m (CSPRepResNet-m, 640x640, single-scale) |
| Object Detection | COCO test-dev | box mAP | 48.9 | PP-YOLOE-m (CSPRepResNet-m, 640x640, single-scale) |
| Object Detection | COCO test-dev | AP50 | 60.5 | PP-YOLOE-s (CSPRepResNet-s, 640x640, single-scale) |
| Object Detection | COCO test-dev | AP75 | 46.6 | PP-YOLOE-s (CSPRepResNet-s, 640x640, single-scale) |
| Object Detection | COCO test-dev | APL | 56.9 | PP-YOLOE-s (CSPRepResNet-s, 640x640, single-scale) |
| Object Detection | COCO test-dev | APM | 46.4 | PP-YOLOE-s (CSPRepResNet-s, 640x640, single-scale) |
| Object Detection | COCO test-dev | APS | 23.2 | PP-YOLOE-s (CSPRepResNet-s, 640x640, single-scale) |
| Object Detection | COCO test-dev | box mAP | 43.1 | PP-YOLOE-s (CSPRepResNet-s, 640x640, single-scale) |
| Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 45 | PP-YOLOE+_X |
| Object Detection | COCO (Common Objects in Context) | box AP | 54.7 | PP-YOLOE+_X |
| Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 78 | PP-YOLOE+_L(distillation) |
| Object Detection | COCO (Common Objects in Context) | box AP | 54 | PP-YOLOE+_L(distillation) |
| Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 78 | PP-YOLOE+_L |
| Object Detection | COCO (Common Objects in Context) | box AP | 52.9 | PP-YOLOE+_L |
| Object Detection | COCO (Common Objects in Context) | FPS (V100, b=1) | 123 | YOLOv3 |
| Object Detection | COCO (Common Objects in Context) | box AP | 51 | YOLOv3 |
| Object Detection | COCO (Common Objects in Context) | box AP | 49.8 | PP-YOLOE+_M |
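
The COCO rows that report both box AP and FPS (V100, b=1) describe an accuracy/speed trade-off. As a minimal sketch of how one might read them, the snippet below picks the highest-AP entry that meets a minimum throughput. The values are copied from the table as listed; the `ROWS` structure and the `best_under_budget` helper are illustrative assumptions, not part of PaddleDetection.

```python
from typing import List, Optional, Tuple

# (model, box AP, FPS on V100 at batch size 1), copied from the table above.
# FPS is None where the table lists no FPS value for that model.
Row = Tuple[str, float, Optional[float]]

ROWS: List[Row] = [
    ("PP-YOLOE+_X", 54.7, 45.0),
    ("PP-YOLOE+_L(distillation)", 54.0, 78.0),
    ("PP-YOLOE+_L", 52.9, 78.0),
    ("YOLOv3", 51.0, 123.0),
    ("PP-YOLOE+_M", 49.8, None),
]


def best_under_budget(rows: List[Row], min_fps: float) -> Optional[Row]:
    """Return the highest-AP row whose reported FPS is at least min_fps."""
    candidates = [r for r in rows if r[2] is not None and r[2] >= min_fps]
    # max() with default=None handles the case where nothing meets the budget.
    return max(candidates, key=lambda r: r[1], default=None)


if __name__ == "__main__":
    for budget in (30, 60, 100):
        print(budget, best_under_budget(ROWS, budget))
```

Entries without a reported FPS are skipped rather than assumed to meet the budget, so a missing measurement never wins by default.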