DAMO-YOLO : A Report on Real-Time Object Detection Design

Xianzhe Xu, Yiqi Jiang, Weihua Chen, Yilun Huang, Yuan Zhang, Xiuyu Sun

2022-11-23 | Real-Time Object Detection | Neural Architecture Search | Object Detection


Abstract

In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series. DAMO-YOLO extends YOLO with several new techniques, including Neural Architecture Search (NAS), an efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement. In particular, we use MAE-NAS, a method guided by the principle of maximum entropy, to search our detection backbone under the constraints of low latency and high performance, producing ResNet/CSP-like structures with spatial pyramid pooling and focus modules. In the design of necks and heads, we follow the rule of "large neck, small head". We import Generalized-FPN with accelerated queen-fusion to build the detector neck and upgrade its CSPNet with efficient layer aggregation networks (ELAN) and reparameterization. Then we investigate how detector head size affects detection performance and find that a heavy neck with only one task projection layer yields better results. In addition, AlignedOTA is proposed to solve the misalignment problem in label assignment, and a distillation scheme is introduced to improve performance further. Based on these new techniques, we build a suite of models at various scales to meet the needs of different scenarios. For general industry requirements, we propose DAMO-YOLO-T/S/M/L, which achieve 43.6/47.7/50.2/51.9 mAP on COCO with a latency of 2.78/3.83/5.62/7.95 ms on T4 GPUs, respectively. Additionally, for edge devices with limited computing power, we propose the lightweight DAMO-YOLO-Ns/Nm/Nl models, which achieve 32.3/38.2/40.5 mAP on COCO with a latency of 4.08/5.05/6.69 ms on X86-CPU. Our proposed general and lightweight models outperform other YOLO series models in their respective application scenarios.
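The abstract mentions reparameterization in the neck without spelling out the mechanics. The following is a minimal sketch of the general idea, assuming a RepVGG-style block with a 3x3 branch, a 1x1 branch, and an identity branch on a single channel, with batch normalization omitted for brevity. It is not the RepGFPN implementation; the function names `conv2d_same`, `multi_branch`, and `merge_kernels` are hypothetical and chosen only for illustration.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding ("same" size)."""
    k = len(kernel)  # odd, square kernel assumed
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch + identity, summed."""
    a = conv2d_same(image, k3)
    b = conv2d_same(image, k1)
    h, w = len(image), len(image[0])
    return [[a[y][x] + b[y][x] + image[y][x] for x in range(w)]
            for y in range(h)]


def merge_kernels(k3, k1):
    """Inference-time form: fold the 1x1 and identity branches into one 3x3 kernel."""
    merged = [row[:] for row in k3]  # copy so the original kernel is untouched
    merged[1][1] += k1[0][0] + 1.0   # both extra branches act only on the centre tap
    return merged


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 2D lists."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Toy data (assumed values, for illustration only).
image = [[1.0, 2.0, 0.0, -1.0],
         [0.5, -2.0, 3.0, 1.0],
         [4.0, 0.0, -1.5, 2.0]]
k3 = [[0.1, -0.2, 0.3],
      [0.0, 0.5, -0.4],
      [0.2, 0.1, -0.1]]
k1 = [[0.5]]

train_out = multi_branch(image, k3, k1)
infer_out = conv2d_same(image, merge_kernels(k3, k1))
print(max_abs_diff(train_out, infer_out) < 1e-9)
```

Because convolution is linear, the three branches collapse into one kernel with identical outputs, so the multi-branch structure costs nothing at inference time. In a real network the same folding is applied per channel after fusing each branch's batch normalization into its convolution.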

Results

Task	Dataset	Metric	Value	Model
Object Detection	COCO (Common Objects in Context)	FPS (V100, b=1)	126	DAMO-YOLO-L
Object Detection	COCO (Common Objects in Context)	box AP	50.8	DAMO-YOLO-L
Object Detection	COCO (Common Objects in Context)	FPS (V100, b=1)	233	DAMO-YOLO-M
Object Detection	COCO (Common Objects in Context)	box AP	49.2	DAMO-YOLO-M
Object Detection	COCO (Common Objects in Context)	FPS (V100, b=1)	325	DAMO-YOLO-S
Object Detection	COCO (Common Objects in Context)	box AP	46	DAMO-YOLO-S
Object Detection	COCO (Common Objects in Context)	FPS (V100, b=1)	397	DAMO-YOLO-T
Object Detection	COCO (Common Objects in Context)	box AP	42	DAMO-YOLO-T
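
The table reports throughput in FPS at batch size 1, while the abstract reports latency in milliseconds. As a rough aid for reading the two side by side, the sketch below converts the listed FPS values to per-image latency, assuming FPS here is simply the reciprocal of single-image latency. The table figures were measured on a V100 and the abstract figures on a T4, so the two sets are not directly comparable. The `RESULTS` dictionary and `latency_ms` helper are hypothetical names introduced for this example.

```python
# (FPS on V100 at batch size 1, box AP on COCO), copied from the table above.
RESULTS = {
    "DAMO-YOLO-T": (397, 42.0),
    "DAMO-YOLO-S": (325, 46.0),
    "DAMO-YOLO-M": (233, 49.2),
    "DAMO-YOLO-L": (126, 50.8),
}


def latency_ms(fps):
    """Per-image latency in milliseconds, assuming latency = 1 / throughput."""
    return 1000.0 / fps


for model, (fps, box_ap) in RESULTS.items():
    print(f"{model}: {latency_ms(fps):.2f} ms per image, box AP {box_ap}")
```

Under that assumption, the listed models span roughly 2.5 ms to 7.9 ms per image on a V100.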
