YOLOv11: An Overview of the Key Architectural Enhancements

Rahima Khanam, Muhammad Hussain

2024-10-23

Tasks: Real-Time Object Detection, Semantic Segmentation, Pose Estimation, Instance Segmentation, Oriented Object Detection, Object Detection

Abstract

This study presents an architectural analysis of YOLOv11, the latest iteration in the YOLO (You Only Look Once) series of object detection models. We examine the model's architectural innovations, including the introduction of the C3k2 (Cross Stage Partial with kernel size 2) block, SPPF (Spatial Pyramid Pooling - Fast), and C2PSA (Convolutional block with Parallel Spatial Attention) components, which improve the model's performance in several ways, notably through enhanced feature extraction. The paper explores YOLOv11's expanded capabilities across various computer vision tasks, including object detection, instance segmentation, pose estimation, and oriented object detection (OBB). We review the model's performance improvements in terms of mean Average Precision (mAP) and computational efficiency compared to its predecessors, with a focus on the trade-off between parameter count and accuracy. Additionally, the study discusses YOLOv11's versatility across different model sizes, from nano to extra-large, catering to diverse application needs from edge devices to high-performance computing environments. Our research provides insights into YOLOv11's position within the broader landscape of object detection and its potential impact on real-time computer vision applications.
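To make the SPPF component named in the abstract more concrete, the following is a minimal pure-Python sketch of its pooling core only. It assumes the commonly described SPPF layout, in which a feature map passes through three chained stride-1 max pools with a 5x5 window and the four resulting maps are concatenated. The 1x1 convolutions that surround the pooling stage in a real network are omitted, and the function names `max_pool_same` and `sppf_pooling` are illustrative, not taken from the paper or any library.

```python
def max_pool_same(grid, k):
    """Stride-1 max pooling over a 2D list.

    The window is clipped at the borders, which matches padding with
    -inf, so the output has the same height and width as the input.
    """
    r = k // 2
    h, w = len(grid), len(grid[0])
    return [
        [
            max(
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def sppf_pooling(grid, k=5):
    """Return the four maps an SPPF-style block would concatenate.

    Each pool reuses the previous pool's output, so three cheap 5x5
    pools cover 5x5, 9x9, and 13x13 receptive fields.
    """
    p1 = max_pool_same(grid, k)
    p2 = max_pool_same(p1, k)
    p3 = max_pool_same(p2, k)
    return [grid, p1, p2, p3]
```

The "Fast" in the name comes from the chaining: two chained 5x5 pools give the same result as a single 9x9 pool, and three give the same result as a 13x13 pool, so the larger windows never have to be computed directly.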

Results

Task	Dataset	Metric	Value	Model
Object Detection	COCO (Common Objects in Context)	box AP	54.7	YOLOv11x
Object Detection	COCO (Common Objects in Context)	box AP	53.4	YOLOv11l
Object Detection	COCO (Common Objects in Context)	box AP	51.5	YOLOv11m
Object Detection	COCO (Common Objects in Context)	box AP	47.0	YOLOv11s
Object Detection	COCO (Common Objects in Context)	box AP	39.5	YOLOv11n
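
The accuracy side of the size trade-off can be read directly off the table. The short sketch below, using only the COCO box AP values listed above, computes the gain from each model size to the next. Parameter counts are not listed on this page, so they are left out. The names `box_ap` and `ap_gains` are illustrative.

```python
# COCO box AP per model size, copied from the results table (smallest first).
box_ap = {
    "YOLOv11n": 39.5,
    "YOLOv11s": 47.0,
    "YOLOv11m": 51.5,
    "YOLOv11l": 53.4,
    "YOLOv11x": 54.7,
}


def ap_gains(scores):
    """Return (smaller, larger, AP gain) for each adjacent pair of models."""
    names = list(scores)
    return [
        (a, b, round(scores[b] - scores[a], 1))
        for a, b in zip(names, names[1:])
    ]


for smaller, larger, gain in ap_gains(box_ap):
    print(f"{smaller} -> {larger}: +{gain} box AP")
```

The gains shrink at every step (7.5, 4.5, 1.9, then 1.3 points), so each move to a larger variant buys less additional accuracy than the one before it.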
