You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi

2015-06-08 · CVPR 2016
Tasks: Real-Time Object Detection, Object Counting, Object Detection

Abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists. Finally, YOLO learns very general representations of objects. It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset.
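The abstract frames detection as one regression from the full image to boxes and class probabilities. The sketch below shows how such an output can be turned into scored detections. It assumes the grid parameterization from the full paper: an S × S grid with B boxes and C class probabilities per cell (S = 7, B = 2, C = 20 for PASCAL VOC, giving a 7 × 7 × 30 output). Each box's x, y are offsets inside its cell, and its width and height are predicted as square roots relative to the whole image. A box's class-specific score is its conditional class probability multiplied by its confidence. The per-cell ordering of values and the names `decode_yolo_grid` and `Detection` are hypothetical choices made for readability. A real implementation may lay out the tensor differently, and the non-maximum suppression step that normally follows is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    x: float  # box centre, normalised to [0, 1] over the whole image
    y: float
    w: float  # box width and height, normalised to [0, 1]
    h: float
    class_id: int
    score: float  # class-specific confidence = P(class | object) * box confidence


def decode_yolo_grid(output: Sequence[float], S: int = 7, B: int = 2, C: int = 20,
                     threshold: float = 0.2) -> List[Detection]:
    """Decode a flat S*S*(B*5 + C) prediction vector into thresholded detections.

    Hypothetical layout per cell: B boxes of (x, y, sqrt_w, sqrt_h, confidence),
    followed by C conditional class probabilities shared by all boxes in the cell.
    """
    depth = B * 5 + C
    if len(output) != S * S * depth:
        raise ValueError(f"expected {S * S * depth} values, got {len(output)}")
    detections: List[Detection] = []
    for row in range(S):
        for col in range(S):
            base = (row * S + col) * depth
            cell = output[base:base + depth]
            class_probs = cell[B * 5:]
            for b in range(B):
                bx, by, sqrt_w, sqrt_h, conf = cell[b * 5:b * 5 + 5]
                # x, y are offsets within the cell; convert to image-relative centres.
                cx = (col + bx) / S
                cy = (row + by) / S
                # Width and height are assumed to be predicted as square roots.
                w, h = sqrt_w ** 2, sqrt_h ** 2
                for c, p in enumerate(class_probs):
                    score = p * conf
                    if score >= threshold:
                        detections.append(Detection(cx, cy, w, h, c, score))
    return detections
```

For example, a 7 × 7 × 30 output that is all zeros except for one box in row 3, column 4 (offsets 0.5, 0.5, square-root size 0.5, confidence 0.9) with class 3 probability 0.8 yields a single detection centred at (4.5/7, 0.5), of size 0.25 × 0.25, with score 0.72.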

Results

Task	Dataset	Metric	Value	Model
Object Counting	CARPK	MAE	156	YOLO (2016)
Object Counting	CARPK	RMSE	57.55	YOLO (2016)
Object Detection	PASCAL VOC 2012	mAP	57.9	YOLO
Object Detection	PASCAL VOC 2007	FPS	46	YOLO
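The CARPK rows report car-counting error as MAE and RMSE. The sketch below uses the standard definitions of both metrics over per-image predicted and ground-truth counts. Whether the benchmark's own evaluation script matches these definitions exactly is an assumption. For a detector such as YOLO, the predicted count would be the number of detections kept after thresholding. The function name `counting_errors` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def counting_errors(predicted: Sequence[float], actual: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, RMSE) between per-image predicted and ground-truth object counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    # Mean absolute error: average magnitude of the per-image count error.
    mae = sum(abs(d) for d in diffs) / len(diffs)
    # Root mean squared error: penalises large per-image errors more heavily.
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse
```

For instance, `counting_errors([10, 12, 7], [11, 12, 10])` gives an MAE of 4/3 and an RMSE of sqrt(10/3). Over the same set of images, RMSE is never smaller than MAE.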

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework (2025-07-11)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)