YOLO-Former: YOLO Shakes Hand With ViT

Javad Khoramdel, Ahmad Moori, Yasamin Borhani, Armin Ghanbarzadeh, Esmaeil Najafi

2024-01-11object-detection Object Detection

Abstract

The proposed YOLO-Former method seamlessly integrates the ideas of transformer and YOLOv4 to create a highly accurate and efficient object detection system. The method leverages the fast inference speed of YOLOv4 and incorporates the advantages of the transformer architecture through the integration of convolutional attention and transformer modules. The results demonstrate the effectiveness of the proposed approach, with a mean average precision (mAP) of 85.76\% on the Pascal VOC dataset, while maintaining high prediction speed with a frame rate of 10.85 frames per second. The contribution of this work lies in the demonstration of how the innovative combination of these two state-of-the-art techniques can lead to further improvements in the field of object detection.

Results

Task	Dataset	Metric	Value	Model
Object Detection	PASCAL VOC 2012	MAP	86.01	YOLO-Former
3D	PASCAL VOC 2012	MAP	86.01	YOLO-Former
2D Classification	PASCAL VOC 2012	MAP	86.01	YOLO-Former
2D Object Detection	PASCAL VOC 2012	MAP	86.01	YOLO-Former
16k	PASCAL VOC 2012	MAP	86.01	YOLO-Former

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains2025-07-17 RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images2025-07-17 Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection2025-07-17 Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis2025-07-17 Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios2025-07-16 Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping2025-07-15 ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge2025-07-08 Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations2025-07-07