A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection

Zhaowei Cai, Quanfu Fan, Rogerio S. Feris, Nuno Vasconcelos

2016-07-25

Feature Upsampling, Real-Time Object Detection, Pedestrian Detection, Object Detection, Face Detection


Abstract

A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network. In the proposal sub-network, detection is performed at multiple output layers, so that receptive fields match objects of different scales. These complementary scale-specific detectors are combined to produce a strong multi-scale object detector. The unified network is learned end-to-end, by optimizing a multi-task loss. Feature upsampling by deconvolution is also explored, as an alternative to input upsampling, to reduce the memory and computation costs. State-of-the-art object detection performance, at up to 15 fps, is reported on datasets, such as KITTI and Caltech, containing a substantial number of small objects.
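The idea that different output layers suit objects of different scales can be illustrated with the standard receptive-field recurrence for stacked convolution and pooling layers. The sketch below is only an illustration: the layer stack `LAYERS` and the helper `closest_layer` are hypothetical and are not the MS-CNN architecture or its matching rule from the paper.

```python
from typing import List, Tuple

# (name, kernel_size, stride) per layer.
# ASSUMPTION: a small VGG-like stack chosen for illustration only;
# it is not the network described in the paper.
LAYERS: List[Tuple[str, int, int]] = [
    ("conv1", 3, 1),
    ("pool1", 2, 2),
    ("conv2", 3, 1),
    ("pool2", 2, 2),
    ("conv3", 3, 1),
    ("pool3", 2, 2),
    ("conv4", 3, 1),
]


def receptive_fields(layers: List[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Return (name, receptive_field, cumulative_stride) for each layer.

    Uses the standard recurrence:
        rf_out   = rf_in + (kernel - 1) * jump_in
        jump_out = jump_in * stride
    where both rf and jump start at 1 for the input pixels.
    """
    rf, jump = 1, 1
    result = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        result.append((name, rf, jump))
    return result


def closest_layer(object_size: int, layers: List[Tuple[str, int, int]]) -> str:
    """Hypothetical helper: name of the layer whose receptive field is
    closest to the given object size in pixels (first match wins on ties)."""
    fields = receptive_fields(layers)
    return min(fields, key=lambda item: abs(item[1] - object_size))[0]


for name, rf, jump in receptive_fields(LAYERS):
    print(f"{name}: receptive field {rf}px, cumulative stride {jump}")
```

In this toy stack the receptive field grows from 3 pixels at `conv1` to 38 pixels at `conv4`, so a detector attached to an early layer sees small regions while one attached to a deeper layer sees larger ones. This is the intuition behind placing scale-specific detectors on several output layers.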

Results

Task	Dataset	Metric	Value	Model
Face Detection	WIDER Face (Hard)	AP	0.809	MS-CNN
Pedestrian Detection	Caltech	Reasonable Miss Rate	9.95%	MS-CNN
