Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SSD: Single Shot MultiBox Detector

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg

2015-12-08
Tasks: Visual Object Tracking, Surgical tool detection, Node Property Prediction, Object Detection, LIDAR Semantic Segmentation, Low-Light Image Enhancement

Abstract

We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and the subsequent pixel or feature resampling stages, and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single-stage methods, SSD has much better accuracy, even with a smaller input image size. For $300\times 300$ input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on an Nvidia Titan X, and for $500\times 500$ input, SSD achieves 75.1% mAP, outperforming a comparable state-of-the-art Faster R-CNN model. Code is available at https://github.com/weiliu89/caffe/tree/ssd .
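
The default-box idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `default_boxes`, the feature map sizes, the per-map scales, and the aspect ratios below are all hypothetical values chosen for illustration, and the width/height rule (scale times or divided by the square root of the aspect ratio) is one common convention assumed here. The sketch places one box per aspect ratio at the centre of every cell of each feature map, so coarser maps carry larger boxes.

```python
import math


def default_boxes(feature_map_sizes, scales, aspect_ratios):
    """Return default boxes as (cx, cy, w, h) tuples in normalised [0, 1] image coordinates.

    Hypothetical helper: one scale per (square) feature map, and the same
    aspect ratios at every location. These choices are assumptions.
    """
    boxes = []
    for fmap_size, scale in zip(feature_map_sizes, scales):
        for row in range(fmap_size):
            for col in range(fmap_size):
                # Centre of the current feature map cell, normalised by map size.
                cx = (col + 0.5) / fmap_size
                cy = (row + 0.5) / fmap_size
                for ratio in aspect_ratios:
                    # Assumed convention: box area stays scale**2 for every ratio.
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    boxes.append((cx, cy, w, h))
    return boxes


# Illustrative configuration (not taken from the paper): three feature maps
# of decreasing resolution, each paired with a larger box scale.
boxes = default_boxes(
    feature_map_sizes=[4, 2, 1],
    scales=[0.2, 0.5, 0.8],
    aspect_ratios=[1.0, 2.0, 0.5],
)
print(len(boxes))  # (16 + 4 + 1) locations * 3 ratios = 63
```

In a detector of this kind, the network would output, for each such box, one score per object category plus four adjustment values; the sketch covers only the box layout, not the prediction or matching steps.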

Results

Task | Dataset | Metric | Value | Model
Object Detection | COCO-O | Average mAP | 13.6 | SSD (VGG-16)
Object Detection | COCO-O | Effective Robustness | 0.36 | SSD (VGG-16)
Object Detection | PKU-DDD17-Car | mAP50 | 73.1 | SSD
Object Detection | PASCAL VOC 2012 | MAP | 80 | SSD512 (07+12+COCO)
3D | COCO-O | Average mAP | 13.6 | SSD (VGG-16)
3D | COCO-O | Effective Robustness | 0.36 | SSD (VGG-16)
3D | PKU-DDD17-Car | mAP50 | 73.1 | SSD
3D | PASCAL VOC 2012 | MAP | 80 | SSD512 (07+12+COCO)
2D Classification | COCO-O | Average mAP | 13.6 | SSD (VGG-16)
2D Classification | COCO-O | Effective Robustness | 0.36 | SSD (VGG-16)
2D Classification | PKU-DDD17-Car | mAP50 | 73.1 | SSD
2D Classification | PASCAL VOC 2012 | MAP | 80 | SSD512 (07+12+COCO)
2D Object Detection | COCO-O | Average mAP | 13.6 | SSD (VGG-16)
2D Object Detection | COCO-O | Effective Robustness | 0.36 | SSD (VGG-16)
2D Object Detection | PKU-DDD17-Car | mAP50 | 73.1 | SSD
2D Object Detection | PASCAL VOC 2012 | MAP | 80 | SSD512 (07+12+COCO)
16k | COCO-O | Average mAP | 13.6 | SSD (VGG-16)
16k | COCO-O | Effective Robustness | 0.36 | SSD (VGG-16)
16k | PKU-DDD17-Car | mAP50 | 73.1 | SSD
16k | PASCAL VOC 2012 | MAP | 80 | SSD512 (07+12+COCO)

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
HVI-CIDNet+: Beyond Extreme Darkness for Low-Light Image Enhancement (2025-07-09)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)