Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

RetinaNet

Computer Vision · Introduced 2017 · 210 papers

Description

RetinaNet is a one-stage object detection model that uses a focal loss function to address class imbalance during training. Focal loss applies a modulating term to the cross entropy loss so that learning focuses on hard negative examples. RetinaNet is a single, unified network composed of a backbone network and two task-specific subnetworks. The backbone is an off-the-shelf convolutional network that computes a convolutional feature map over the entire input image. The first subnetwork performs convolutional object classification on the backbone's output; the second performs convolutional bounding box regression. Both subnetworks have a simple design that the authors propose specifically for one-stage, dense detection.
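The split into a classification subnetwork and a box regression subnetwork can be illustrated by the shape of what each one emits for a single backbone feature map. This is a minimal sketch, not the authors' implementation: the function name `head_output_shapes` is hypothetical, and the use of a fixed number of anchors per location, with one score per class and four box offsets per anchor, is an assumption that the description above does not spell out.

```python
def head_output_shapes(feat_h: int, feat_w: int,
                       num_classes: int, num_anchors: int):
    """Return (classification_shape, box_shape) for one feature map.

    Assumption: every spatial location carries `num_anchors` anchors.
    The classification subnetwork predicts one score per class per anchor;
    the box subnetwork predicts four offsets per anchor.
    """
    cls_shape = (feat_h, feat_w, num_anchors * num_classes)
    box_shape = (feat_h, feat_w, num_anchors * 4)
    return cls_shape, box_shape


# Illustrative values only: an 80x80 feature map, 80 classes, 9 anchors.
cls_shape, box_shape = head_output_shapes(80, 80, num_classes=80, num_anchors=9)
print("classification output:", cls_shape)
print("box regression output:", box_shape)
```

Both outputs share the spatial size of the feature map, which reflects the fact that the two subnetworks read the same backbone output and differ only in what they predict per location.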

The motivation for focal loss is clearest in comparison with two-stage object detectors, where class imbalance is addressed by a two-stage cascade and sampling heuristics. The proposal stage (e.g., Selective Search, EdgeBoxes, DeepMask, RPN) rapidly narrows the candidate object locations down to a small number (e.g., 1-2k), filtering out most background samples. In the second, classification stage, sampling heuristics such as a fixed foreground-to-background ratio or online hard example mining (OHEM) are applied to maintain a manageable balance between foreground and background.
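The fixed foreground-to-background ratio heuristic mentioned above can be sketched as a small sampler. This is an illustration under stated assumptions, not the procedure of any particular detector: the function name `sample_fixed_ratio`, the batch size of 8, and the 25% foreground fraction are all hypothetical choices made for the example.

```python
import random


def sample_fixed_ratio(labels, batch_size=8, fg_fraction=0.25, seed=0):
    """Pick proposal indices so that at most `fg_fraction` are foreground.

    `labels` holds 1 for a foreground proposal and 0 for background.
    The remainder of the batch is filled with background proposals.
    """
    rng = random.Random(seed)
    fg = [i for i, y in enumerate(labels) if y == 1]
    bg = [i for i, y in enumerate(labels) if y == 0]
    n_fg = min(len(fg), int(batch_size * fg_fraction))
    n_bg = min(len(bg), batch_size - n_fg)
    return rng.sample(fg, n_fg) + rng.sample(bg, n_bg)


# 5 foreground proposals among 1000: heavily imbalanced before sampling.
labels = [1] * 5 + [0] * 995
chosen = sample_fixed_ratio(labels)
print("foreground in batch:", sum(labels[i] for i in chosen), "of", len(chosen))
```

The sampler discards most background proposals, so the classifier never sees the raw imbalance. A one-stage detector has no such filtering step, which is the gap focal loss is meant to close.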

In contrast, a one-stage detector must process a much larger set of candidate object locations regularly sampled across an image. To tackle this, RetinaNet uses a focal loss function, a dynamically scaled cross entropy loss, where the scaling factor decays to zero as confidence in the correct class increases. Intuitively, this scaling factor can automatically down-weight the contribution of easy examples during training and rapidly focus the model on hard examples.
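To give a rough sense of how large the set of regularly sampled candidates becomes, the sketch below counts them for one image. The strides, the number of anchors per location, and the function name `count_dense_candidates` are assumptions chosen for illustration; the text above does not fix any of them.

```python
import math


def count_dense_candidates(height, width,
                           strides=(8, 16, 32, 64, 128),
                           anchors_per_location=9):
    """Count candidate boxes sampled on a regular grid at several strides.

    Assumption: one feature map per stride, each location carrying
    `anchors_per_location` candidate boxes.
    """
    total = 0
    for stride in strides:
        rows = math.ceil(height / stride)
        cols = math.ceil(width / stride)
        total += rows * cols * anchors_per_location
    return total


print(count_dense_candidates(640, 640))
```

Under these assumptions a 640x640 image yields tens of thousands of candidates, far more than the 1-2k proposals a two-stage detector passes to its second stage, and nearly all of them are easy background.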

Formally, the focal loss adds a factor $(1 - p_t)^\gamma$ to the standard cross entropy criterion, where $p_t$ is the model's estimated probability for the ground-truth class and $\gamma \ge 0$ is a tunable focusing parameter. Setting $\gamma > 0$ reduces the relative loss for well-classified examples ($p_t > 0.5$), putting more focus on hard, misclassified examples.

$$\text{FL}(p_t) = -(1 - p_t)^\gamma \log(p_t)$$
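The effect of the modulating factor is easy to check numerically. The following is a minimal sketch of the formula above for a single example, not a training-ready loss: the function names are hypothetical, and the default of gamma = 2 is an illustrative choice rather than a value given in this description.

```python
import math


def cross_entropy(p_t: float) -> float:
    """Standard cross entropy given the true-class probability p_t."""
    return -math.log(p_t)


def focal_loss(p_t: float, gamma: float = 2.0) -> float:
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t).

    gamma = 0 recovers plain cross entropy; the default of 2 is only
    an illustrative assumption.
    """
    if not 0.0 < p_t <= 1.0:
        raise ValueError("p_t must lie in (0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)


# Compare an easy, a borderline, and a hard example.
for p_t in (0.9, 0.5, 0.1):
    ce = cross_entropy(p_t)
    fl = focal_loss(p_t)
    print(f"p_t={p_t:.1f}  CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.2f}")
```

The ratio FL/CE equals $(1 - p_t)^\gamma$, so with gamma = 2 an easy example at $p_t = 0.9$ keeps only 1% of its cross entropy loss, while a hard example at $p_t = 0.1$ keeps 81%.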

Papers Using This Method

PaniCar: Securing the Perception of Advanced Driving Assistance Systems Against Emergency Vehicle Lighting (2025-05-08)
Class Imbalance Correction for Improved Universal Lesion Detection and Tagging in CT (2025-04-08)
Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving (2025-02-11)
Dual Scale-aware Adaptive Masked Knowledge Distillation for Object Detection (2025-01-13)
Detection of Body Packs in Abdominal CT scans Through Artificial Intelligence (2024-12-26)
Distortion-Aware Adversarial Attacks on Bounding Boxes of Object Detectors (2024-12-25)
EMOv2: Pushing 5M Vision Model Frontier (2024-12-09)
Psych-Occlusion: Using Visual Psychophysics for Aerial Detection of Occluded Persons during Search and Rescue (2024-12-07)
One-Stage-TFS: Thai One-Stage Fingerspelling Dataset for Fingerspelling Recognition Frameworks (2024-11-05)
Explicitly Modeling Pre-Cortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness (2024-09-25)
LithoHoD: A Litho Simulator-Powered Framework for IC Layout Hotspot Detection (2024-09-16)
On Feasibility of Intent Obfuscating Attacks (2024-07-22)
FAD-SAR: A Novel Fishing Activity Detection System via Synthetic Aperture Radar Images Based on Deep Learning Method (2024-04-28)
FlightScope: An Experimental Comparative Review of Aircraft Detection Algorithms in Satellite Imagery (2024-04-03)
Investigation of the Impact of Synthetic Training Data in the Industrial Application of Terminal Strip Object Detection (2024-03-06)
A Safety-Adapted Loss for Pedestrian Detection in Automated Driving (2024-02-05)
pLitterStreet: Street Level Plastic Litter Detection and Mapping (2024-01-26)
DyRA: Portable Dynamic Resolution Adjustment Network for Existing Detectors (2023-11-28)
P2RBox: Point Prompt Oriented Object Detection with SAM (2023-11-22)
Anchor-Intermediate Detector: Decoupling and Coupling Bounding Boxes for Accurate Object Detection (2023-10-09)