RetinaNet

Computer Vision | Introduced 2017 | 210 papers

Description

RetinaNet is a one-stage object detection model that uses a focal loss function to address class imbalance during training. Focal loss applies a modulating term to the cross entropy loss to focus learning on hard negative examples. RetinaNet is a single, unified network composed of a backbone network and two task-specific subnetworks. The backbone, an off-the-shelf convolutional network, computes a convolutional feature map over the entire input image. The first subnet performs convolutional object classification on the backbone's output; the second performs convolutional bounding box regression. Both subnetworks have a simple design that the authors propose specifically for one-stage, dense detection.
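As a rough illustration of what the two subnetworks produce, the sketch below only tracks output shapes for a single backbone feature map. The use of anchors, the defaults `num_classes=80` and `num_anchors=9`, and the helper name `subnet_output_shapes` are assumptions made for this example and are not stated in the description above.

```python
from dataclasses import dataclass


@dataclass
class HeadShapes:
    cls: tuple  # (H, W, num_anchors * num_classes) class scores
    box: tuple  # (H, W, num_anchors * 4) box offsets


def subnet_output_shapes(feat_h, feat_w, num_classes=80, num_anchors=9):
    """Shape bookkeeping for the two heads on one feature map.

    Assumption: every spatial location carries `num_anchors` candidate
    boxes, each with `num_classes` scores and 4 box-regression offsets.
    """
    return HeadShapes(
        cls=(feat_h, feat_w, num_anchors * num_classes),
        box=(feat_h, feat_w, num_anchors * 4),
    )


shapes = subnet_output_shapes(80, 80)
print(shapes.cls, shapes.box)
```

Both heads are applied densely at every location of the feature map, which is why the number of predictions grows with image size.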

The motivation for focal loss becomes clear when comparing with two-stage object detectors, where class imbalance is addressed by a two-stage cascade and sampling heuristics. The proposal stage (e.g., Selective Search, EdgeBoxes, DeepMask, RPN) rapidly narrows the set of candidate object locations down to a small number (e.g., 1-2k), filtering out most background samples. In the second classification stage, sampling heuristics such as a fixed foreground-to-background ratio or online hard example mining (OHEM) are applied to maintain a manageable balance between foreground and background.
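The fixed foreground-to-background ratio heuristic can be sketched as a simple sampler. This is a minimal illustration, not the procedure of any particular detector: the function name `sample_fixed_ratio`, the mini-batch size of 8, and the 1:3 ratio (`fg_fraction=0.25`) are all assumptions chosen for the example.

```python
import random


def sample_fixed_ratio(labels, batch_size=8, fg_fraction=0.25, seed=0):
    """Select proposal indices with a capped share of foreground.

    labels: list of ints, 1 = foreground (object), 0 = background.
    Returns a list of selected indices (foreground first).
    """
    rng = random.Random(seed)
    fg = [i for i, y in enumerate(labels) if y == 1]
    bg = [i for i, y in enumerate(labels) if y == 0]
    # Take at most fg_fraction of the batch from foreground proposals.
    n_fg = min(len(fg), int(batch_size * fg_fraction))
    # Fill the remainder of the batch with background proposals.
    n_bg = min(len(bg), batch_size - n_fg)
    return rng.sample(fg, n_fg) + rng.sample(bg, n_bg)


labels = [1] * 5 + [0] * 95  # 5 objects among 100 proposals
picked = sample_fixed_ratio(labels)
n_fg = sum(labels[i] for i in picked)
print(n_fg, len(picked) - n_fg)
```

Sampling like this discards most background proposals so that the classifier sees a controlled mix, which is exactly what a dense one-stage detector cannot do cheaply.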

In contrast, a one-stage detector must process a much larger set of candidate object locations regularly sampled across an image. To tackle this, RetinaNet uses a focal loss function, a dynamically scaled cross entropy loss, where the scaling factor decays to zero as confidence in the correct class increases. Intuitively, this scaling factor can automatically down-weight the contribution of easy examples during training and rapidly focus the model on hard examples.
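To get a feel for how much larger the dense candidate set is, the following sketch counts candidates for one image. It assumes, purely for illustration, a feature pyramid with strides 8 to 128 and 9 anchor boxes per location; neither these values nor the helper name `count_dense_candidates` come from the text above.

```python
import math


def count_dense_candidates(height, width,
                           strides=(8, 16, 32, 64, 128),
                           anchors_per_location=9):
    """Count regularly sampled candidate boxes over all pyramid levels.

    Assumption: each level is a grid of ceil(H / stride) x ceil(W / stride)
    locations, and every location holds `anchors_per_location` boxes.
    """
    total = 0
    for stride in strides:
        feat_h = math.ceil(height / stride)
        feat_w = math.ceil(width / stride)
        total += feat_h * feat_w * anchors_per_location
    return total


n_candidates = count_dense_candidates(640, 640)
print(n_candidates)
```

Under these assumptions a 640x640 image yields 76,725 candidates, compared with the 1-2k proposals a two-stage detector passes to its second stage, and the vast majority of them are easy background.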

Formally, focal loss adds a modulating factor $(1 - p_{t})^\gamma$ to the standard cross entropy criterion, where $p_{t}$ is the model's estimated probability for the ground-truth class and $\gamma \ge 0$ is a tunable focusing parameter. Setting $\gamma > 0$ reduces the relative loss for well-classified examples ($p_{t} > 0.5$), putting more focus on hard, misclassified examples.

$\text{FL}(p_{t}) = -(1 - p_{t})^\gamma \log(p_{t})$
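The formula can be checked numerically with a minimal scalar sketch. The function names are hypothetical, and the default `gamma=2.0` is only an illustrative choice; a practical implementation would operate on batches of logits rather than on a single probability.

```python
import math


def cross_entropy(p_t):
    """Standard cross entropy for the ground-truth class probability p_t."""
    return -math.log(p_t)


def focal_loss(p_t, gamma=2.0):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t) for a single example."""
    if not 0.0 < p_t <= 1.0:
        raise ValueError("p_t must lie in (0, 1]")
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)


# Ratio FL / CE equals the modulating factor (1 - p_t)^gamma.
easy_ratio = focal_loss(0.9) / cross_entropy(0.9)  # well-classified example
hard_ratio = focal_loss(0.1) / cross_entropy(0.1)  # misclassified example
print(easy_ratio, hard_ratio)
```

With $\gamma = 2$, an easy example at $p_t = 0.9$ has its loss scaled by $(1 - 0.9)^2 = 0.01$, while a hard example at $p_t = 0.1$ is scaled by only $0.81$. Setting $\gamma = 0$ recovers the standard cross entropy.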

