BlitzNet: A Real-Time Deep Network for Scene Understanding

Nikita Dvornik, Konstantin Shmelkov, Julien Mairal, Cordelia Schmid

2017-08-09 | ICCV 2017

Tasks: Real-Time Semantic Segmentation, Scene Understanding, Segmentation, Real-Time Object Detection, Autonomous Driving, Semantic Segmentation, Object Detection


Abstract

Real-time scene understanding has become crucial in many applications such as autonomous driving. In this paper, we propose a deep architecture, called BlitzNet, that jointly performs object detection and semantic segmentation in one forward pass, allowing real-time computations. Besides the computational gain of having a single network to perform several tasks, we show that object detection and semantic segmentation benefit from each other in terms of accuracy. Experimental results for VOC and COCO datasets show state-of-the-art performance for object detection and segmentation among real-time systems.

Results

Task	Dataset	Metric	Value	Model
Object Detection	PASCAL VOC 2007	FPS	24	BlitzNet512 (s4)
Object Detection	PASCAL VOC 2007	FPS	19.5	BlitzNet512 (s8)
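
To put the throughput figures in terms of a per-frame time budget, the reported FPS values can be converted to average latency. This is a minimal sketch that assumes frames are processed one at a time, so latency is simply the reciprocal of throughput. The FPS numbers are copied from the table above; the latencies are derived here and are not reported on this page, and the names `REPORTED_FPS` and `frame_latency_ms` are illustrative only.

```python
# FPS values copied from the results table above.
# The latencies computed below are derived, not reported figures.
REPORTED_FPS = {
    "BlitzNet512 (s4)": 24.0,
    "BlitzNet512 (s8)": 19.5,
}


def frame_latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds, assuming one frame at a time."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


latencies = {model: frame_latency_ms(fps) for model, fps in REPORTED_FPS.items()}

for model, ms in latencies.items():
    print(f"{model}: {ms:.1f} ms per frame")
# BlitzNet512 (s4): 41.7 ms per frame
# BlitzNet512 (s8): 51.3 ms per frame
```

Under that assumption, both variants fit within roughly 40 to 50 ms per frame on PASCAL VOC 2007. If frames were batched, the per-frame latency would no longer be the simple reciprocal of FPS.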
