Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Rethinking Dilated Convolution for Real-time Semantic Segmentation

Roland Gao

2021-11-18 · Real-Time Semantic Segmentation · Semantic Segmentation · Neural Architecture Search

Abstract

The field-of-view is an important metric when designing a model for semantic segmentation. To obtain a large field-of-view, previous approaches generally choose to rapidly downsample the resolution, usually with average pooling or stride-2 convolutions. We take a different approach by using dilated convolutions with large dilation rates throughout the backbone, allowing the backbone to easily tune its field-of-view by adjusting its dilation rates, and show that this approach is competitive with existing ones. To use the dilated convolution effectively, we show a simple upper bound on the dilation rate that avoids leaving gaps between the convolutional weights, and design an SE-ResNeXt-inspired block structure that uses two parallel $3\times 3$ convolutions with different dilation rates to preserve local details. Manually tuning the dilation rates for every block can be difficult, so we also introduce a differentiable neural architecture search method that uses gradient descent to optimize the dilation rates. In addition, we propose a lightweight decoder that restores local information better than common alternatives. To demonstrate the effectiveness of our approach, our model RegSeg achieves competitive results on the real-time Cityscapes and CamVid benchmarks. Using a T4 GPU with mixed precision, RegSeg achieves 78.3 mIoU on the Cityscapes test set at $37$ FPS, and 80.9 mIoU on the CamVid test set at $112$ FPS, both without ImageNet pretraining.
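To make the field-of-view trade-off concrete, here is a minimal sketch that applies the standard receptive-field recurrence for stacked convolutions to two layer stacks. The function name `receptive_field` and both stacks are hypothetical illustrations, not RegSeg's actual configuration: one stack grows its field-of-view by downsampling with stride-2 convolutions, the other keeps stride 1 and increases the dilation rate instead.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of stacked convolutions.

    `layers` is a sequence of (kernel_size, stride, dilation) tuples,
    ordered from input to output.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # spacing, in input pixels, between adjacent outputs of the current layer
    for kernel_size, stride, dilation in layers:
        # Each layer widens the field by (k - 1) * dilation steps of size `jump`.
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical stacks, chosen only for illustration.
downsampling_stack = [(3, 2, 1)] * 4                       # four stride-2 3x3 convs
dilated_stack = [(3, 1, 1), (3, 1, 2), (3, 1, 4), (3, 1, 8)]  # stride 1, growing dilation

print(receptive_field(downsampling_stack))  # 31
print(receptive_field(dilated_stack))       # 31
```

Under these assumptions both stacks reach the same 31-pixel field-of-view, but the dilated stack does so without reducing the spatial resolution, which is the trade-off the abstract describes.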

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Semantic Segmentation | Cityscapes test | Frame rate (fps) | 30 | RegSeg (no ImageNet pretraining) |
| Semantic Segmentation | Cityscapes test | Time (ms) | 33 | RegSeg (no ImageNet pretraining) |
| Semantic Segmentation | CamVid | Frame rate (fps) | 70 | RegSeg (Cityscapes-Pretrained) |
| Semantic Segmentation | CamVid | Time (ms) | 14 | RegSeg (Cityscapes-Pretrained) |
| Semantic Segmentation | CamVid | mIoU | 80.9 | RegSeg (Cityscapes-Pretrained) |
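The frame-rate and time rows in the results report the same measurement in two units. As a small sketch, assuming the time column is per-frame latency in milliseconds (the helper name `latency_ms` is hypothetical), the conversion is:

```python
def latency_ms(fps):
    """Per-frame latency in milliseconds for a given throughput in frames per second."""
    return 1000.0 / fps


# Frame rates listed in the results, rounded to whole milliseconds.
print(round(latency_ms(30)))  # 33
print(round(latency_ms(70)))  # 14
```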
