Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam

2018-02-07 | ECCV 2018 | Tasks: Image Classification, Lesion Segmentation, Semantic Segmentation, Image Segmentation

Abstract

Deep neural networks for semantic segmentation commonly use either a spatial pyramid pooling module or an encoder-decoder structure. Networks of the former kind encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while networks of the latter kind capture sharper object boundaries by gradually recovering the spatial information. In this work, we propose to combine the advantages of both methods. Specifically, our proposed model, DeepLabv3+, extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. We further explore the Xception model and apply depthwise separable convolution to both the Atrous Spatial Pyramid Pooling and decoder modules, resulting in a faster and stronger encoder-decoder network. We demonstrate the effectiveness of the proposed model on the PASCAL VOC 2012 and Cityscapes datasets, achieving test set performance of 89.0% and 82.1%, respectively, without any post-processing. Our paper is accompanied by a publicly available reference implementation of the proposed models in TensorFlow at https://github.com/tensorflow/models/tree/master/research/deeplab.
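
The two ideas named in the abstract, atrous (dilated) filtering at multiple rates and depthwise separable convolution, can be illustrated with a small standard-library Python sketch. This is not the reference implementation: it works on 1-D lists rather than image tensors, and the rates 6, 12, 18 and the 256-channel width are illustrative assumptions that do not appear in the text above.

```python
"""Toy 1-D illustration of atrous and depthwise separable convolution."""


def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel whose taps are spaced `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def standard_conv_params(kernel_size, c_in, c_out):
    """Weights in a standard 2-D convolution (bias ignored)."""
    return kernel_size * kernel_size * c_in * c_out


def separable_conv_params(kernel_size, c_in, c_out):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = kernel_size * kernel_size * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def atrous_conv1d(signal, kernel, rate):
    """'Valid' 1-D atrous cross-correlation: kernel taps are `rate` apart."""
    span = (len(kernel) - 1) * rate + 1
    return [
        sum(kernel[j] * signal[i + j * rate] for j in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


def atrous_separable_conv1d(channels, depthwise, pointwise, rate):
    """Filter each channel with its own atrous kernel, then mix channels.

    channels:  list of input channels, each a list of numbers
    depthwise: one kernel per input channel
    pointwise: one row per output channel, one weight per input channel
    """
    filtered = [atrous_conv1d(ch, k, rate) for ch, k in zip(channels, depthwise)]
    length = len(filtered[0])
    return [
        [sum(w * filtered[c][t] for c, w in enumerate(row)) for t in range(length)]
        for row in pointwise
    ]


if __name__ == "__main__":
    # Assumed rates, chosen only to show how the field of view grows.
    for rate in (1, 6, 12, 18):
        print(rate, effective_kernel_size(3, rate))
    # Assumed channel width of 256 for the parameter comparison.
    print(standard_conv_params(3, 256, 256), separable_conv_params(3, 256, 256))
```

Under these assumptions, a 3-tap kernel at rate 6 spans 13 input positions with the same three weights, which is how a single layer can probe several fields of view at no extra parameter cost. Splitting the convolution into a depthwise step and a pointwise step cuts the weight count by roughly the kernel area when the channel count is large.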

Results

Task	Dataset	Metric	Value	Model
Medical Image Segmentation	Anatomical Tracings of Lesions After Stroke (ATLAS)	Dice	0.4609	DeepLab v3+
Medical Image Segmentation	Anatomical Tracings of Lesions After Stroke (ATLAS)	IoU	0.3458	DeepLab v3+
Medical Image Segmentation	Anatomical Tracings of Lesions After Stroke (ATLAS)	Precision	0.5831	DeepLab v3+
Semantic Segmentation	US3D	mIoU	74.42	DeepLabV3+
Semantic Segmentation	Fine-Grained Grass Segmentation Dataset	mIoU	47.95	DeepLabv3+
Semantic Segmentation	Potsdam	mIoU	83.67	DeepLabV3+
Semantic Segmentation	Cityscapes val	mIoU	79.6	DeepLabv3+ (Dilated-Xception-71)
Semantic Segmentation	BDD100K val	mIoU	63.6	Deeplabv3+
Semantic Segmentation	UrbanLF	mIoU (Real)	76.27	DeepLabV3+ (ResNet-101)
Semantic Segmentation	SkyScapes-Dense	Mean IoU	38.2	DeepLabv3+
Semantic Segmentation	AI-TOD	Dice	43.52	DeepLabV3+(ResNet-50)
Semantic Segmentation	PASCAL VOC 2012 val	mIoU (Syn)	75.39	DeepLabV3+ (ResNet-101)
Semantic Segmentation	EventScape	mIoU	53.65	DeepLabV3+
Semantic Segmentation	Vaihingen	mIoU	72.9	DeepLabV3+
Semantic Segmentation	BJRoad	IoU	50.81	DeepLabv3+
Semantic Segmentation	Trans10K	GFLOPs	37.98	DeepLabV3+
Semantic Segmentation	DADA-seg	mIoU	26.8	DeepLabV3+ (ACDC)
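
The table reports mIoU, IoU, Dice and Precision. As a rough guide to how such numbers are usually derived, the following is a minimal sketch that computes them from flat lists of predicted and ground-truth class labels. The function names are hypothetical, and each benchmark may differ in details such as ignored labels or per-image averaging, so this is not the evaluation code behind any row above.

```python
"""Segmentation metrics from per-pixel labels (illustrative sketch)."""


def confusion_matrix(pred, target, num_classes):
    """m[t][p] counts pixels of true class t predicted as class p."""
    m = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        m[t][p] += 1
    return m


def class_counts(m, c):
    """True positives, false positives and false negatives for class c."""
    tp = m[c][c]
    fn = sum(m[c]) - tp
    fp = sum(row[c] for row in m) - tp
    return tp, fp, fn


def iou(tp, fp, fn):
    """Intersection over union; None when the class is absent everywhere."""
    denom = tp + fp + fn
    return tp / denom if denom else None


def dice(tp, fp, fn):
    """Dice coefficient; None when the class is absent everywhere."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else None


def precision(tp, fp):
    """Fraction of predicted pixels that are correct; None if none predicted."""
    return tp / (tp + fp) if (tp + fp) else None


def mean_iou(m):
    """Average IoU over the classes that occur in prediction or ground truth."""
    values = [iou(*class_counts(m, c)) for c in range(len(m))]
    values = [v for v in values if v is not None]
    return sum(values) / len(values)


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2, 2, 1]
    target = [0, 0, 1, 2, 2, 2, 1, 1]
    m = confusion_matrix(pred, target, 3)
    print(mean_iou(m))
    print(dice(*class_counts(m, 1)), precision(*class_counts(m, 1)[:2]))
```

For a single prediction and ground-truth pair, Dice and IoU are tied by Dice = 2·IoU / (1 + IoU). Values averaged over many images or classes, as leaderboard entries typically are, need not satisfy that identity.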
