Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia
This paper focuses on the challenging task of real-time semantic segmentation. The task has many practical applications, yet it poses a fundamental difficulty: pixel-wise label inference is computationally expensive, and a large portion of that computation must be removed. We propose an image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address this challenge. We provide an in-depth analysis of our framework and introduce the cascade feature fusion unit to quickly achieve high-quality segmentation. Our system yields real-time inference on a single GPU card, with decent-quality results on challenging datasets such as Cityscapes, CamVid, and COCO-Stuff.
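
The cascade idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes three input scales (full, 1/2, 1/4), treats the raw image as a stand-in for each branch's feature map, and replaces the learned cascade feature fusion unit with nearest-neighbour upsampling followed by element-wise addition. The function names `downsample`, `upsample2x`, `fuse`, and `cascade` are hypothetical.

```python
# Toy sketch of a coarse-to-fine cascade. Assumptions (not from the source):
# scales of 1, 1/2, 1/4; fusion = 2x nearest-neighbour upsampling + addition.

def downsample(img, factor):
    """Keep every `factor`-th pixel along both axes."""
    return [row[::factor] for row in img[::factor]]

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 2-D list."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out

def fuse(low, high):
    """Upsample the coarse map 2x and add it to the finer map."""
    up = upsample2x(low)
    return [[a + b for a, b in zip(r_up, r_hi)] for r_up, r_hi in zip(up, high)]

def cascade(img):
    """Fuse the 1/4 branch into the 1/2 branch, then into full resolution."""
    half = downsample(img, 2)
    quarter = downsample(img, 4)
    fused_half = fuse(quarter, half)
    return fuse(fused_half, img)

if __name__ == "__main__":
    image = [[1] * 4 for _ in range(4)]  # 4x4 image of ones
    for row in cascade(image):
        print(row)
```

In this sketch the cheapest branch sees the smallest input, and each fusion step doubles the spatial size, so the expensive full-resolution path only has to refine a result that the coarse branches have already produced. A real fusion unit would use learned convolutions rather than plain addition.
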
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | Trans10K | GFLOPs | 10.64 | ICNet |
| Semantic Segmentation | Cityscapes test | Frame (fps) | 30.3 | ICNet |
| Semantic Segmentation | Cityscapes test | Time (ms) | 33 | ICNet |
| Semantic Segmentation | CamVid | Frame (fps) | 27.8 | ICNet |
| Semantic Segmentation | CamVid | Time (ms) | 36 | ICNet |
| Object Detection | DIS-TE4 | E-measure | 0.837 | ICNet |
| Object Detection | DIS-TE4 | HCE | 3690 | ICNet |
| Object Detection | DIS-TE4 | MAE | 0.099 | ICNet |
| Object Detection | DIS-TE4 | S-Measure | 0.776 | ICNet |
| Object Detection | DIS-TE4 | max F-Measure | 0.749 | ICNet |
| Object Detection | DIS-TE4 | weighted F-measure | 0.663 | ICNet |
| Object Detection | DIS-VD | E-measure | 0.811 | ICNet |
| Object Detection | DIS-VD | HCE | 1503 | ICNet |
| Object Detection | DIS-VD | MAE | 0.102 | ICNet |
| Object Detection | DIS-VD | S-Measure | 0.747 | ICNet |
| Object Detection | DIS-VD | max F-Measure | 0.697 | ICNet |
| Object Detection | DIS-VD | weighted F-measure | 0.609 | ICNet |
| Object Detection | DIS-TE2 | E-measure | 0.826 | ICNet |
| Object Detection | DIS-TE2 | HCE | 512 | ICNet |
| Object Detection | DIS-TE2 | MAE | 0.095 | ICNet |
| Object Detection | DIS-TE2 | S-Measure | 0.759 | ICNet |
| Object Detection | DIS-TE2 | max F-Measure | 0.716 | ICNet |
| Object Detection | DIS-TE2 | weighted F-measure | 0.627 | ICNet |
| Object Detection | DIS-TE1 | E-measure | 0.784 | ICNet |
| Object Detection | DIS-TE1 | HCE | 234 | ICNet |
| Object Detection | DIS-TE1 | MAE | 0.095 | ICNet |
| Object Detection | DIS-TE1 | S-Measure | 0.716 | ICNet |
| Object Detection | DIS-TE1 | max F-Measure | 0.631 | ICNet |
| Object Detection | DIS-TE1 | weighted F-measure | 0.535 | ICNet |
| Object Detection | DIS-TE3 | E-measure | 0.852 | ICNet |
| Object Detection | DIS-TE3 | HCE | 1001 | ICNet |
| Object Detection | DIS-TE3 | MAE | 0.091 | ICNet |
| Object Detection | DIS-TE3 | S-Measure | 0.78 | ICNet |
| Object Detection | DIS-TE3 | max F-Measure | 0.752 | ICNet |
| Object Detection | DIS-TE3 | weighted F-measure | 0.664 | ICNet |
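
The frame-rate and latency rows in the table above are two views of the same measurement. As a quick consistency check, and assuming each "Time (ms)" value is the per-frame latency, the reported fps can be recovered as 1000 divided by the time in milliseconds. The helper name `fps_from_latency_ms` is hypothetical.

```python
# Consistency check between the "Time (ms)" and "Frame (fps)" rows.
# Assumption: Time (ms) is the latency of a single frame.

def fps_from_latency_ms(ms):
    """Convert per-frame latency in milliseconds to frames per second."""
    return 1000.0 / ms

if __name__ == "__main__":
    for dataset, ms in [("Cityscapes test", 33), ("CamVid", 36)]:
        print(f"{dataset}: {ms} ms -> {fps_from_latency_ms(ms):.1f} fps")
```

Rounded to one decimal place, 33 ms corresponds to 30.3 fps and 36 ms to 27.8 fps, which matches the tabulated values.
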