Sungmin Cha, Beomyoung Kim, Youngjoon Yoo, Taesup Moon
This paper introduces a solid state-of-the-art baseline for the class-incremental semantic segmentation (CISS) problem. While recent CISS algorithms use variants of the knowledge distillation (KD) technique to tackle the problem, they fail to fully address the critical challenges in CISS that cause catastrophic forgetting: the semantic drift of the background class and the multi-label prediction issue. To better address these challenges, we propose a new method, dubbed SSUL-M (Semantic Segmentation with Unknown Label with Memory), which carefully combines techniques tailored for semantic segmentation. Specifically, we make three main contributions: (1) defining unknown classes within the background class to help learn future classes (for plasticity), (2) freezing the backbone network and past classifiers and training with a binary cross-entropy loss and pseudo-labeling to overcome catastrophic forgetting (for stability), and (3) utilizing a tiny exemplar memory, for the first time in CISS, to improve both plasticity and stability. Extensive experiments show the effectiveness of our method, which achieves significantly better performance than recent state-of-the-art baselines on the standard benchmark datasets. Furthermore, we justify our contributions with thorough ablation analyses and discuss how the CISS problem differs from traditional class-incremental learning for classification. The official code is available at https://github.com/clovaai/SSUL.
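The label construction behind contributions (1) and (2) can be illustrated with a small sketch. It is not the authors' implementation (the official repository has that): pixels are flattened into plain lists, the per-pixel rule and the confidence threshold `conf_thresh` are assumptions, `unknown_id` is a hypothetical label index, and the `salient` mask is a hypothetical stand-in for whatever signal selects unknown pixels inside the background. Freezing the backbone and past classifiers is not shown. The loss part scores every class with an independent sigmoid, which is one standard way to write a binary cross-entropy loss for segmentation.

```python
import math
from typing import List, Sequence

BACKGROUND = 0  # label id of the background class (VOC convention)


def build_step_labels(
    gt: Sequence[int],
    old_pred: Sequence[int],
    old_conf: Sequence[float],
    salient: Sequence[bool],
    unknown_id: int,
    conf_thresh: float = 0.7,  # assumed threshold, not taken from the paper
) -> List[int]:
    """Training labels for one incremental step (pixels flattened to lists).

    gt:       current-step ground truth (current classes, else BACKGROUND)
    old_pred: argmax class of the frozen previous-step model per pixel
    old_conf: that model's confidence in old_pred
    salient:  hypothetical mask marking background pixels to call 'unknown'
    """
    labels = []
    for g, p, c, s in zip(gt, old_pred, old_conf, salient):
        if g != BACKGROUND:
            labels.append(g)           # current class: keep ground truth
        elif p != BACKGROUND and c >= conf_thresh:
            labels.append(p)           # confident old-class prediction: pseudo-label
        elif s:
            labels.append(unknown_id)  # salient background: 'unknown' class
        else:
            labels.append(BACKGROUND)  # remaining pixels stay background
    return labels


def pixel_bce(logits: Sequence[float], label: int) -> float:
    """Mean binary cross-entropy over classes for one pixel.

    Every class gets its own sigmoid, so raising one class score does not
    push the others down as a softmax would. Uses the numerically stable
    form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for k, z in enumerate(logits):
        y = 1.0 if k == label else 0.0
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


labels = build_step_labels(
    gt=[5, 0, 0, 0],
    old_pred=[0, 3, 3, 0],
    old_conf=[0.9, 0.95, 0.4, 0.99],
    salient=[False, False, True, False],
    unknown_id=21,  # hypothetical index for the unknown class
)
print(labels)  # [5, 3, 21, 0]
print(round(pixel_bce([0.0, 0.0], 0), 4))  # 0.6931
```

In the usage example, the first pixel keeps its ground-truth class 5, the second is pseudo-labeled with old class 3, the third falls below the threshold but is salient and becomes unknown, and the last stays background.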
Tasks: Class-Incremental Semantic Segmentation, Class Incremental Learning, Continual Learning, Semantic Segmentation, and, for some PASCAL VOC 2012 rows, 2D Semantic Segmentation.

| Dataset | Metric | Value (%) | Model |
|---|---|---|---|
| PASCAL VOC 2012 | mIoU | 64.12 | SSUL-M |
| PASCAL VOC 2012 | mIoU | 59.25 | SSUL |
| PASCAL VOC 2012 | Mean IoU (val) | 73.02 | SSUL-M |
| PASCAL VOC 2012 | Mean IoU (val) | 71.22 | SSUL |
| PASCAL VOC 2012 | mIoU | 71.37 | SSUL-M |
| PASCAL VOC 2012 | mIoU | 67.61 | SSUL |
| PASCAL VOC 2012 | mIoU | 68.58 | SSUL-M |
| PASCAL VOC 2012 | mIoU | 64.01 | SSUL |
| PASCAL VOC 2012 | Mean IoU | 69.83 | SSUL-M |
| PASCAL VOC 2012 | Mean IoU | 69.1 | SSUL |
| ADE20K | mIoU | 34.56 | SSUL-M |
| ADE20K | mIoU | 32.48 | SSUL |
| PASCAL VOC 2012 | mIoU | 53.5 | SSUL-M |
| PASCAL VOC 2012 | mIoU | 50.87 | SSUL |
| ADE20K | mIoU | 34.37 | SSUL-M |
| ADE20K | mIoU | 33.58 | SSUL |
| ADE20K | mIoU | 29.77 | SSUL-M |
| ADE20K | mIoU | 29.56 | SSUL |
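All values in the table are mean intersection-over-union (mIoU) scores. For reference, here is a minimal pure-Python sketch of how per-class IoU, TP / (TP + FP + FN), and its mean are computed from flattened label maps. It is not the evaluation code behind the reported numbers, and the listing does not say which class subset (for example old, new, or all classes) each value averages over; this sketch averages over every class that appears in the prediction or the ground truth. The ignore index 255 follows the usual PASCAL VOC convention for unlabeled pixels.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over classes present in pred or gt (pixels flattened to lists).

    IoU_k = TP_k / (TP_k + FP_k + FN_k); classes with an empty union are skipped.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabeled pixels are excluded from the metric
        if p == g:
            inter[g] += 1  # true positive for class g
            union[g] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[g] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Class 0: IoU 1/2, class 1: IoU 2/3, so the mean is about 0.583.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Benchmark evaluations usually accumulate the intersection and union counts over the whole validation set before dividing, rather than averaging per-image scores.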