Yiqiao Qiu, Yixing Shen, Zhuohao Sun, Yanchong Zheng, Xiaobin Chang, Weishi Zheng, Ruixuan Wang
Continually learning to segment more and more types of image regions is a desirable capability for many intelligent systems. However, such continual semantic segmentation suffers from the same catastrophic forgetting issue as continual classification learning. While multiple knowledge distillation strategies originally developed for continual classification have been well adapted to continual semantic segmentation, they only consider transferring old knowledge based on the outputs from one or more layers of deep fully convolutional networks. Different from existing solutions, this study proposes to transfer a new type of knowledge-relevant information, i.e., the relationships between elements (e.g., pixels or small local regions) within each image, which can capture both within-class and between-class knowledge. The relationship information can be effectively obtained from the self-attention maps in a Transformer-style segmentation model. Considering that pixels belonging to the same class in each image often share similar visual properties, a class-specific region pooling is applied to provide more efficient relationship information for knowledge transfer. Extensive evaluations on multiple public benchmarks support that the proposed self-attention transfer method further alleviates the catastrophic forgetting issue, and that its flexible combination with one or more widely adopted strategies significantly outperforms state-of-the-art solutions.
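
To make the idea of transferring class-pooled self-attention more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a single `(N, N)` attention map over `N` image elements, a per-element class label vector, pooling over the key dimension only, and a mean-squared-error penalty between the old and new models' pooled maps. The function names `class_region_pool` and `attention_transfer_loss` are hypothetical, and the abstract does not specify the exact pooling or loss.

```python
import numpy as np


def class_region_pool(attn, labels):
    """Pool an (N, N) attention map over class regions along the key axis.

    Assumption: for every query element, the attention paid to all keys of
    one class is averaged, giving an (N, C) map for C classes in the image.
    """
    classes = np.unique(labels)  # classes present in this image
    # Membership matrix of shape (N, C): 1 where element i belongs to class c.
    member = (labels[:, None] == classes[None, :]).astype(attn.dtype)
    # Sum attention per class region, then divide by region size to average.
    return attn @ member / member.sum(axis=0)


def attention_transfer_loss(attn_old, attn_new, labels):
    """Illustrative loss: MSE between class-pooled old and new attention maps."""
    pooled_old = class_region_pool(attn_old, labels)
    pooled_new = class_region_pool(attn_new, labels)
    return float(np.mean((pooled_old - pooled_new) ** 2))


# Toy example: 4 image elements, two classes with two elements each.
labels = np.array([0, 0, 1, 1])
attn_old = np.full((4, 4), 0.25)   # uniform attention from the old model
attn_new = np.eye(4)               # new model attends only to itself
loss = attention_transfer_loss(attn_old, attn_new, labels)
```

Pooling shrinks each map from `N` columns to one column per class, which is one way to read the abstract's claim that region pooling makes the relationship information more efficient to transfer.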
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | PASCAL VOC 2012 | mIoU | 69.27 | SATS-M |
| Semantic Segmentation | PASCAL VOC 2012 | mIoU | 61.6 | SATS |
| Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (val) | 78.72 | SATS-M |
| Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (val) | 75.7 | SATS |
| Semantic Segmentation | PASCAL VOC 2012 | mIoU | 76.61 | SATS-M |
| Semantic Segmentation | PASCAL VOC 2012 | mIoU | 74.48 | SATS |
| Semantic Segmentation | ADE20K | Mean IoU (test) | 35.45 | SATS-M |
| Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (test) | 71.36 | SATS-M |
| Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (test) | 67.36 | SATS |
| Continual Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (test) | 71.36 | SATS-M |
| Continual Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (test) | 67.36 | SATS |
| Continual Semantic Segmentation | ADE20K | Mean IoU (test) | 32.56 | SATS-M |
| Continual Learning | PASCAL VOC 2012 | mIoU | 69.27 | SATS-M |
| Continual Learning | PASCAL VOC 2012 | mIoU | 61.6 | SATS |
| Continual Learning | PASCAL VOC 2012 | Mean IoU (val) | 78.72 | SATS-M |
| Continual Learning | PASCAL VOC 2012 | Mean IoU (val) | 75.7 | SATS |
| Continual Learning | PASCAL VOC 2012 | mIoU | 76.61 | SATS-M |
| Continual Learning | PASCAL VOC 2012 | mIoU | 74.48 | SATS |
| Continual Learning | ADE20K | Mean IoU (test) | 35.45 | SATS-M |
| Continual Learning | PASCAL VOC 2012 | Mean IoU (test) | 71.36 | SATS-M |
| Continual Learning | PASCAL VOC 2012 | Mean IoU (test) | 67.36 | SATS |
| 2D Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (test) | 71.36 | SATS-M |
| 2D Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (test) | 67.36 | SATS |
| 2D Semantic Segmentation | ADE20K | Mean IoU (test) | 32.56 | SATS-M |
| 2D Semantic Segmentation | PASCAL VOC 2012 | mIoU | 76.61 | SATS-M |
| 2D Semantic Segmentation | PASCAL VOC 2012 | mIoU | 74.48 | SATS |
| Class Incremental Learning | PASCAL VOC 2012 | mIoU | 69.27 | SATS-M |
| Class Incremental Learning | PASCAL VOC 2012 | mIoU | 61.6 | SATS |
| Class Incremental Learning | PASCAL VOC 2012 | Mean IoU (val) | 78.72 | SATS-M |
| Class Incremental Learning | PASCAL VOC 2012 | Mean IoU (val) | 75.7 | SATS |
| Class Incremental Learning | PASCAL VOC 2012 | mIoU | 76.61 | SATS-M |
| Class Incremental Learning | PASCAL VOC 2012 | mIoU | 74.48 | SATS |
| Class Incremental Learning | ADE20K | Mean IoU (test) | 35.45 | SATS-M |
| Class Incremental Learning | PASCAL VOC 2012 | Mean IoU (test) | 71.36 | SATS-M |
| Class Incremental Learning | PASCAL VOC 2012 | Mean IoU (test) | 67.36 | SATS |
| Class-Incremental Semantic Segmentation | PASCAL VOC 2012 | mIoU | 69.27 | SATS-M |
| Class-Incremental Semantic Segmentation | PASCAL VOC 2012 | mIoU | 61.6 | SATS |
| Class-Incremental Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (val) | 78.72 | SATS-M |
| Class-Incremental Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (val) | 75.7 | SATS |
| Class-Incremental Semantic Segmentation | PASCAL VOC 2012 | mIoU | 76.61 | SATS-M |
| Class-Incremental Semantic Segmentation | PASCAL VOC 2012 | mIoU | 74.48 | SATS |
| Class-Incremental Semantic Segmentation | ADE20K | Mean IoU (test) | 35.45 | SATS-M |
| Class-Incremental Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (test) | 71.36 | SATS-M |
| Class-Incremental Semantic Segmentation | PASCAL VOC 2012 | Mean IoU (test) | 67.36 | SATS |