Rawal Khirodkar, Brandon Smith, Siddhartha Chandra, Amit Agrawal, Antonio Criminisi
Ensemble approaches for deep-learning-based semantic segmentation remain insufficiently explored despite the proliferation of competitive benchmarks and downstream applications. In this work, we explore and benchmark the popular ensembling approach of combining, at test time, the predictions of multiple independently trained state-of-the-art models on standard datasets. Furthermore, we propose a novel method, inspired by boosting, that ensembles networks sequentially and significantly outperforms the naive ensemble baseline. Our approach trains a cascade of models in which each model receives the class probabilities predicted by the previous model as an additional input. A key benefit of this approach is that it allows for dynamic computation offloading, which helps when deploying models on mobile devices. Our proposed novel ADaptive modulatiON (ADON) block allows spatial feature modulation at various layers using previous-stage probabilities. Our approach does not require sophisticated sample selection strategies during training and works with multiple neural architectures. We significantly improve over the naive ensemble baseline on challenging datasets such as Cityscapes, ADE20K, COCO-Stuff, and PASCAL-Context, and set a new state of the art.
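The abstract contrasts two ways of combining segmentation networks: averaging the outputs of independent models at test time, and a cascade in which each stage is conditioned on the previous stage's class probabilities. The dependency-free Python below is a minimal sketch of that contrast under stated assumptions, not the authors' implementation. Each "network" is a hypothetical per-pixel linear classifier (`make_stage`); the ADON-style block is modeled as a per-channel scale and shift computed linearly from previous-stage probabilities (`modulate`, an assumed form that may differ from the paper's); and the early-exit rule in `sequential_ensemble` (skip later stages once every pixel's top probability reaches `confidence_threshold`) is a hypothetical stand-in for the paper's dynamic offloading policy. Training of the cascade is omitted.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# A probability map is a list of pixels; each pixel holds one probability per class.
ProbMap = List[List[float]]
# A stage maps (pixel features, previous-stage probabilities or None) to a probability map.
Stage = Callable[[List[List[float]], Optional[ProbMap]], ProbMap]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def naive_ensemble(prob_maps: Sequence[ProbMap]) -> List[int]:
    """Test-time baseline: average each pixel's class probabilities over
    independently trained models, then take the most likely class."""
    labels = []
    for pixel_probs in zip(*prob_maps):  # one tuple of per-model probabilities per pixel
        n_classes = len(pixel_probs[0])
        avg = [sum(p[c] for p in pixel_probs) / len(pixel_probs) for c in range(n_classes)]
        labels.append(argmax(avg))
    return labels


def modulate(features: List[List[float]], prev_probs: ProbMap,
             gamma_w: List[List[float]], beta_w: List[List[float]]) -> List[List[float]]:
    """Hypothetical ADON-style block (assumed form): a per-pixel, per-channel
    scale and shift, each a linear function of the previous stage's probabilities."""
    out = []
    for f, p in zip(features, prev_probs):
        gamma = [sum(w * pc for w, pc in zip(row, p)) for row in gamma_w]  # one row per channel
        beta = [sum(w * pc for w, pc in zip(row, p)) for row in beta_w]
        out.append([fc * (1.0 + g) + b for fc, g, b in zip(f, gamma, beta)])
    return out


def make_stage(weights: List[List[float]],
               gamma_w: Optional[List[List[float]]] = None,
               beta_w: Optional[List[List[float]]] = None) -> Stage:
    """Toy per-pixel linear classifier standing in for a segmentation network (hypothetical)."""
    def stage(pixels: List[List[float]], prev_probs: Optional[ProbMap]) -> ProbMap:
        feats = pixels
        if prev_probs is not None and gamma_w is not None and beta_w is not None:
            feats = modulate(pixels, prev_probs, gamma_w, beta_w)  # condition on the previous stage
        return [softmax([sum(w * x for w, x in zip(row, f)) for row in weights]) for f in feats]
    return stage


def sequential_ensemble(pixels: List[List[float]], stages: Sequence[Stage],
                        confidence_threshold: float = 0.9) -> Tuple[ProbMap, int]:
    """Run the cascade in order, feeding each stage the previous stage's probabilities.
    Hypothetical early-exit rule: skip the remaining stages once every pixel's top
    class probability reaches confidence_threshold. Returns (probabilities, stages run)."""
    probs = stages[0](pixels, None)
    used = 1
    for stage in stages[1:]:
        if min(max(p) for p in probs) >= confidence_threshold:
            break  # confident everywhere: later stages need not run
        probs = stage(pixels, probs)
        used += 1
    return probs, used


if __name__ == "__main__":
    pixels = [[1.0, 0.0], [0.2, 0.1]]  # two pixels, two feature channels
    identity = [[1.0, 0.0], [0.0, 1.0]]
    stage1 = make_stage(identity)
    stage2 = make_stage([[2.0, 0.0], [0.0, 2.0]], gamma_w=identity, beta_w=[[0.0, 0.0], [0.0, 0.0]])
    probs, used = sequential_ensemble(pixels, [stage1, stage2], confidence_threshold=0.9)
    print(used, [argmax(p) for p in probs])
```

In this toy run the first stage is uncertain on the second pixel, so the second stage runs and, conditioned on the first stage's probabilities, sharpens both predictions; with a lower threshold the cascade stops after one stage. Feeding `prev_probs` in by simple concatenation with the input channels would be a simpler alternative to modulation; the sketch uses modulation only because the abstract names it.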
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | Cityscapes val | mIoU | 84.8 | Sequential Ensemble (MiT-B5 + HRNet) |
| Semantic Segmentation | PASCAL Context | mIoU | 62.1 | Sequential Ensemble (SegFormer + HRNet) |
| Semantic Segmentation | ADE20K | Params (M) | 216.3 | Sequential Ensemble (SegFormer) |
| Semantic Segmentation | ADE20K | Validation mIoU | 54 | Sequential Ensemble (SegFormer) |
| Semantic Segmentation | ADE20K | Validation mIoU | 46.8 | Sequential Ensemble (DeepLabv3+) |