Simon Vandenhende, Stamatios Georgoulis, Luc van Gool
In this paper, we argue for the importance of considering task interactions at multiple scales when distilling task information in a multi-task learning setup. Contrary to common belief, we show that tasks with high affinity at a certain scale are not guaranteed to retain this behaviour at other scales, and vice versa. We propose a novel architecture, MTI-Net, that builds upon this finding in three ways. First, it explicitly models task interactions at every scale via a multi-scale multi-modal distillation unit. Second, it propagates distilled task information from lower to higher scales via a feature propagation module. Third, it aggregates the refined task features from all scales via a feature aggregation unit to produce the final per-task predictions. Extensive experiments on two multi-task dense labeling datasets show that, unlike prior work, our multi-task model delivers on the full potential of multi-task learning: a smaller memory footprint, fewer calculations, and better performance relative to single-task learning. The code is publicly available: https://github.com/SimonVandenhende/Multi-Task-Learning-PyTorch.
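To make the three components concrete, here is a minimal PyTorch sketch of the data flow described in the abstract: per-scale distillation, propagation across scales, and aggregation. It is not the authors' implementation. The class names `DistillationUnit` and `TinyMTINet`, the strided-convolution stand-in backbone, the sigmoid gating, the channel widths, and the reading of "lower to higher scales" as coarse to fine resolution are all assumptions made for illustration; the linked repository holds the actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistillationUnit(nn.Module):
    """Hypothetical per-scale unit: each task adds gated features from the other tasks."""

    def __init__(self, tasks, channels):
        super().__init__()
        self.tasks = list(tasks)
        # One gating conv per ordered (target, source) task pair (assumed design).
        self.gates = nn.ModuleDict({
            f"{t}_from_{s}": nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for t in self.tasks for s in self.tasks if s != t
        })

    def forward(self, feats):
        out = {}
        for t in self.tasks:
            refined = feats[t]
            for s in self.tasks:
                if s != t:
                    # Gate computed from the target task, applied to the source task.
                    gate = torch.sigmoid(self.gates[f"{t}_from_{s}"](feats[t]))
                    refined = refined + gate * feats[s]
            out[t] = refined
        return out


class TinyMTINet(nn.Module):
    """Toy network: distill at every scale, propagate coarse to fine, aggregate all scales."""

    def __init__(self, tasks, out_channels, channels=16, num_scales=3):
        super().__init__()
        self.tasks = list(tasks)
        self.num_scales = num_scales
        # Stand-in backbone (assumption): one strided conv per scale, index 0 is the finest.
        self.backbone = nn.ModuleList([
            nn.Conv2d(3 if i == 0 else channels, channels, 3, stride=2, padding=1)
            for i in range(num_scales)
        ])
        # Per-scale task heads; every scale but the coarsest also receives propagated features.
        self.initial = nn.ModuleList([
            nn.ModuleDict({
                t: nn.Conv2d(channels if i == num_scales - 1 else 2 * channels,
                             channels, 3, padding=1)
                for t in self.tasks
            })
            for i in range(num_scales)
        ])
        self.distill = nn.ModuleList(
            [DistillationUnit(self.tasks, channels) for _ in range(num_scales)]
        )
        # Aggregation: concatenate all scales per task, then a 1x1 prediction conv.
        self.heads = nn.ModuleDict({
            t: nn.Conv2d(channels * num_scales, out_channels[t], 1) for t in self.tasks
        })

    def forward(self, x):
        feats = []
        for conv in self.backbone:
            x = F.relu(conv(x))
            feats.append(x)

        refined = [None] * self.num_scales
        propagated = None  # distilled task features from the next-coarser scale
        for i in reversed(range(self.num_scales)):  # coarsest scale first
            task_feats = {}
            for t in self.tasks:
                inp = feats[i]
                if propagated is not None:
                    up = F.interpolate(propagated[t], size=inp.shape[-2:],
                                       mode="bilinear", align_corners=False)
                    inp = torch.cat([inp, up], dim=1)
                task_feats[t] = F.relu(self.initial[i][t](inp))
            refined[i] = self.distill[i](task_feats)
            propagated = refined[i]

        size = feats[0].shape[-2:]
        out = {}
        for t in self.tasks:
            stacked = torch.cat([
                F.interpolate(refined[i][t], size=size, mode="bilinear", align_corners=False)
                for i in range(self.num_scales)
            ], dim=1)
            out[t] = self.heads[t](stacked)
        return out


# Illustrative task setup: 40-class segmentation and 1-channel depth.
model = TinyMTINet(tasks=["semseg", "depth"], out_channels={"semseg": 40, "depth": 1})
preds = model(torch.randn(2, 3, 64, 64))
```

In this toy setup the predictions come out at the finest backbone resolution (half the input size); a real model would upsample them to the input resolution before computing losses or metrics.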
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Segmentation | NYU Depth v2 | Mean IoU | 49 | MTI-Net (HRNet-48) |
| Semantic Segmentation | UrbanLF | mIoU (Syn) | 79.1 | MTINet (HRNetV2-W48) |
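
For reference, the Mean IoU (mIoU) metric in the table averages the per-class intersection-over-union between predicted and ground-truth labels. The sketch below computes it on flat label lists. It is only an illustration: the function name `mean_iou` is hypothetical, and the choice to skip classes absent from both prediction and ground truth is an assumption. Benchmark protocols usually accumulate counts over the whole dataset and may handle ignore labels differently, and leaderboards typically report the result as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both (assumed convention)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5.
score = mean_iou(pred=[0, 0, 1, 1, 2, 2], target=[0, 1, 1, 1, 2, 0], num_classes=3)
```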