Qiming Zhang, Jing Zhang, Wei Liu, DaCheng Tao
Unsupervised domain adaptation (UDA) aims to enhance the generalization capability of a certain model from a source domain to a target domain. UDA is of particular significance since no extra effort is devoted to annotating target domain samples. However, the different data distributions in the two domains, or \emph{domain shift/discrepancy}, inevitably compromise the UDA performance. Although there has been a progress in matching the marginal distributions between two domains, the classifier favors the source domain features and makes incorrect predictions on the target domain due to category-agnostic feature alignment. In this paper, we propose a novel category anchor-guided (CAG) UDA model for semantic segmentation, which explicitly enforces category-aware feature alignment to learn shared discriminative features and classifiers simultaneously. First, the category-wise centroids of the source domain features are used as guided anchors to identify the active features in the target domain and also assign them pseudo-labels. Then, we leverage an anchor-based pixel-level distance loss and a discriminative loss to drive the intra-category features closer and the inter-category features further apart, respectively. Finally, we devise a stagewise training mechanism to reduce the error accumulation and adapt the proposed model progressively. Experiments on both the GTA5$\rightarrow $Cityscapes and SYNTHIA$\rightarrow $Cityscapes scenarios demonstrate the superiority of our CAG-UDA model over the state-of-the-art methods. The code is available at \url{https://github.com/RogerZhangzz/CAG_UDA}.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image-to-Image Translation | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 44.5 | CAG-UDA |
| Image-to-Image Translation | GTAV-to-Cityscapes Labels | mIoU | 50.2 | CAG-UDA |
| Image-to-Image Translation | SYNTHIA-to-Cityscapes | MIoU (13 classes) | 52.6 | CAG-UDA |
| Image-to-Image Translation | SYNTHIA-to-Cityscapes | MIoU (16 classes) | 44.5 | CAG-UDA |
| Image Generation | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 44.5 | CAG-UDA |
| Image Generation | GTAV-to-Cityscapes Labels | mIoU | 50.2 | CAG-UDA |
| Image Generation | SYNTHIA-to-Cityscapes | MIoU (13 classes) | 52.6 | CAG-UDA |
| Image Generation | SYNTHIA-to-Cityscapes | MIoU (16 classes) | 44.5 | CAG-UDA |
| 1 Image, 2*2 Stitching | SYNTHIA-to-Cityscapes | mIoU (13 classes) | 44.5 | CAG-UDA |
| 1 Image, 2*2 Stitching | GTAV-to-Cityscapes Labels | mIoU | 50.2 | CAG-UDA |
| 1 Image, 2*2 Stitching | SYNTHIA-to-Cityscapes | MIoU (13 classes) | 52.6 | CAG-UDA |
| 1 Image, 2*2 Stitching | SYNTHIA-to-Cityscapes | MIoU (16 classes) | 44.5 | CAG-UDA |