Long Xu, Shanghong Li, Yongquan Chen, Jun Luo, Shiwu Lai
Interactive segmentation has gained significant attention for its applications in human-computer interaction and data annotation. To address target scale variation in interactive segmentation, we propose a novel multi-scale token adaptation algorithm. By performing top-k operations across multi-scale tokens, the algorithm greatly reduces computational complexity while preserving performance. To make multi-scale token selection more robust, we also propose a token learning algorithm based on a contrastive loss, which effectively improves the performance of multi-scale token adaptation. Extensive benchmarking shows that the algorithm achieves state-of-the-art (SOTA) performance compared to current methods. An interactive demo and all code needed to reproduce the results will be released at https://github.com/hahamyt/mst.
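
To make the top-k idea concrete, here is a minimal sketch of selecting a fixed number of tokens from a pool gathered at several scales. It is an illustration under stated assumptions, not the released implementation: the function name `select_top_k_tokens`, the use of a single `query` vector, and the dot-product score are all hypothetical choices, and the abstract does not specify how tokens are actually scored.

```python
import heapq
from typing import List, Sequence, Tuple

Token = Sequence[float]


def dot(a: Token, b: Token) -> float:
    """Plain dot product, used here as a stand-in relevance score (assumption)."""
    return sum(x * y for x, y in zip(a, b))


def select_top_k_tokens(
    multi_scale_tokens: List[List[Token]], query: Token, k: int
) -> List[Tuple[int, int, float]]:
    """Return (scale_index, token_index, score) for the k highest-scoring tokens.

    Tokens from every scale are scored against the same query and compete in a
    single pool, so later stages only process k tokens regardless of how many
    scales were extracted.
    """
    scored = [
        (scale_idx, token_idx, dot(token, query))
        for scale_idx, tokens in enumerate(multi_scale_tokens)
        for token_idx, token in enumerate(tokens)
    ]
    # nlargest returns the results sorted from highest to lowest score.
    return heapq.nlargest(k, scored, key=lambda item: item[2])


# Toy example: two scales with 2-dimensional tokens (hypothetical values).
coarse = [[1.0, 0.0], [0.0, 1.0]]
fine = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
query = [1.0, 0.0]

selected = select_top_k_tokens([coarse, fine], query, k=2)
for scale_idx, token_idx, score in selected:
    print(f"scale={scale_idx} token={token_idx} score={score:.2f}")
```

Because only `k` tokens survive the selection, the cost of any subsequent attention over them depends on `k` rather than on the total number of multi-scale tokens, which is one plausible reading of the complexity reduction the abstract describes.
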
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Interactive Segmentation | GrabCut | NoC@90 | 1.48 | ViT-B+MST+CL |
| Interactive Segmentation | Berkeley | NoC@90 | 1.5 | ViT-B+MST+CL |
| Interactive Segmentation | COCO minival | NoC@85 | 2.08 | ViT-B+MST+CL |
| Interactive Segmentation | COCO minival | NoC@90 | 2.85 | ViT-B+MST+CL |
| Interactive Segmentation | DAVIS-585 | NoC@85 | 1.8 | ViT-B+MST+CL |
| Interactive Segmentation | DAVIS-585 | NoC@90 | 2.29 | ViT-B+MST+CL |
| Interactive Segmentation | PascalVOC | NoC@85 | 1.69 | ViT-B+MST+CL |
| Interactive Segmentation | PascalVOC | NoC@90 | 1.9 | ViT-B+MST+CL |
| Interactive Segmentation | DAVIS | NoC@90 | 4.55 | ViT-B+MST+CL |
| Interactive Segmentation | SBD | NoC@85 | 3.03 | ViT-B+MST+CL |
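
The NoC@85 and NoC@90 columns report the average number of clicks needed to reach an IoU of 0.85 or 0.90, so lower is better. The sketch below shows how such a number is typically computed from per-click IoU traces. The function names are hypothetical, and the cap of 20 clicks is the convention commonly used in interactive segmentation benchmarks, assumed here rather than taken from this page.

```python
from typing import List, Sequence


def number_of_clicks(
    ious_per_click: Sequence[float], threshold: float, max_clicks: int = 20
) -> int:
    """Smallest click count whose IoU reaches the threshold.

    If the threshold is never reached within max_clicks, the sample is
    counted as max_clicks (assumed convention).
    """
    for click_count, iou in enumerate(ious_per_click[:max_clicks], start=1):
        if iou >= threshold:
            return click_count
    return max_clicks


def mean_noc(
    traces: List[Sequence[float]], threshold: float, max_clicks: int = 20
) -> float:
    """Average number of clicks over a dataset of per-sample IoU traces."""
    counts = [number_of_clicks(trace, threshold, max_clicks) for trace in traces]
    return sum(counts) / len(counts)


# Toy traces (hypothetical values): IoU after click 1, 2, 3, ...
traces = [
    [0.70, 0.86, 0.91],
    [0.95],
]
noc_85 = mean_noc(traces, threshold=0.85)
noc_90 = mean_noc(traces, threshold=0.90)
print(f"NoC@85 = {noc_85:.2f}, NoC@90 = {noc_90:.2f}")
```
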