Metric: mIoU (mean Intersection over Union; higher is better)
| # | Model | mIoU | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | CorrCLIP | 50.8 | No | CorrCLIP: Reconstructing Correlations in CLIP wi... | 2024-11-15 | Code |
| 2 | TextRegion | 46.1 | No | TextRegion: Text-Aligned Region Tokens from Froz... | 2025-05-29 | Code |
| 3 | Trident | 44.3 | No | Harnessing Vision Foundation Models for High-Per... | 2024-11-14 | Code |
| 4 | ProxyCLIP | 39.6 | No | ProxyCLIP: Proxy Attention Improves CLIP for Ope... | 2024-08-09 | Code |
| 5 | TagAlign | 37.6 | No | TagAlign: Improving Vision-Language Alignment wi... | 2023-12-21 | Code |
| 6 | TTD (TCL) | 37.4 | No | TTD: Text-Tag Self-Distillation Enhancing Image-... | 2024-03-30 | Code |
| 7 | TCL | 33.9 | No | Learning to Generate Text-grounded Mask for Open... | 2022-12-01 | Code |
| 8 | COSMOS ViT-B/16 | 33.7 | No | COSMOS: Cross-Modality Self-Distillation for Vis... | 2024-12-02 | Code |
| 9 | TTD (MaskCLIP) | 31.0 | No | TTD: Text-Tag Self-Distillation Enhancing Image-... | 2024-03-30 | Code |
| 10 | MaskCLIP | 26.4 | No | Extract Free Dense Labels from CLIP | 2021-12-02 | Code |
| 11 | GroupViT (RedCaps) | 23.4 | No | GroupViT: Semantic Segmentation Emerges from Tex... | 2022-02-22 | Code |
| 12 | ReCo | 22.3 | No | ReCo: Retrieve and Co-segment for Zero-shot Tran... | 2022-06-14 | Code |
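For readers unfamiliar with the metric, here is a minimal sketch of how mIoU is commonly computed from per-pixel class labels. It is illustrative only. The function name `mean_iou`, the `ignore_index=255` convention for unlabeled pixels, the choice to skip classes absent from both prediction and ground truth, and the scaling to a percentage (to match the values in the table) are all assumptions. The page does not state this benchmark's exact evaluation protocol, and many benchmarks accumulate intersections and unions over the whole dataset rather than per image.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU (as a percentage) over classes present in pred or target.

    pred and target are flattened per-pixel class ids of equal length.
    ignore_index is an assumed convention for unlabeled pixels.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: excluded from every count
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = pred_area[c] + target_area[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, last pixel unlabeled.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(round(mean_iou(pred, target, num_classes=3), 1))  # prints 72.2
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so their mean is 13/18, or about 72.2 on the table's scale.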