Metric: mIoU (higher is better)
| Rank | Model | mIoU | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | CorrCLIP | 91.8 | No | CorrCLIP: Reconstructing Correlations in CLIP wi... | 2024-11-15 | Code |
| 2 | TextRegion | 89.5 | No | TextRegion: Text-Aligned Region Tokens from Froz... | 2025-05-29 | Code |
| 3 | Trident | 88.7 | No | Harnessing Vision Foundation Models for High-Per... | 2024-11-14 | Code |
| 4 | TagAlign | 87.9 | No | TagAlign: Improving Vision-Language Alignment wi... | 2023-12-21 | Code |
| 5 | ProxyCLIP | 83.3 | No | ProxyCLIP: Proxy Attention Improves CLIP for Ope... | 2024-08-09 | Code |
| 6 | TCL | 83.2 | No | Learning to Generate Text-grounded Mask for Open... | 2022-12-01 | Code |
| 7 | GroupViT (RedCaps) | 79.7 | No | GroupViT: Semantic Segmentation Emerges from Tex... | 2022-02-22 | Code |
| 8 | COSMOS ViT-B/16 | 77.7 | No | COSMOS: Cross-Modality Self-Distillation for Vis... | 2024-12-02 | Code |
| 9 | MaskCLIP | 74.9 | No | Extract Free Dense Labels from CLIP | 2021-12-02 | Code |
| 10 | ReCo | 57.7 | No | ReCo: Retrieve and Co-segment for Zero-shot Tran... | 2022-06-14 | Code |
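
For readers unfamiliar with the metric, the following is a minimal sketch of how a mean intersection-over-union score is commonly computed from predicted and ground-truth label maps. It is an illustration only: the function name `mean_iou`, the flat-list input format, and the choice to average over classes that appear in either the prediction or the ground truth are assumptions, and the papers listed above may differ in how they handle ignore labels, the background class, and accumulation across images.

```python
def mean_iou(pred, target, num_classes):
    """Return mIoU in percent for two flat sequences of integer class labels.

    Hypothetical helper for illustration; not taken from any listed paper.
    Classes absent from both pred and target are skipped in the average.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel counts once toward both intersection and union of its class.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch is a false positive for p and a false negative for t.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        return 0.0
    return 100.0 * sum(ious) / len(ious)


# Usage: four pixels, two classes.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(round(score, 1))
```

Under this definition each class contributes equally to the average regardless of how many pixels it covers, which is why the score can differ noticeably from plain pixel accuracy.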