Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

TTD (MaskCLIP)

Reported on 24 benchmarks across 4 tasks · 1 paper · 1 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 10 results

  • Unsupervised Semantic Segmentation on Cityscapes val
    mIoU · 2024-03-30
    32
    best: 51.1 (CorrCLIP)
    SOTA
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Unsupervised Semantic Segmentation on COCO-Stuff-171
    mIoU · 2024-03-30
    19.4
    best: 34 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Unsupervised Semantic Segmentation on COCO-Object
    mIoU · 2024-03-30
    26.5
    best: 49.4 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Unsupervised Semantic Segmentation on ADE20K
    Mean IoU (val) · 2024-03-30
    12.7
    best: 30.7 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Unsupervised Semantic Segmentation on PASCAL Context-59
    mIoU · 2024-03-30
    31
    best: 50.8 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Unsupervised Semantic Segmentation on PASCAL VOC
    mIoU · 2024-03-30
    43.1
    best: 76.7 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Open Vocabulary Semantic Segmentation on COCO-Stuff-171
    mIoU · 2024-03-30
    19.4
    best: 23.7 (TTD (TCL))
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Open Vocabulary Semantic Segmentation on Cityscapes
    mIoU · 2024-03-30
    27
    best: 56.2 (FC-CLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Open Vocabulary Semantic Segmentation on PASCAL Context-59
    mIoU · 2024-03-30
    31
    best: 64.6 (HyperSeg)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Open Vocabulary Semantic Segmentation on ADE20K-150
    mIoU · 2024-03-30
    12.7
    best: 38.2 (Mask-Adapter)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384

Medical · 7 results

  • Semantic Segmentation on CC3M-TagMask
    mIoU · 2024-03-30
    50.2
    best: 65.5 (TTD (TCL))
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Semantic Segmentation on COCO-Stuff-171
    mIoU · 2024-03-30
    19.4
    best: 34 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Semantic Segmentation on COCO-Object
    mIoU · 2024-03-30
    26.5
    best: 49.4 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Semantic Segmentation on ADE20K
    Mean IoU (val) · 2024-03-30
    12.7
    best: 30.7 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Semantic Segmentation on Cityscapes val
    mIoU · 2024-03-30
    32
    best: 90.3 (EfficientPS (Cityscapes-fine))
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Semantic Segmentation on PASCAL Context-59
    mIoU · 2024-03-30
    31
    best: 50.8 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • Semantic Segmentation on PASCAL VOC
    mIoU · 2024-03-30
    43.1
    best: 76.7 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384

Audio · 7 results

  • 10-shot image generation on CC3M-TagMask
    mIoU · 2024-03-30
    50.2
    best: 65.5 (TTD (TCL))
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • 10-shot image generation on COCO-Stuff-171
    mIoU · 2024-03-30
    19.4
    best: 34 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • 10-shot image generation on COCO-Object
    mIoU · 2024-03-30
    26.5
    best: 49.4 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • 10-shot image generation on ADE20K
    Mean IoU (val) · 2024-03-30
    12.7
    best: 30.7 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • 10-shot image generation on Cityscapes val
    mIoU · 2024-03-30
    32
    best: 90.3 (EfficientPS (Cityscapes-fine))
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • 10-shot image generation on PASCAL Context-59
    mIoU · 2024-03-30
    31
    best: 50.8 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384
  • 10-shot image generation on PASCAL VOC
    mIoU · 2024-03-30
    43.1
    best: 76.7 (CorrCLIP)
    TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias · arXiv:2404.00384