Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SCAN

Reported on 73 benchmarks across 7 tasks · 4 papers · 67 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
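As a rough illustration of the note above, the sketch below groups results under the literal model-name string. The record layout and the helper names (`ROWS`, `group_by_exact_name`, `distinct_papers`) are hypothetical; only the model name, paper titles, and task names are taken from this page, and the site's actual matching logic is not shown here.

```python
from collections import defaultdict

# Hypothetical rows: (model_name, paper_title, task).
# Titles and tasks are copied from this page; the tuple layout is an assumption.
ROWS = [
    ("SCAN", "SCAN: Learning to Classify Images without Labels",
     "Image Clustering"),
    ("SCAN", "Stacked Cross Attention for Image-Text Matching",
     "Cross-Modal Retrieval"),
    ("SCAN", "Open-Vocabulary Segmentation with Semantic-Assisted Calibration",
     "Open Vocabulary Semantic Segmentation"),
    ("SCAN", "Cross-Modal Food Retrieval: Learning a Joint Embedding of Food "
             "Images and Recipes with Semantic Consistency and Attention Mechanism",
     "Cross-Modal Retrieval"),
]


def group_by_exact_name(rows):
    """Group rows under the literal model-name string, with no normalisation."""
    groups = defaultdict(list)
    for name, paper, task in rows:
        groups[name].append((paper, task))
    return dict(groups)


def distinct_papers(groups, name):
    """Return the sorted paper titles that share one model name."""
    return sorted({paper for paper, _ in groups.get(name, [])})


groups = group_by_exact_name(ROWS)
papers = distinct_papers(groups, "SCAN")
print(f"{len(papers)} papers share the name 'SCAN'")
```

Because the key is the raw string, unrelated models that happen to share a name land in one group, which is why the results listed here span clustering, retrieval, and segmentation.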

Computer Vision · 31 results

  • Image Clustering on ImageNet-100 (TEMI Split)
    ACCURACY · 2020-05-25
    0.662
    best: 0.8343 (TEMI CLIP ViT-L (openai))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on ImageNet-100 (TEMI Split)
    ARI · 2020-05-25
    0.544
    best: 0.7581 (TEMI CLIP ViT-L (openai))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on ImageNet-100 (TEMI Split)
    NMI · 2020-05-25
    0.787
    best: 0.9006 (TEMI CLIP ViT-L (openai))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on CIFAR-10
    ARI · 2020-05-25
    0.772
    best: 0.989 (TURTLE (CLIP + DINOv2))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on CIFAR-10
    Accuracy · 2020-05-25
    0.883
    best: 0.995 (TURTLE (CLIP + DINOv2))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on CIFAR-10
    NMI · 2020-05-25
    0.797
    best: 0.985 (TURTLE (CLIP + DINOv2))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on CIFAR-100
    ARI · 2020-05-25
    0.333
    best: 0.834 (TURTLE (CLIP + DINOv2))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on CIFAR-100
    Accuracy · 2020-05-25
    0.507
    best: 0.898 (TURTLE (CLIP + DINOv2))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on CIFAR-100
    NMI · 2020-05-25
    0.486
    best: 0.915 (TURTLE (CLIP + DINOv2))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on ImageNet-200
    ACCURACY · 2020-05-25
    0.563
    best: 0.7776 (TEMI CLIP ViT-L (openai))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on ImageNet-200
    ARI · 2020-05-25
    0.441
    best: 0.6941 (TEMI CLIP ViT-L (openai))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on ImageNet-200
    NMI · 2020-05-25
    0.757
    best: 0.8839 (TEMI CLIP ViT-L (openai))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on ImageNet-50 (TEMI Split)
    ACCURACY · 2020-05-25
    0.751
    best: 0.8827 (TEMI CLIP ViT-L (openai))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on ImageNet-50 (TEMI Split)
    ARI · 2020-05-25
    0.635
    best: 0.8272 (TEMI CLIP ViT-L (openai))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on ImageNet-50 (TEMI Split)
    NMI · 2020-05-25
    0.805
    best: 0.9232 (TEMI CLIP ViT-L (openai))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on STL-10
    Accuracy · 2020-05-25
    0.809
    best: 0.997 (TURTLE (CLIP + DINOv2))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on STL-10
    NMI · 2020-05-25
    0.698
    best: 0.993 (TURTLE (CLIP + DINOv2))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on ImageNet
    Accuracy · 2020-05-25
    39.9
    best: 72.9 (TURTLE (CLIP + DINOv2))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Clustering on ImageNet
    NMI · 2020-05-25
    72
    best: 88.2 (TURTLE (CLIP + DINOv2))
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Classification on CIFAR-10
    Accuracy · 2020-05-25
    88.3
    best: 99.612 (efficient adaptive ensembling)
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Classification on CIFAR-20
    Accuracy · 2020-05-25
    50.7
    best: 73.2 (MV-MR)
    SOTA
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
  • Image Retrieval on PhotoChat
    R1 · 2018-03-21
    10.4
    best: 15.2 (PaCE)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval on PhotoChat
    R@10 · 2018-03-21
    37.1
    best: 49.6 (PaCE)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval on PhotoChat
    R@5 · 2018-03-21
    27
    best: 36.7 (PaCE)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval on PhotoChat
    Sum(R@1,5,10) · 2018-03-21
    74.5
    best: 101.5 (PaCE)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Open Vocabulary Semantic Segmentation on ADE20K-847
    mIoU · 2023-12-07
    14
    best: 17.3 (UMG-CLIP-E/14)
    Open-Vocabulary Segmentation with Semantic-Assisted Calibration · arXiv:2312.04089
  • Open Vocabulary Semantic Segmentation on PASCAL Context-459
    mIoU · 2023-12-07
    16.7
    best: 25.8 (SILC)
    Open-Vocabulary Segmentation with Semantic-Assisted Calibration · arXiv:2312.04089
  • Open Vocabulary Semantic Segmentation on PascalVOC-20
    mIoU · 2023-12-07
    97.2
    best: 97.9 (UMG-CLIP-L/14)
    Open-Vocabulary Segmentation with Semantic-Assisted Calibration · arXiv:2312.04089
  • Open Vocabulary Semantic Segmentation on PASCAL Context-59
    mIoU · 2023-12-07
    59.3
    best: 64.6 (HyperSeg)
    Open-Vocabulary Segmentation with Semantic-Assisted Calibration · arXiv:2312.04089
  • Open Vocabulary Semantic Segmentation on ADE20K-150
    mIoU · 2023-12-07
    33.5
    best: 38.2 (Mask-Adapter)
    Open-Vocabulary Segmentation with Semantic-Assisted Calibration · arXiv:2312.04089
  • Image Classification on STL-10
    Accuracy · 2020-05-25
    80.9
    best: 99.7 (TURTLE (CLIP + DINOv2))
    SCAN: Learning to Classify Images without Labels · arXiv:2005.12320
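
The PhotoChat entries above list a Sum(R@1,5,10) metric next to its three components. A minimal sketch of that arithmetic follows, assuming the metric labelled "R1" on this page means R@1 and that the aggregate is a plain sum of the three recalls; the dictionary layout and function name are hypothetical, and the numbers are copied from the SCAN and "best" (PaCE) values listed above.

```python
# Values copied from the PhotoChat rows on this page.
# Assumption: the page's "R1" label is the same quantity as R@1.
RECALLS = {
    "SCAN": {"R@1": 10.4, "R@5": 27.0, "R@10": 37.1, "reported_sum": 74.5},
    "PaCE": {"R@1": 15.2, "R@5": 36.7, "R@10": 49.6, "reported_sum": 101.5},
}


def recall_sum(row):
    """Add the three recall values, rounded to one decimal like the page."""
    return round(row["R@1"] + row["R@5"] + row["R@10"], 1)


for model, row in RECALLS.items():
    total = recall_sum(row)
    status = "matches" if total == row["reported_sum"] else "differs from"
    print(f"{model}: {total} {status} the reported {row['reported_sum']}")
```

Rounding to one decimal avoids spurious mismatches from floating-point addition when comparing against the listed totals.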

Miscellaneous · 28 results

  • Image Retrieval with Multi-Modal Query on Recipe1M
    Image-to-text R@1 · 2020-03-09
    54
    best: 74.9 (VLPCook (R1M+))
    SOTA
    Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism · arXiv:2003.03955
  • Image Retrieval with Multi-Modal Query on Recipe1M
    Text-to-image R@1 · 2020-03-09
    54.9
    best: 75.6 (VLPCook (R1M+))
    SOTA
    Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism · arXiv:2003.03955
  • Cross-Modal Information Retrieval on Recipe1M
    Image-to-text R@1 · 2020-03-09
    54
    best: 74.9 (VLPCook (R1M+))
    SOTA
    Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism · arXiv:2003.03955
  • Cross-Modal Information Retrieval on Recipe1M
    Text-to-image R@1 · 2020-03-09
    54.9
    best: 75.6 (VLPCook (R1M+))
    SOTA
    Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism · arXiv:2003.03955
  • Image Retrieval with Multi-Modal Query on Flickr30k
    Image-to-text R@1 · 2018-03-21
    67.4
    best: 98.8 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval with Multi-Modal Query on Flickr30k
    Image-to-text R@10 · 2018-03-21
    95.8
    best: 100 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval with Multi-Modal Query on Flickr30k
    Image-to-text R@5 · 2018-03-21
    90.3
    best: 100 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval with Multi-Modal Query on Flickr30k
    Text-to-image R@1 · 2018-03-21
    48.6
    best: 93.3 (ERNIE-ViL 2.0)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval with Multi-Modal Query on Flickr30k
    Text-to-image R@10 · 2018-03-21
    85.2
    best: 99.8 (ERNIE-ViL 2.0)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval with Multi-Modal Query on Flickr30k
    Text-to-image R@5 · 2018-03-21
    77.7
    best: 99.5 (M2-Encoder)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Image-to-text R@1 · 2018-03-21
    50.4
    best: 84.8 (BEiT-3)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Image-to-text R@10 · 2018-03-21
    90
    best: 98.5 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Image-to-text R@5 · 2018-03-21
    82.2
    best: 96.5 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Text-to-image R@1 · 2018-03-21
    38.6
    best: 68 (VAST)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Text-to-image R@10 · 2018-03-21
    80.4
    best: 92.8 (VAST)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Image Retrieval with Multi-Modal Query on COCO 2014
    Text-to-image R@5 · 2018-03-21
    69.3
    best: 92.8 (BEiT-3)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on Flickr30k
    Image-to-text R@1 · 2018-03-21
    67.4
    best: 98.8 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on Flickr30k
    Image-to-text R@10 · 2018-03-21
    95.8
    best: 100 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on Flickr30k
    Image-to-text R@5 · 2018-03-21
    90.3
    best: 100 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on Flickr30k
    Text-to-image R@1 · 2018-03-21
    48.6
    best: 93.3 (ERNIE-ViL 2.0)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on Flickr30k
    Text-to-image R@10 · 2018-03-21
    85.2
    best: 99.8 (ERNIE-ViL 2.0)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on Flickr30k
    Text-to-image R@5 · 2018-03-21
    77.7
    best: 99.4 (ERNIE-ViL 2.0)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on COCO 2014
    Image-to-text R@1 · 2018-03-21
    50.4
    best: 84.8 (BEiT-3)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on COCO 2014
    Image-to-text R@10 · 2018-03-21
    90
    best: 98.5 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on COCO 2014
    Image-to-text R@5 · 2018-03-21
    82.2
    best: 96.5 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on COCO 2014
    Text-to-image R@1 · 2018-03-21
    38.6
    best: 68 (VAST)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on COCO 2014
    Text-to-image R@10 · 2018-03-21
    80.4
    best: 92.8 (VAST)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Information Retrieval on COCO 2014
    Text-to-image R@5 · 2018-03-21
    69.3
    best: 92.8 (BEiT-3)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024

Natural Language Processing · 14 results

  • Cross-Modal Retrieval on Recipe1M
    Image-to-text R@1 · 2020-03-09
    54
    best: 74.9 (VLPCook (R1M+))
    SOTA
    Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism · arXiv:2003.03955
  • Cross-Modal Retrieval on Recipe1M
    Text-to-image R@1 · 2020-03-09
    54.9
    best: 75.6 (VLPCook (R1M+))
    SOTA
    Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism · arXiv:2003.03955
  • Cross-Modal Retrieval on Flickr30k
    Image-to-text R@1 · 2018-03-21
    67.4
    best: 98.8 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Retrieval on Flickr30k
    Image-to-text R@10 · 2018-03-21
    95.8
    best: 100 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Retrieval on Flickr30k
    Image-to-text R@5 · 2018-03-21
    90.3
    best: 100 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Retrieval on Flickr30k
    Text-to-image R@1 · 2018-03-21
    48.6
    best: 93.3 (ERNIE-ViL 2.0)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Retrieval on Flickr30k
    Text-to-image R@10 · 2018-03-21
    85.2
    best: 99.8 (ERNIE-ViL 2.0)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Retrieval on Flickr30k
    Text-to-image R@5 · 2018-03-21
    77.7
    best: 99.4 (ERNIE-ViL 2.0)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Retrieval on COCO 2014
    Image-to-text R@1 · 2018-03-21
    50.4
    best: 84.8 (BEiT-3)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Retrieval on COCO 2014
    Image-to-text R@10 · 2018-03-21
    90
    best: 98.5 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Retrieval on COCO 2014
    Image-to-text R@5 · 2018-03-21
    82.2
    best: 96.5 (X2-VLM (large))
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Retrieval on COCO 2014
    Text-to-image R@1 · 2018-03-21
    38.6
    best: 68 (VAST)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Retrieval on COCO 2014
    Text-to-image R@10 · 2018-03-21
    80.4
    best: 92.8 (VAST)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024
  • Cross-Modal Retrieval on COCO 2014
    Text-to-image R@5 · 2018-03-21
    69.3
    best: 92.8 (BEiT-3)
    SOTA
    Stacked Cross Attention for Image-Text Matching · arXiv:1803.08024