Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SCAN

Reported on 73 benchmarks across 7 tasks · 4 papers · 67 SOTA

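Note: results are matched by exact model name, so unrelated models that share a name are grouped together. On this page, "SCAN" covers four different models from four papers: image clustering (arXiv:2005.12320), image-text matching (arXiv:1803.08024), open-vocabulary semantic segmentation (arXiv:2312.04089), and cross-modal food retrieval (arXiv:2003.03955). Scores are shown as reported: some clustering results are fractions between 0 and 1, while the others are percentages.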

Computer Vision · 31 results

Image Clustering on ImageNet-100 (TEMI Split)
Accuracy· 2020-05-25
0.662
best: 0.8343 (TEMI CLIP ViT-L (openai))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on ImageNet-100 (TEMI Split)
ARI· 2020-05-25
0.544
best: 0.7581 (TEMI CLIP ViT-L (openai))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on ImageNet-100 (TEMI Split)
NMI· 2020-05-25
0.787
best: 0.9006 (TEMI CLIP ViT-L (openai))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on CIFAR-10
ARI· 2020-05-25
0.772
best: 0.989 (TURTLE (CLIP + DINOv2))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on CIFAR-10
Accuracy· 2020-05-25
0.883
best: 0.995 (TURTLE (CLIP + DINOv2))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on CIFAR-10
NMI· 2020-05-25
0.797
best: 0.985 (TURTLE (CLIP + DINOv2))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on CIFAR-100
ARI· 2020-05-25
0.333
best: 0.834 (TURTLE (CLIP + DINOv2))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on CIFAR-100
Accuracy· 2020-05-25
0.507
best: 0.898 (TURTLE (CLIP + DINOv2))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on CIFAR-100
NMI· 2020-05-25
0.486
best: 0.915 (TURTLE (CLIP + DINOv2))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on ImageNet-200
Accuracy· 2020-05-25
0.563
best: 0.7776 (TEMI CLIP ViT-L (openai))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on ImageNet-200
ARI· 2020-05-25
0.441
best: 0.6941 (TEMI CLIP ViT-L (openai))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on ImageNet-200
NMI· 2020-05-25
0.757
best: 0.8839 (TEMI CLIP ViT-L (openai))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on ImageNet-50 (TEMI Split)
Accuracy· 2020-05-25
0.751
best: 0.8827 (TEMI CLIP ViT-L (openai))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on ImageNet-50 (TEMI Split)
ARI· 2020-05-25
0.635
best: 0.8272 (TEMI CLIP ViT-L (openai))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on ImageNet-50 (TEMI Split)
NMI· 2020-05-25
0.805
best: 0.9232 (TEMI CLIP ViT-L (openai))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on STL-10
Accuracy· 2020-05-25
0.809
best: 0.997 (TURTLE (CLIP + DINOv2))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on STL-10
NMI· 2020-05-25
0.698
best: 0.993 (TURTLE (CLIP + DINOv2))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on ImageNet
Accuracy (%)· 2020-05-25
39.9
best: 72.9 (TURTLE (CLIP + DINOv2))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Clustering on ImageNet
NMI (%)· 2020-05-25
72
best: 88.2 (TURTLE (CLIP + DINOv2))
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
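
The clustering entries above use three metrics: accuracy (ACC), adjusted Rand index (ARI), and normalized mutual information (NMI). The sketch below computes them with the Python standard library under common conventions that are assumptions here, not necessarily the exact protocol of each paper: ACC takes the best one-to-one mapping from cluster IDs to class labels (found by brute force over permutations, which only works for a handful of clusters; libraries usually use the Hungarian algorithm), and NMI is normalized by the arithmetic mean of the two entropies. All function names are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import comb, log


def clustering_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one mappings of cluster IDs to class labels.

    Brute force over permutations, so only practical for a few clusters.
    """
    labels = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad with None so extra clusters can stay unmatched (they score 0).
    k = max(len(labels), len(clusters))
    padded = labels + [None] * (k - len(labels))
    pairs = Counter(zip(y_pred, y_true))
    best = 0
    for mapping in permutations(padded, len(clusters)):
        hits = sum(pairs[(c, l)] for c, l in zip(clusters, mapping))
        best = max(best, hits)
    return best / len(y_true)


def adjusted_rand_index(y_true, y_pred):
    """Rand index adjusted for chance, from pair counts (needs >= 2 samples)."""
    n = len(y_true)
    sum_ij = sum(comb(v, 2) for v in Counter(zip(y_true, y_pred)).values())
    sum_a = sum(comb(v, 2) for v in Counter(y_true).values())
    sum_b = sum(comb(v, 2) for v in Counter(y_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def _entropy(counts, n):
    return -sum(c / n * log(c / n) for c in counts if c)


def normalized_mutual_info(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(y_true)
    count_t, count_p = Counter(y_true), Counter(y_pred)
    mi = sum(
        c / n * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in Counter(zip(y_true, y_pred)).items()
    )
    denom = (_entropy(count_t.values(), n) + _entropy(count_p.values(), n)) / 2
    return mi / denom if denom > 0 else 1.0


# Same grouping with permuted cluster IDs: all three metrics are perfect.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]
print(clustering_accuracy(y_true, y_pred))  # 1.0
print(adjusted_rand_index(y_true, y_pred))  # 1.0
print(round(normalized_mutual_info(y_true, y_pred), 3))  # 1.0
```

All three metrics ignore which integer a cluster is given, which is why unsupervised methods can be scored against class labels at all.
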
Image Classification on CIFAR-10
Accuracy· 2020-05-25
88.3
best: 99.612 (efficient adaptive ensembling)
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Classification on CIFAR-20
Accuracy· 2020-05-25
50.7
best: 73.2 (MV-MR)
SOTA
SCAN: Learning to Classify Images without Labels arXiv:2005.12320
Image Retrieval on PhotoChat
R@1· 2018-03-21
10.4
best: 15.2 (PaCE)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval on PhotoChat
R@10· 2018-03-21
37.1
best: 49.6 (PaCE)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval on PhotoChat
R@5· 2018-03-21
27
best: 36.7 (PaCE)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval on PhotoChat
Sum(R@1,5,10)· 2018-03-21
74.5
best: 101.5 (PaCE)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
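
The retrieval entries report Recall@K (R@K): the percentage of queries whose ground-truth item appears among the top K retrieved candidates. Sum(R@1,5,10) simply adds the three recalls, and the PhotoChat figures above are consistent with that: 10.4 + 27 + 37.1 = 74.5 for SCAN and 15.2 + 36.7 + 49.6 = 101.5 for PaCE. The minimal sketch below assumes one ground-truth item per query; benchmarks with several correct matches per query (for example, several captions per image) usually count a hit if any of them is retrieved. The function names and toy data are hypothetical.

```python
def recall_at_k(rankings, targets, k):
    """Percentage of queries whose target is in the top-k of its ranking.

    rankings: one ranked list of candidate IDs per query (best first).
    targets: the single ground-truth candidate ID for each query.
    """
    hits = sum(1 for ranked, t in zip(rankings, targets) if t in ranked[:k])
    return 100.0 * hits / len(targets)


def recall_sum(rankings, targets, ks=(1, 5, 10)):
    """Sum of R@k over the given cut-offs, e.g. Sum(R@1,5,10)."""
    return sum(recall_at_k(rankings, targets, k) for k in ks)


# Toy example with three candidates, so use cut-offs 1, 2 and 3.
rankings = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
targets = ["a", "a", "a"]
print(round(recall_at_k(rankings, targets, 1), 1))  # 33.3
print(round(recall_sum(rankings, targets, ks=(1, 2, 3)), 1))  # 200.0

# The reported SCAN and PaCE sums on PhotoChat.
print(round(10.4 + 27 + 37.1, 1))  # 74.5
print(round(15.2 + 36.7 + 49.6, 1))  # 101.5
```
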
Open Vocabulary Semantic Segmentation on ADE20K-847
mIoU· 2023-12-07
14
best: 17.3 (UMG-CLIP-E/14)
Open-Vocabulary Segmentation with Semantic-Assisted Calibration arXiv:2312.04089
Open Vocabulary Semantic Segmentation on PASCAL Context-459
mIoU· 2023-12-07
16.7
best: 25.8 (SILC)
Open-Vocabulary Segmentation with Semantic-Assisted Calibration arXiv:2312.04089
Open Vocabulary Semantic Segmentation on PascalVOC-20
mIoU· 2023-12-07
97.2
best: 97.9 (UMG-CLIP-L/14)
Open-Vocabulary Segmentation with Semantic-Assisted Calibration arXiv:2312.04089
Open Vocabulary Semantic Segmentation on PASCAL Context-59
mIoU· 2023-12-07
59.3
best: 64.6 (HyperSeg)
Open-Vocabulary Segmentation with Semantic-Assisted Calibration arXiv:2312.04089
Open Vocabulary Semantic Segmentation on ADE20K-150
mIoU· 2023-12-07
33.5
best: 38.2 (Mask-Adapter)
Open-Vocabulary Segmentation with Semantic-Assisted Calibration arXiv:2312.04089
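
The segmentation entries report mIoU: for each class, intersection over union is TP / (TP + FP + FN) over all pixels, and mIoU averages that score across classes. The sketch below works on flat per-pixel label lists pooled over a dataset and skips classes that appear in neither ground truth nor prediction; that skipping rule and the optional ignore label are common conventions assumed here, not details taken from the listed paper. The function name is hypothetical.

```python
def mean_iou(y_true, y_pred, num_classes, ignore_index=None):
    """Mean IoU (in %) from flat per-pixel ground-truth and predicted labels."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == ignore_index:  # pixels without a valid label are skipped
            continue
        if t == p:
            # Correct pixel: counts once in the intersection and union of t.
            inter[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: a false negative for t and a false positive for p.
            union[t] += 1
            union[p] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return 100.0 * sum(ious) / len(ious)


# Class 0: IoU 1/2, class 1: IoU 2/3, so mIoU is about 58.33.
print(round(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2), 2))  # 58.33
```

Because every class carries equal weight in the mean, rare classes matter as much as frequent ones.
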
Image Classification on STL-10
Accuracy· 2020-05-25
80.9
best: 99.7 (TURTLE (CLIP + DINOv2))
SCAN: Learning to Classify Images without Labels arXiv:2005.12320

Miscellaneous · 28 results

Image Retrieval with Multi-Modal Query on Recipe1M
Image-to-text R@1· 2020-03-09
54
best: 74.9 (VLPCook (R1M+))
SOTA
Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism arXiv:2003.03955
Image Retrieval with Multi-Modal Query on Recipe1M
Text-to-image R@1· 2020-03-09
54.9
best: 75.6 (VLPCook (R1M+))
SOTA
Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism arXiv:2003.03955
Cross-Modal Information Retrieval on Recipe1M
Image-to-text R@1· 2020-03-09
54
best: 74.9 (VLPCook (R1M+))
SOTA
Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism arXiv:2003.03955
Cross-Modal Information Retrieval on Recipe1M
Text-to-image R@1· 2020-03-09
54.9
best: 75.6 (VLPCook (R1M+))
SOTA
Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism arXiv:2003.03955
Image Retrieval with Multi-Modal Query on Flickr30k
Image-to-text R@1· 2018-03-21
67.4
best: 98.8 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval with Multi-Modal Query on Flickr30k
Image-to-text R@10· 2018-03-21
95.8
best: 100 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval with Multi-Modal Query on Flickr30k
Image-to-text R@5· 2018-03-21
90.3
best: 100 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval with Multi-Modal Query on Flickr30k
Text-to-image R@1· 2018-03-21
48.6
best: 93.3 (ERNIE-ViL 2.0)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval with Multi-Modal Query on Flickr30k
Text-to-image R@10· 2018-03-21
85.2
best: 99.8 (ERNIE-ViL 2.0)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval with Multi-Modal Query on Flickr30k
Text-to-image R@5· 2018-03-21
77.7
best: 99.5 (M2-Encoder)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval with Multi-Modal Query on COCO 2014
Image-to-text R@1· 2018-03-21
50.4
best: 84.8 (BEiT-3)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval with Multi-Modal Query on COCO 2014
Image-to-text R@10· 2018-03-21
90
best: 98.5 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval with Multi-Modal Query on COCO 2014
Image-to-text R@5· 2018-03-21
82.2
best: 96.5 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval with Multi-Modal Query on COCO 2014
Text-to-image R@1· 2018-03-21
38.6
best: 68 (VAST)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval with Multi-Modal Query on COCO 2014
Text-to-image R@10· 2018-03-21
80.4
best: 92.8 (VAST)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Image Retrieval with Multi-Modal Query on COCO 2014
Text-to-image R@5· 2018-03-21
69.3
best: 92.8 (BEiT-3)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on Flickr30k
Image-to-text R@1· 2018-03-21
67.4
best: 98.8 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on Flickr30k
Image-to-text R@10· 2018-03-21
95.8
best: 100 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on Flickr30k
Image-to-text R@5· 2018-03-21
90.3
best: 100 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on Flickr30k
Text-to-image R@1· 2018-03-21
48.6
best: 93.3 (ERNIE-ViL 2.0)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on Flickr30k
Text-to-image R@10· 2018-03-21
85.2
best: 99.8 (ERNIE-ViL 2.0)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on Flickr30k
Text-to-image R@5· 2018-03-21
77.7
best: 99.4 (ERNIE-ViL 2.0)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on COCO 2014
Image-to-text R@1· 2018-03-21
50.4
best: 84.8 (BEiT-3)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on COCO 2014
Image-to-text R@10· 2018-03-21
90
best: 98.5 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on COCO 2014
Image-to-text R@5· 2018-03-21
82.2
best: 96.5 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on COCO 2014
Text-to-image R@1· 2018-03-21
38.6
best: 68 (VAST)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on COCO 2014
Text-to-image R@10· 2018-03-21
80.4
best: 92.8 (VAST)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Information Retrieval on COCO 2014
Text-to-image R@5· 2018-03-21
69.3
best: 92.8 (BEiT-3)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024

Natural Language Processing · 14 results

Cross-Modal Retrieval on Recipe1M
Image-to-text R@1· 2020-03-09
54
best: 74.9 (VLPCook (R1M+))
SOTA
Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism arXiv:2003.03955
Cross-Modal Retrieval on Recipe1M
Text-to-image R@1· 2020-03-09
54.9
best: 75.6 (VLPCook (R1M+))
SOTA
Cross-Modal Food Retrieval: Learning a Joint Embedding of Food Images and Recipes with Semantic Consistency and Attention Mechanism arXiv:2003.03955
Cross-Modal Retrieval on Flickr30k
Image-to-text R@1· 2018-03-21
67.4
best: 98.8 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Retrieval on Flickr30k
Image-to-text R@10· 2018-03-21
95.8
best: 100 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Retrieval on Flickr30k
Image-to-text R@5· 2018-03-21
90.3
best: 100 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Retrieval on Flickr30k
Text-to-image R@1· 2018-03-21
48.6
best: 93.3 (ERNIE-ViL 2.0)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Retrieval on Flickr30k
Text-to-image R@10· 2018-03-21
85.2
best: 99.8 (ERNIE-ViL 2.0)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Retrieval on Flickr30k
Text-to-image R@5· 2018-03-21
77.7
best: 99.4 (ERNIE-ViL 2.0)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Retrieval on COCO 2014
Image-to-text R@1· 2018-03-21
50.4
best: 84.8 (BEiT-3)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Retrieval on COCO 2014
Image-to-text R@10· 2018-03-21
90
best: 98.5 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Retrieval on COCO 2014
Image-to-text R@5· 2018-03-21
82.2
best: 96.5 (X2-VLM (large))
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Retrieval on COCO 2014
Text-to-image R@1· 2018-03-21
38.6
best: 68 (VAST)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Retrieval on COCO 2014
Text-to-image R@10· 2018-03-21
80.4
best: 92.8 (VAST)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024
Cross-Modal Retrieval on COCO 2014
Text-to-image R@5· 2018-03-21
69.3
best: 92.8 (BEiT-3)
SOTA
Stacked Cross Attention for Image-Text Matching arXiv:1803.08024