Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CLIP (zero-shot)

Reported on 6 benchmarks across 3 tasks · 2 papers · 2 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
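To illustrate the note above, here is a minimal sketch of exact-name matching. It assumes results are stored as simple records and grouped by case-sensitive string equality on the model name; the `group_by_exact_name` helper and the record layout are hypothetical and are not taken from the site's actual code. The two rows are results listed on this page.

```python
from collections import defaultdict


def group_by_exact_name(rows):
    """Group result rows by model name using exact, case-sensitive equality."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["model"]].append(row)
    return dict(groups)


# Two rows from this page: same model name, different source papers.
rows = [
    {"model": "CLIP (zero-shot)", "paper": "arXiv:2103.00020",
     "metric": "Recall@1", "value": 58.4},
    {"model": "CLIP (zero-shot)", "paper": "arXiv:2112.04482",
     "metric": "recall@1", "value": 33.29},
]

groups = group_by_exact_name(rows)

# Both rows land under the same key even though they come from different
# papers, so the underlying model variants may differ.
papers = sorted({r["paper"] for r in groups["CLIP (zero-shot)"]})
print(papers)
```

Under this assumption, a name that differs in any character (for example in capitalization) would form a separate group, while identical names from different papers are merged regardless of which variant each paper evaluated.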

Natural Language Processing (4 results)

Image-to-Text Retrieval on COCO (Common Objects in Context)
Recall@1 · 2021-02-26
58.4
best: 85.4 (BLIP-2 (ViT-G, fine-tuned))
SOTA
Learning Transferable Visual Models From Natural Language Supervision, arXiv:2103.00020

Image-to-Text Retrieval on COCO (Common Objects in Context)
Recall@5 · 2021-02-26
81.5
best: 97 (BLIP-2 (ViT-G, fine-tuned))
SOTA
Learning Transferable Visual Models From Natural Language Supervision, arXiv:2103.00020

Meme Classification on Hateful Memes
ROC-AUC · 2021-02-26
0.661
best: 0.911 (RA-HMD (Qwen2-VL-7B))
Learning Transferable Visual Models From Natural Language Supervision, arXiv:2103.00020

Image-to-Text Retrieval on COCO (Common Objects in Context)
Recall@10 · 2021-02-26
88.1
best: 99.8 (Oscar)
Learning Transferable Visual Models From Natural Language Supervision, arXiv:2103.00020

Computer Vision (2 results)

Image Retrieval on COCO (Common Objects in Context)
recall@1 · 2021-12-08
33.29
best: 68.3 (BLIP-2 ViT-G (fine-tuned))
FLAVA: A Foundational Language And Vision Alignment Model, arXiv:2112.04482

Image Retrieval on COCO (Common Objects in Context)
recall@5 · 2021-12-08
62.47
best: 91.8 (VisualSparta)
FLAVA: A Foundational Language And Vision Alignment Model, arXiv:2112.04482