Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CLIP (zero-shot)

Reported on 6 benchmarks across 3 tasks · 2 papers · 2 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 4 results

  • Image-to-Text Retrieval on COCO (Common Objects in Context)
    Recall@1 · 2021-02-26
    58.4
    best: 85.4 (BLIP-2 (ViT-G, fine-tuned))
    SOTA
    Learning Transferable Visual Models From Natural Language Supervision · arXiv:2103.00020
  • Image-to-Text Retrieval on COCO (Common Objects in Context)
    Recall@5 · 2021-02-26
    81.5
    best: 97 (BLIP-2 (ViT-G, fine-tuned))
    SOTA
    Learning Transferable Visual Models From Natural Language Supervision · arXiv:2103.00020
  • Meme Classification on Hateful Memes
    ROC-AUC · 2021-02-26
    0.661
    best: 0.911 (RA-HMD (Qwen2-VL-7B))
    Learning Transferable Visual Models From Natural Language Supervision · arXiv:2103.00020
  • Image-to-Text Retrieval on COCO (Common Objects in Context)
    Recall@10 · 2021-02-26
    88.1
    best: 99.8 (Oscar)
    Learning Transferable Visual Models From Natural Language Supervision · arXiv:2103.00020

Computer Vision · 2 results

  • Image Retrieval on COCO (Common Objects in Context)
    recall@1 · 2021-12-08
    33.29
    best: 68.3 (BLIP-2 ViT-G (fine-tuned))
    FLAVA: A Foundational Language And Vision Alignment Model · arXiv:2112.04482
  • Image Retrieval on COCO (Common Objects in Context)
    recall@5 · 2021-12-08
    62.47
    best: 91.8 (VisualSparta)
    FLAVA: A Foundational Language And Vision Alignment Model · arXiv:2112.04482