Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BLIP-2

Reported on 10 benchmarks across 6 tasks · 2 papers · 5 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision · 6 results

  • Video on WebVid-CoVR
    R@1 · 2023-08-28
    59.82
    SOTA
    CoVR-2: Automatic Data Construction for Composed Video Retrieval · arXiv:2308.14746
  • Video Retrieval on WebVid-CoVR
    R@1 · 2023-08-28
    59.82
    SOTA
    CoVR-2: Automatic Data Construction for Composed Video Retrieval · arXiv:2308.14746
  • Image Retrieval on ConQA Descriptive
    R-precision
    15.3
    best: 16.5 (CLIP)
  • Image Retrieval on ConQA Descriptive
    Recall@1
    20.7
  • Image Retrieval on ConQA Descriptive
    Recall@10
    62.1
    best: 65.5 (CLIP)
  • Image Retrieval on ConQA Descriptive
    Recall@5
    51.7
    best: 58.3 (CLIP)

Natural Language Processing · 3 results

  • Visual Question Answering (VQA) on PMC-VQA
    BLEU-1 · 2023-01-30
    7.6
    best: 23.2 (MedVInT)
    SOTA
    BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models · arXiv:2301.12597
  • Instruction Following on LLaVA-Bench
    avg score · 2023-01-30
    38.1
    best: 85.7 (CuMo-7B)
    SOTA
    BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models · arXiv:2301.12597
  • Visual Question Answering (VQA) on PMC-VQA
    Accuracy · 2023-01-30
    24.3
    best: 42.3 (MedVInT)
    BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models · arXiv:2301.12597

Reasoning · 1 result

  • Generative Visual Question Answering on PMC-VQA
    BLEU-1 · 2023-01-30
    7.6
    best: 23.2 (MedVInT)
    SOTA
    BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models · arXiv:2301.12597