Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BLIP-2

Reported on 10 benchmarks across 6 tasks · 2 papers · 5 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
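
As a rough illustration of what exact-name matching implies, the sketch below groups result records by the reported model name, compared as a plain string. The record layout and the function `group_by_exact_name` are assumptions made for this example, not the site's actual code. The first two rows reuse values from this page. The third row, including its name and value, is hypothetical.

```python
from collections import defaultdict

# Each record: (model name as reported, benchmark, metric, value).
# The first two rows reuse numbers from this page; the third row and its
# variant name are hypothetical, added only to show a non-matching name.
RECORDS = [
    ("BLIP-2", "WebVid-CoVR", "R@1", 59.82),
    ("BLIP-2", "PMC-VQA", "Accuracy", 24.3),
    ("BLIP-2 (hypothetical variant)", "PMC-VQA", "Accuracy", 0.0),
]


def group_by_exact_name(records):
    """Group results by the model name string, compared character for character."""
    grouped = defaultdict(list)
    for name, benchmark, metric, value in records:
        grouped[name].append((benchmark, metric, value))
    return dict(grouped)


grouped = group_by_exact_name(RECORDS)
for name in sorted(grouped):
    print(name, len(grouped[name]))
```

Under this scheme a differently spelled variant lands in its own group, while two papers that reuse the same name for different variants are merged into one group, which is the case the note warns about.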

Computer Vision · 6 results

Video on WebVid-CoVR
R@1 · 2023-08-28
59.82
SOTA
CoVR-2: Automatic Data Construction for Composed Video Retrieval arXiv:2308.14746
Video Retrieval on WebVid-CoVR
R@1 · 2023-08-28
59.82
SOTA
CoVR-2: Automatic Data Construction for Composed Video Retrieval arXiv:2308.14746
Image Retrieval on ConQA Descriptive
R-precision
15.3
best: 16.5 (CLIP)
Image Retrieval on ConQA Descriptive
Recall@1
20.7
Image Retrieval on ConQA Descriptive
Recall@10
62.1
best: 65.5 (CLIP)
Image Retrieval on ConQA Descriptive
Recall@5
51.7
best: 58.3 (CLIP)
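
For readers unfamiliar with the retrieval metrics listed above, here is a minimal sketch of Recall@k and R-precision under their common textbook definitions. It is an assumption that the ConQA benchmark uses these exact definitions. The function names and the toy ranking are hypothetical. The page reports percentages, while this sketch returns fractions in [0, 1].

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k ranked results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def r_precision(ranked, relevant):
    """Precision over the top R results, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    hits = sum(1 for item in ranked[:r] if item in relevant)
    return hits / r


# Toy example (hypothetical data): five ranked items, two of them relevant.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "e"}

print(recall_at_k(ranked, relevant, 1))
print(recall_at_k(ranked, relevant, 5))
print(r_precision(ranked, relevant))
```

A benchmark score is then typically the mean of such per-query values over all queries, multiplied by 100.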

Natural Language Processing · 3 results

Visual Question Answering (VQA) on PMC-VQA
BLEU-1 · 2023-01-30
7.6
best: 23.2 (MedVInT)
SOTA
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models arXiv:2301.12597
Instruction Following on LLaVA-Bench
avg score · 2023-01-30
38.1
best: 85.7 (CuMo-7B)
SOTA
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models arXiv:2301.12597
Visual Question Answering (VQA) on PMC-VQA
Accuracy · 2023-01-30
24.3
best: 42.3 (MedVInT)
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models arXiv:2301.12597

Reasoning · 1 result

Generative Visual Question Answering on PMC-VQA
BLEU-1 · 2023-01-30
7.6
best: 23.2 (MedVInT)
SOTA
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models arXiv:2301.12597