
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


InstructBLIP

Reported on 10 benchmarks across 4 tasks · 3 papers

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
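The effect of exact-name matching can be illustrated with a minimal sketch. The records, the variant name, and the `group_by_exact_name` helper below are hypothetical and chosen only for illustration; they are not the site's actual data or code, and case-sensitive string comparison is an assumption.

```python
from collections import defaultdict

# Hypothetical records: (reported model name, source paper). Not real data.
records = [
    ("InstructBLIP", "paper A"),
    ("InstructBLIP", "paper B"),               # possibly a different variant, same name
    ("InstructBLIP (Vicuna-13B)", "paper C"),  # different string, so it is not matched
]


def group_by_exact_name(rows):
    """Group source papers under the exact (assumed case-sensitive) model name."""
    groups = defaultdict(list)
    for name, source in rows:
        groups[name].append(source)
    return dict(groups)


grouped = group_by_exact_name(records)
print(grouped["InstructBLIP"])
```

Under these assumptions, papers A and B are listed together even if they evaluated different variants, while paper C lands on a separate page because its name string differs.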

Reasoning · 5 results

Multimodal Reasoning on REBUS
Accuracy · 2024-01-11
0.6
best: 24 (GPT-4V)
REBUS: A Robust Evaluation Benchmark of Understanding Symbols arXiv:2401.05604

Visual Reasoning on Winoground
Group Score · 2023-11-27
3.3
best: 58.75 (GPT-4V (CoT, pick b/w two options))
Compositional Chain-of-Thought Prompting for Large Multimodal Models arXiv:2311.17076

Visual Reasoning on Winoground
Image Score · 2023-11-27
11.5
best: 68.75 (GPT-4V (CoT, pick b/w two options))
Compositional Chain-of-Thought Prompting for Large Multimodal Models arXiv:2311.17076

Visual Reasoning on Winoground
Text Score · 2023-11-27
7
best: 75.5 (GPT-4o + CA)
Compositional Chain-of-Thought Prompting for Large Multimodal Models arXiv:2311.17076

Video Question Answering on MVBench
Avg. · 2023-05-11
32.5
best: 69.3 (LinVT-Qwen2-VL (7B))
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning arXiv:2305.06500

Natural Language Processing · 5 results

Visual Question Answering (VQA) on InfiMM-Eval
Abductive · 2023-05-11
37.76
best: 77.88 (GPT-4V)
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning arXiv:2305.06500

Visual Question Answering (VQA) on InfiMM-Eval
Analogical · 2023-05-11
20.56
best: 69.86 (GPT-4V)
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning arXiv:2305.06500

Visual Question Answering (VQA) on InfiMM-Eval
Deductive · 2023-05-11
27.56
best: 74.86 (GPT-4V)
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning arXiv:2305.06500

Visual Question Answering (VQA) on InfiMM-Eval
Overall score · 2023-05-11
28.02
best: 74.44 (GPT-4V)
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning arXiv:2305.06500

Visual Question Answering (VQA) on InfoSeek
Accuracy
14.5
best: 30.65 (RA-VQAv2 w/ PreFLMR)