Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

InstructBLIP

Reported on 10 benchmarks across 4 tasks · 3 papers

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
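The caveat about exact-name matching can be made concrete with a small sketch. This is a hypothetical illustration in Python, not the site's actual matching code. The three `InstructBLIP` scores are copied from the entries on this page. The `InstructBLIP (Vicuna-7B)` record and the `group_by_exact_name` helper are invented for the example.

```python
from collections import defaultdict

# Each record: (model name exactly as written, benchmark, metric, score, source).
# The first three records use scores listed on this page; the last record is
# hypothetical and carries no score (illustrative only).
RESULTS = [
    ("InstructBLIP", "REBUS", "Accuracy", 0.6, "arXiv:2401.05604"),
    ("InstructBLIP", "Winoground", "Group Score", 3.3, "arXiv:2311.17076"),
    ("InstructBLIP", "MVBench", "Avg.", 32.5, "arXiv:2305.06500"),
    ("InstructBLIP (Vicuna-7B)", "MVBench", "Avg.", None, "hypothetical"),
]


def group_by_exact_name(records):
    """Hypothetical helper: group records by the literal model-name string.

    Matching is case- and whitespace-sensitive, so any difference in spelling
    produces a separate group, while identical names always merge.
    """
    groups = defaultdict(list)
    for name, benchmark, metric, score, source in records:
        groups[name].append((benchmark, metric, score, source))
    return dict(groups)


grouped = group_by_exact_name(RESULTS)
for name, rows in grouped.items():
    sources = {source for *_, source in rows}
    print(f"{name}: {len(rows)} result(s) from {len(sources)} source(s)")
```

Because the key is the raw string, entries from three different papers collapse under the single name `InstructBLIP` even if each paper evaluated a different checkpoint, while a variant spelled differently is kept apart.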

Reasoning · 5 results

  • Multimodal Reasoning on REBUS
    Accuracy · 2024-01-11
    0.6
    best: 24 (GPT-4V)
    REBUS: A Robust Evaluation Benchmark of Understanding Symbols (arXiv:2401.05604)
  • Visual Reasoning on Winoground
    Group Score · 2023-11-27
    3.3
    best: 58.75 (GPT-4V (CoT, pick b/w two options))
    Compositional Chain-of-Thought Prompting for Large Multimodal Models (arXiv:2311.17076)
  • Visual Reasoning on Winoground
    Image Score · 2023-11-27
    11.5
    best: 68.75 (GPT-4V (CoT, pick b/w two options))
    Compositional Chain-of-Thought Prompting for Large Multimodal Models (arXiv:2311.17076)
  • Visual Reasoning on Winoground
    Text Score · 2023-11-27
    7
    best: 75.5 (GPT-4o + CA)
    Compositional Chain-of-Thought Prompting for Large Multimodal Models (arXiv:2311.17076)
  • Video Question Answering on MVBench
    Avg. · 2023-05-11
    32.5
    best: 69.3 (LinVT-Qwen2-VL (7B))
    InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning (arXiv:2305.06500)

Natural Language Processing · 5 results

  • Visual Question Answering (VQA) on InfiMM-Eval
    Abductive · 2023-05-11
    37.76
    best: 77.88 (GPT-4V)
    InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning (arXiv:2305.06500)
  • Visual Question Answering (VQA) on InfiMM-Eval
    Analogical · 2023-05-11
    20.56
    best: 69.86 (GPT-4V)
    InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning (arXiv:2305.06500)
  • Visual Question Answering (VQA) on InfiMM-Eval
    Deductive · 2023-05-11
    27.56
    best: 74.86 (GPT-4V)
    InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning (arXiv:2305.06500)
  • Visual Question Answering (VQA) on InfiMM-Eval
    Overall score · 2023-05-11
    28.02
    best: 74.44 (GPT-4V)
    InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning (arXiv:2305.06500)
  • Visual Question Answering (VQA) on InfoSeek
    Accuracy
    14.5
    best: 30.65 (RA-VQAv2 w/ PreFLMR)