Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MiniGPT-v2

Reported on 7 benchmarks across 3 tasks · 3 papers · 1 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
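
As a rough illustration of what exact-name matching implies, here is a minimal Python sketch. It assumes results are plain (model name, benchmark, score) rows grouped by case-sensitive string equality; the function `group_by_exact_name` and the two variant rows are hypothetical and are not taken from the site's actual code or data.

```python
from collections import defaultdict


def group_by_exact_name(rows):
    """Group (model_name, benchmark, score) rows by exact, case-sensitive name."""
    groups = defaultdict(list)
    for name, benchmark, score in rows:
        # No normalisation: a different suffix or letter case is a different model.
        groups[name].append((benchmark, score))
    return dict(groups)


rows = [
    ("MiniGPT-v2", "ScreenSpot", 5.7),
    ("MiniGPT-v2", "EIBench", 52.89),
    # Hypothetical variant names, included only to show they are not merged.
    ("MiniGPT-v2 (7B)", "hypothetical-benchmark", None),
    ("minigpt-v2", "hypothetical-benchmark", None),
]

groups = group_by_exact_name(rows)
for name, results in groups.items():
    print(name, len(results))
```

Under this assumption, a result reported under a slightly different spelling would not appear on this page, and results from different papers that reuse the name "MiniGPT-v2" would be listed together.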

Natural Language Processing · 4 results

  • Visual Question Answering (VQA) on InfiMM-Eval
    Abductive · 2023-04-20
    13.28
    best: 77.88 (GPT-4V)
    MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models · arXiv:2304.10592
  • Visual Question Answering (VQA) on InfiMM-Eval
    Analogical · 2023-04-20
    5.69
    best: 69.86 (GPT-4V)
    MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models · arXiv:2304.10592
  • Visual Question Answering (VQA) on InfiMM-Eval
    Deductive · 2023-04-20
    11.02
    best: 74.86 (GPT-4V)
    MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models · arXiv:2304.10592
  • Visual Question Answering (VQA) on InfiMM-Eval
    Overall score · 2023-04-20
    10.43
    best: 74.44 (GPT-4V)
    MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models · arXiv:2304.10592

Reasoning · 3 results

  • Natural Language Visual Grounding on ScreenSpot
    Accuracy (%) · 2023-10-14
    5.7
    best: 86.34 (UGround-V1-7B)
    SOTA
    MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning · arXiv:2310.09478
  • Emotion Interpretation on EIBench (complex)
    Recall · 2025-04-10
    35.1
    best: 39.27 (ChatGPT-4o)
    Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models · arXiv:2504.07521
  • Emotion Interpretation on EIBench
    Recall · 2025-04-10
    52.89
    best: 63.24 (Claude-3-haiku)
    Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models · arXiv:2504.07521