Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Qwen-VL-Chat

Reported on 9 benchmarks across 3 tasks · 2 papers

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
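
To illustrate the note above, here is a minimal sketch of exact-name matching. It is not the site's actual implementation: the record layout and the function `group_by_exact_name` are assumptions, and the third record (`Qwen-VL-Chat-7B`) is a hypothetical variant name added only to show the effect.

```python
from collections import defaultdict

# Hypothetical records: (model name as reported, paper, benchmark).
# The first two pairings appear on this page; the third is invented
# purely for illustration.
results = [
    ("Qwen-VL-Chat", "arXiv:2308.12966", "DocVQA test"),
    ("Qwen-VL-Chat", "arXiv:2504.07521", "EIBench"),
    ("Qwen-VL-Chat-7B", "hypothetical-paper", "hypothetical-benchmark"),
]


def group_by_exact_name(records):
    """Group results by the exact, case-sensitive model-name string."""
    grouped = defaultdict(list)
    for name, paper, benchmark in records:
        grouped[name].append((paper, benchmark))
    return dict(grouped)


grouped = group_by_exact_name(results)
for name, entries in grouped.items():
    print(name, len(entries))
```

Under this kind of matching, results from different papers that use the identical name are pooled under one model page, even if the papers evaluated different checkpoints, while a differently spelled variant is listed separately.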

Natural Language Processing · 6 results

  • Visual Question Answering (VQA) on DocVQA test
    ANLS · uses extra data · 2023-08-24
    0.626
    best: 0.9436 (Human)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on InfiMM-Eval
    Abductive · 2023-08-24
    44.39
    best: 77.88 (GPT-4V)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on InfiMM-Eval
    Analogical · 2023-08-24
    30.42
    best: 69.86 (GPT-4V)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on InfiMM-Eval
    Deductive · 2023-08-24
    37.55
    best: 74.86 (GPT-4V)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on InfiMM-Eval
    Overall score · 2023-08-24
    37.39
    best: 74.44 (GPT-4V)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on ChartQA
    1:1 Accuracy · uses extra data · 2023-08-24
    66.3
    best: 81.3 (ChartPaLI-5B + PaLM 2-S)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966

Reasoning · 2 results

  • Emotion Interpretation on EIBench (complex)
    Recall · 2025-04-10
    22
    best: 39.27 (ChatGPT-4o)
    Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models · arXiv:2504.07521
  • Emotion Interpretation on EIBench
    Recall · 2025-04-10
    26.45
    best: 63.24 (Claude-3-haiku)
    Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models · arXiv:2504.07521

Computer Code · 1 result

  • Chart Question Answering on ChartQA
    1:1 Accuracy · uses extra data · 2023-08-24
    66.3
    best: 81.3 (ChartPaLI-5B + PaLM 2-S)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966