Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Qwen-VL-Chat

Reported on 9 benchmarks across 3 tasks · 2 papers

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
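The effect of exact-name matching can be illustrated with a minimal sketch. The row layout and the `papers_for` helper are assumptions made for illustration, not the site's actual implementation; the first three rows reuse entries listed on this page, and the last row is a made-up name variant.

```python
# Hypothetical rows: (model name, paper, benchmark).
# The first three come from this page; the last is an invented
# name variant used only to show what exact matching excludes.
rows = [
    ("Qwen-VL-Chat", "arXiv:2308.12966", "DocVQA test"),
    ("Qwen-VL-Chat", "arXiv:2308.12966", "ChartQA"),
    ("Qwen-VL-Chat", "arXiv:2504.07521", "EIBench"),
    ("Qwen-VL-Chat-7B", "hypothetical paper", "hypothetical benchmark"),
]


def papers_for(rows, model_name):
    """Return the sorted set of papers whose rows carry exactly this name."""
    return sorted({paper for name, paper, _ in rows if name == model_name})


# Exact matching is case-sensitive and ignores near-identical variants,
# and it merges every paper that happens to use the same string.
print(papers_for(rows, "Qwen-VL-Chat"))
print(papers_for(rows, "qwen-vl-chat"))
```

Under this assumption, two papers that both write "Qwen-VL-Chat" are pooled even if they evaluated different checkpoints, while a differently spelled variant is left out entirely.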

Natural Language Processing · 6 results

Visual Question Answering (VQA) on DocVQA test
ANLS · uses extra data · 2023-08-24
0.626
best: 0.9436 (Human)
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond arXiv:2308.12966
Visual Question Answering (VQA) on InfiMM-Eval
Abductive · 2023-08-24
44.39
best: 77.88 (GPT-4V)
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond arXiv:2308.12966
Visual Question Answering (VQA) on InfiMM-Eval
Analogical · 2023-08-24
30.42
best: 69.86 (GPT-4V)
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond arXiv:2308.12966
Visual Question Answering (VQA) on InfiMM-Eval
Deductive · 2023-08-24
37.55
best: 74.86 (GPT-4V)
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond arXiv:2308.12966
Visual Question Answering (VQA) on InfiMM-Eval
Overall score · 2023-08-24
37.39
best: 74.44 (GPT-4V)
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond arXiv:2308.12966
Visual Question Answering (VQA) on ChartQA
1:1 Accuracy · uses extra data · 2023-08-24
66.3
best: 81.3 (ChartPaLI-5B + PaLM 2-S)
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond arXiv:2308.12966

Reasoning · 2 results

Emotion Interpretation on EIBench (complex)
Recall · 2025-04-10
22
best: 39.27 (ChatGPT-4o)
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models arXiv:2504.07521
Emotion Interpretation on EIBench
Recall · 2025-04-10
26.45
best: 63.24 (Claude-3-haiku)
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models arXiv:2504.07521

Computer Code · 1 result

Chart Question Answering on ChartQA
1:1 Accuracy · uses extra data · 2023-08-24
66.3
best: 81.3 (ChartPaLI-5B + PaLM 2-S)
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond arXiv:2308.12966