Papers With Code 2

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Qwen-VL-Max

Reported in 27 benchmark results (3 datasets) across 4 tasks · 2 papers · 2 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
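A minimal Python sketch of what exact-name matching implies, assuming the archive stores each result as a row carrying a model-name string. The row layout and the `group_by_exact_name` helper are hypothetical, not the site's actual code. Rows from different papers that share the string `Qwen-VL-Max` land on one page, while any spelling variant gets a page of its own.

```python
from collections import defaultdict

def group_by_exact_name(rows):
    """Group result rows under the model-name string exactly as written.

    Matching is case- and whitespace-sensitive: no normalization is applied,
    so identical strings from different papers are merged onto one page.
    """
    pages = defaultdict(list)
    for row in rows:
        pages[row["model"]].append(row)
    return dict(pages)

# Illustrative rows. The first two use values and arXiv IDs listed on this page;
# the third row (lowercase name, "hypothetical-paper") is invented to show a miss.
rows = [
    {"model": "Qwen-VL-Max", "paper": "arXiv:2308.12966", "dataset": "SME", "metric": "ACC", "value": 40.33},
    {"model": "Qwen-VL-Max", "paper": "arXiv:2402.14804", "dataset": "MATH-V", "metric": "Accuracy", "value": 15.59},
    {"model": "qwen-vl-max", "paper": "hypothetical-paper", "dataset": "SME", "metric": "ACC", "value": None},
]

pages = group_by_exact_name(rows)
for name, entries in pages.items():
    papers = sorted({e["paper"] for e in entries})
    print(f"{name}: {len(entries)} result(s) from {papers}")
```

Because nothing is normalized, the two `Qwen-VL-Max` rows merge even if the papers meant different variants, which is exactly the caveat stated in the note.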

Natural Language Processing · 18 results

  • Visual Question Answering (VQA) on EmbSpatial-Bench
    Generation · 2023-08-24
    49.11
    best: 70.88 (SoFar)
    SOTA
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering on EmbSpatial-Bench
    Generation · 2023-08-24
    49.11
    best: 70.88 (SoFar)
    SOTA
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on SME
    #Learning Samples (N) · 2023-08-24
    16
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on SME
    ACC · 2023-08-24
    40.33
    best: 51.45 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on SME
    BLEU-4 · 2023-08-24
    24.3
    best: 67.91 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on SME
    CIDEr · 2023-08-24
    201.47
    best: 510.44 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on SME
    Detection · 2023-08-24
    1.05
    best: 29.09 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on SME
    METEOR · 2023-08-24
    23.4
    best: 50.55 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on SME
    ROUGE-L · 2023-08-24
    34.52
    best: 79.41 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering (VQA) on SME
    SPICE · 2023-08-24
    26.13
    best: 64.09 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering on SME
    #Learning Samples (N) · 2023-08-24
    16
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering on SME
    ACC · 2023-08-24
    40.33
    best: 51.45 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering on SME
    BLEU-4 · 2023-08-24
    24.3
    best: 67.91 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering on SME
    CIDEr · 2023-08-24
    201.47
    best: 510.44 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering on SME
    Detection · 2023-08-24
    1.05
    best: 29.09 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering on SME
    METEOR · 2023-08-24
    23.4
    best: 50.55 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering on SME
    ROUGE-L · 2023-08-24
    34.52
    best: 79.41 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Visual Question Answering on SME
    SPICE · 2023-08-24
    26.13
    best: 64.09 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
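Every scored SME metric above lists MEAgent as the best entry. As a rough sketch (the `gap_to_best` helper is hypothetical, not a site feature), the gap can be expressed per metric as the best score minus Qwen-VL-Max's score and as the ratio of the two. This assumes higher is better on every metric, which the listed best values are consistent with.

```python
# SME scores for Qwen-VL-Max paired with the best listed entry (MEAgent),
# copied from the rows above. "#Learning Samples (N)" is omitted: it has no "best" value.
SME_SCORES = {
    "ACC": (40.33, 51.45),
    "BLEU-4": (24.3, 67.91),
    "CIDEr": (201.47, 510.44),
    "Detection": (1.05, 29.09),
    "METEOR": (23.4, 50.55),
    "ROUGE-L": (34.52, 79.41),
    "SPICE": (26.13, 64.09),
}

def gap_to_best(scores):
    """Map each metric to (best - ours, ours / best), assuming higher is better."""
    return {
        metric: (round(best - ours, 2), round(ours / best, 3))
        for metric, (ours, best) in scores.items()
    }

gaps = gap_to_best(SME_SCORES)
# List metrics from the largest relative shortfall (lowest ratio) to the smallest.
for metric, (abs_gap, ratio) in sorted(gaps.items(), key=lambda kv: kv[1][1]):
    print(f"{metric:<10} gap {abs_gap:>7.2f}  ratio {ratio:.3f}")
```

The ratio column is the more comparable of the two, since CIDEr is on a much larger scale than the other metrics.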

Computer Vision · 8 results

  • Explanatory Visual Question Answering on SME
    #Learning Samples (N) · 2023-08-24
    16
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Explanatory Visual Question Answering on SME
    ACC · 2023-08-24
    40.33
    best: 51.45 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Explanatory Visual Question Answering on SME
    BLEU-4 · 2023-08-24
    24.3
    best: 67.91 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Explanatory Visual Question Answering on SME
    CIDEr · 2023-08-24
    201.47
    best: 510.44 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Explanatory Visual Question Answering on SME
    Detection · 2023-08-24
    1.05
    best: 29.09 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Explanatory Visual Question Answering on SME
    METEOR · 2023-08-24
    23.4
    best: 50.55 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Explanatory Visual Question Answering on SME
    ROUGE-L · 2023-08-24
    34.52
    best: 79.41 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966
  • Explanatory Visual Question Answering on SME
    SPICE · 2023-08-24
    26.13
    best: 64.09 (MEAgent)
    Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond · arXiv:2308.12966

Reasoning · 1 result

  • Multimodal Reasoning on MATH-V
    Accuracy · uses extra data · 2024-02-22
    15.59
    best: 22.76 (GPT4V)
    Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset · arXiv:2402.14804
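Many of the 27 entries above repeat the same numbers under different task labels. A short Python tally, with values transcribed from this page (the tuple layout is an assumption made for illustration), suggests they reduce to 10 distinct dataset/metric measurements on 3 datasets, consistent with the header's count of 27 results across 4 tasks.

```python
from collections import Counter

# Entries as listed on this page: (task, dataset, metric, value).
SME_TASKS = [
    "Visual Question Answering (VQA)",
    "Visual Question Answering",
    "Explanatory Visual Question Answering",
]
SME_METRICS = [
    ("#Learning Samples (N)", 16),
    ("ACC", 40.33),
    ("BLEU-4", 24.3),
    ("CIDEr", 201.47),
    ("Detection", 1.05),
    ("METEOR", 23.4),
    ("ROUGE-L", 34.52),
    ("SPICE", 26.13),
]

entries = [
    ("Visual Question Answering (VQA)", "EmbSpatial-Bench", "Generation", 49.11),
    ("Visual Question Answering", "EmbSpatial-Bench", "Generation", 49.11),
]
# Each SME metric is listed once under each of the three task labels.
entries += [(task, "SME", metric, value) for task in SME_TASKS for metric, value in SME_METRICS]
entries.append(("Multimodal Reasoning", "MATH-V", "Accuracy", 15.59))

# Collapse entries that report the same (dataset, metric, value) under different task labels.
measurements = Counter((dataset, metric, value) for _, dataset, metric, value in entries)
tasks = {task for task, *_ in entries}
datasets = {dataset for _, dataset, _, _ in entries}

print(len(entries), "entries,", len(tasks), "tasks,", len(datasets), "datasets,",
      len(measurements), "distinct measurements")
# prints: 27 entries, 4 tasks, 3 datasets, 10 distinct measurements
```

Grouping by (dataset, metric, value) rather than by task label is what exposes the repetition; the task taxonomy on the page assigns the same SME results to three tasks.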