Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

VideoChat2_mistral

Reported on 9 benchmarks across 2 tasks · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
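As a rough illustration of the note above, the sketch below filters result rows by exact string equality on the model name. The row layout and the `results_for` helper are hypothetical, and treating the match as case-sensitive with no normalization is an assumption; the scores are taken from this page.

```python
# Hypothetical result rows: (model name, benchmark, metric, score).
# Scores are copied from this page; the tuple layout is an assumption.
rows = [
    ("VideoChat2_mistral", "NExT-QA", "Accuracy", 78.6),
    ("VideoChat2_mistral", "IntentQA", "Accuracy", 81.9),
    ("VideoChat2_HD_mistral", "IntentQA", "Accuracy", 83.4),
]


def results_for(model_name, rows):
    """Return the rows whose model name equals model_name exactly.

    Assumption: the comparison is case-sensitive and applies no
    normalization, so similarly named variants are not merged.
    """
    return [row for row in rows if row[0] == model_name]


matched = results_for("VideoChat2_mistral", rows)
for model, benchmark, metric, score in matched:
    print(f"{model}: {benchmark} {metric} = {score}")
```

Under this assumption, VideoChat2_HD_mistral is listed separately from VideoChat2_mistral, while two papers that both report a model under the identical name would be grouped together even if the underlying variants differ.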

Reasoning · 7 results

  • Video Question Answering on NExT-QA
    Accuracy · 2023-11-28
    78.6
    best: 85.5 (LinVT-Qwen2-VL (7B))
    MVBench: A Comprehensive Multi-modal Video Understanding Benchmark · arXiv:2311.17005
  • Video Question Answering on IntentQA
    Accuracy · 2023-11-28
    81.9
    best: 83.4 (VideoChat2_HD_mistral)
    MVBench: A Comprehensive Multi-modal Video Understanding Benchmark · arXiv:2311.17005
  • Video Question Answering on IntentQA
    CH · 2023-11-28
    86.9
    best: 90 (VideoChat2_HD_mistral)
    MVBench: A Comprehensive Multi-modal Video Understanding Benchmark · arXiv:2311.17005
  • Video Question Answering on IntentQA
    CW · 2023-11-28
    82.6
    best: 84 (VideoChat2_HD_mistral)
    MVBench: A Comprehensive Multi-modal Video Understanding Benchmark · arXiv:2311.17005
  • Video Question Answering on IntentQA
    TP&TN · 2023-11-28
    77
    best: 79.1 (Human)
    MVBench: A Comprehensive Multi-modal Video Understanding Benchmark · arXiv:2311.17005
  • Video Question Answering on EgoSchema (fullset)
    Accuracy · 2023-11-28
    54.4
    best: 71.14 (BIMBA-LLaVA-Qwen2-7B)
    MVBench: A Comprehensive Multi-modal Video Understanding Benchmark · arXiv:2311.17005
  • Video Question Answering on EgoSchema (subset)
    Accuracy · 2023-11-28
    63.6
    best: 68.6 (Tarsier (34B))
    MVBench: A Comprehensive Multi-modal Video Understanding Benchmark · arXiv:2311.17005

Natural Language Processing · 2 results

  • Question Answering on EgoSchema (fullset)
    Accuracy · 2023-11-28
    54.4
    best: 71.14 (BIMBA-LLaVA-Qwen2-7B)
    MVBench: A Comprehensive Multi-modal Video Understanding Benchmark · arXiv:2311.17005
  • Question Answering on EgoSchema (subset)
    Accuracy · 2023-11-28
    63.6
    best: 68.6 (Tarsier (34B))
    MVBench: A Comprehensive Multi-modal Video Understanding Benchmark · arXiv:2311.17005