Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Video-LLaVA

Reported on 9 benchmarks across 5 tasks · 2 papers

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Reasoning · 6 results

  • Emotion Interpretation on EIBench (complex)
    Recall · 2025-04-10
    30.9
    best: 39.27 (ChatGPT-4o)
    Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models (arXiv:2504.07521)
  • Emotion Interpretation on EIBench
    Recall · 2025-04-10
    49.26
    best: 63.24 (Claude-3-haiku)
    Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models (arXiv:2504.07521)
  • Video Question Answering on ActivityNet-QA
    Accuracy · 2023-11-16
    45.3
    best: 61.6 (Tarsier (34B))
    Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (arXiv:2311.10122)
  • Video Question Answering on ActivityNet-QA
    Confidence score · 2023-11-16
    3.3
    best: 2.2 (Video Chat)
    Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (arXiv:2311.10122)
  • Video Question Answering on ActivityNet-QA
    Accuracy · 2023-11-16
    45.3
    best: 61.6 (Tarsier (34B))
    Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (arXiv:2311.10122)
  • Video Question Answering on ActivityNet-QA
    Confidence Score · 2023-11-16
    3.3
    best: 1.1 (Video LLaMA)
    Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (arXiv:2311.10122)

Natural Language Processing · 4 results

  • Question Answering on ActivityNet-QA
    Accuracy · 2023-11-16
    45.3
    best: 61.6 (Tarsier (34B))
    Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (arXiv:2311.10122)
  • Question Answering on ActivityNet-QA
    Confidence Score · 2023-11-16
    3.3
    best: 1.1 (Video LLaMA)
    Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (arXiv:2311.10122)
  • Visual Question Answering (VQA) on MM-Vet
    GPT-4 score · 2023-11-16
    32
    best: 74.24 (MMCTAgent (GPT-4 + GPT-4V))
    Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (arXiv:2311.10122)
  • Visual Question Answering on MM-Vet
    GPT-4 score · 2023-11-16
    32
    best: 74.24 (MMCTAgent (GPT-4 + GPT-4V))
    Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (arXiv:2311.10122)