LLaVA-NeXT-Video-7B

Reported on 8 benchmarks across 4 tasks · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 7 results

Question Answering on VNBench
Accuracy · 2024-07-10
20.1
best: 77.88 (BIMBA-LLaVA-Qwen2-7B)
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models (arXiv:2407.07895)
Relation Extraction on Vinoground
Group Score
6.2
best: 35 (GPT-4o (CoT))
Relation Extraction on Vinoground
Text Score
21.8
best: 59.2 (GPT-4o (CoT))
Relation Extraction on Vinoground
Video Score
25.6
best: 51 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Group Score
6.2
best: 35 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Text Score
21.8
best: 59.2 (GPT-4o (CoT))
Temporal Relation Extraction on Vinoground
Video Score
25.6
best: 51 (GPT-4o (CoT))