Video-RAG (Based on LLaVA-Video)

Reported on 4 benchmarks across 2 tasks · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 2 results

Question Answering on Video-MME
Accuracy (%) · 2024-11-20
77.4
best: 81.3 (Gemini 1.5 Pro)
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension arXiv:2411.13093
Question Answering on EgoSchema (fullset)
Accuracy · 2024-11-20
66.7
best: 71.14 (BIMBA-LLaVA-Qwen2-7B)
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension arXiv:2411.13093

Reasoning · 2 results

Video Question Answering on Video-MME
Accuracy (%) · 2024-11-20
77.4
best: 81.3 (Gemini 1.5 Pro)
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension arXiv:2411.13093
Video Question Answering on EgoSchema (fullset)
Accuracy · 2024-11-20
66.7
best: 71.14 (BIMBA-LLaVA-Qwen2-7B)
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension arXiv:2411.13093