StreamingBench

Introduced 2024-11-06

StreamingBench evaluates Multimodal Large Language Models (MLLMs) in real-time, streaming video understanding tasks. 🌟

šŸŽžļø Overview As MLLMs continue to advance, they remain largely focused on offline video comprehension, where all frames are pre-loaded before making queries. However, this is far from the human ability to process and respond to video streams in real-time, capturing the dynamic nature of multimedia content. To bridge this gap, StreamingBench introduces the first comprehensive benchmark for streaming video understanding in MLLMs.

Key Evaluation Aspects

🎯 Real-time Visual Understanding: Can the model process and respond to visual changes in real-time?
🔊 Omni-source Understanding: Does the model integrate visual and audio inputs synchronously in real-time video streams?
🎬 Contextual Understanding: Can the model comprehend the broader context within video streams?

Dataset Statistics

📊 900 diverse videos
📝 4,500 human-annotated QA pairs
⏱️ Five questions per video at different timestamps
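
To make the streaming setup concrete, here is a minimal sketch of how a question asked at a given timestamp might limit what the model can see, in contrast to the offline setting where all frames are pre-loaded. It is an illustration only and not the benchmark's actual evaluation code: the `Question` class, the `visible_frame_indices` helper, the video ID, the timestamps, and the 1 fps sampling rate are all hypothetical. The only figures taken from the statistics above are 900 videos and five questions per video, which together give the 4,500 QA pairs.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical record for one QA pair; field names are assumptions."""
    video_id: str
    timestamp_s: float  # moment in the stream at which the question is asked
    text: str


def visible_frame_indices(timestamp_s: float, fps: float, total_frames: int) -> range:
    """Indices of the frames that have already streamed when the question arrives.

    An offline evaluation would expose range(total_frames) for every question;
    here the context is cut off at the question's timestamp.
    """
    last = min(total_frames, int(timestamp_s * fps) + 1)
    return range(0, max(last, 0))


# Figures from the dataset statistics above.
VIDEOS = 900
QUESTIONS_PER_VIDEO = 5
TOTAL_QA = VIDEOS * QUESTIONS_PER_VIDEO

# Hypothetical example: one 80-second video sampled at 1 fps,
# with five questions asked at different timestamps.
questions = [
    Question("video_0001", t, f"question asked at {t:.0f}s")
    for t in (10.0, 25.0, 40.0, 55.0, 70.0)
]

for q in questions:
    frames = visible_frame_indices(q.timestamp_s, fps=1.0, total_frames=80)
    print(f"{q.video_id} @ {q.timestamp_s:.0f}s: model sees {len(frames)} frames")
```

In this sketch, later questions about the same video see a longer prefix of the stream, so each of the five questions is answered against a different amount of context.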