OVBench is a benchmark tailored for real-time video understanding:

Memory, Perception, and Prediction of Temporal Contexts: Questions are framed with reference to the present state of entities, requiring models to memorize past, perceive present, and predict future temporal contexts.
Dynamic Spatio-temporal Interaction: The benchmark demands precise real-time interactions with video content, where actions, objects, and events must be understood in the context of their spatial and temporal relationships.
Contextual Awareness at Specific Moments: Real-time questions are contextual: they change depending on the timestamp at which they are asked, requiring a deep understanding of how temporal context evolves.
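
To make the timestamp dependence concrete, here is a minimal Python sketch of a question tied to the moment it is asked. It is an illustration only, not OVBench's actual data format or evaluation code: `TimedQuestion`, `visible_frames`, the `mode` labels, and the sample timestamps are all hypothetical names and values chosen for this example. It assumes a streaming setting in which a model can only use frames that arrived at or before the question's timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedQuestion:
    """Hypothetical record for a question posed at a specific moment."""
    text: str
    ask_time: float  # seconds into the video when the question is asked
    mode: str        # assumed labels: "memory", "perception", or "prediction"


def visible_frames(frame_times, ask_time):
    """Return the frame timestamps available at or before ask_time."""
    return [t for t in frame_times if t <= ask_time]


# Hypothetical video sampled at one frame per second.
frame_times = [0.0, 1.0, 2.0, 3.0, 4.0]

# The same wording asked at two different moments sees different context.
early = TimedQuestion("What is the person holding right now?", 1.5, "perception")
late = TimedQuestion("What is the person holding right now?", 3.5, "perception")

early_context = visible_frames(frame_times, early.ask_time)
late_context = visible_frames(frame_times, late.ask_time)
```

Under these assumptions, the identical question text is paired with different visible context at 1.5 s and at 3.5 s, which is why the expected answer can differ between the two moments.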