TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM
Ye Wang, Boshen Xu, Zihao Yue, Zihan Xiao, Ziheng Wang, Liang Zhang, Dingyi Yang, Wenxuan Wang, Qin Jin
2025-03-17
Video Grounding
Abstract
We introduce TimeZero, a reasoning-guided LVLM designed for the temporal video grounding (TVG) task. This task requires precisely localizing the relevant video segments within a long video, given a language query. TimeZero tackles this challenge by extending the inference process, enabling the model to reason about video-language relationships solely through reinforcement learning. We evaluate TimeZero on two benchmarks; it achieves state-of-the-art performance on Charades-STA. Code is available at https://github.com/www-Ye/TimeZero.
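To make the localization task concrete, the sketch below scores a predicted segment against a ground-truth segment using temporal intersection-over-union (IoU), a measure commonly used for temporal grounding. The abstract does not say which metric or reward TimeZero uses, so this is an illustrative assumption rather than the authors' implementation; the function names `temporal_iou` and `recall_at_iou` are hypothetical, and segments are assumed to be `(start, end)` pairs in seconds with `start <= end`.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments, in seconds.

    Assumes start <= end for both segments (an assumption of this sketch).
    """
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap length is zero when the segments do not intersect.
    inter = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    # Union length is the sum of both lengths minus the overlap.
    union = (pred_end - pred_start) + (gt_end - gt_start) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds, gts, threshold=0.5):
    """Fraction of queries whose predicted segment reaches the IoU threshold."""
    if not preds:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


if __name__ == "__main__":
    # Hypothetical predictions and annotations for two queries.
    predictions = [(0.0, 10.0), (2.0, 8.0)]
    ground_truth = [(5.0, 15.0), (2.0, 8.0)]
    print(temporal_iou(predictions[0], ground_truth[0]))
    print(recall_at_iou(predictions, ground_truth, threshold=0.5))
```

A score of this kind could serve either as an evaluation measure or as a reward signal during reinforcement learning; which role, if any, it plays in TimeZero is not stated in the abstract.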
Related Papers
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding (2025-07-17)
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency (2025-06-02)
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models (2025-05-24)
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos (2025-05-22)
Object-Shot Enhanced Grounding Network for Egocentric Video (2025-05-07)
Enhancing Weakly Supervised Video Grounding via Diverse Inference Strategies for Boundary and Prediction Selection (2025-03-29)
VideoGEM: Training-free Action Grounding in Videos (2025-03-26)
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability (2025-03-18)