Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Span-based Localizing Network for Natural Language Video Localization

Hao Zhang, Aixin Sun, Wei Jing, Joey Tianyi Zhou

2020-04-29 · ACL 2020 · Temporal Sentence Grounding
Paper · PDF · Code (official)

Abstract

Given an untrimmed video and a text query, natural language video localization (NLVL) aims to locate a span of the video that semantically corresponds to the query. Existing solutions formulate NLVL either as a ranking task, applying a multimodal matching architecture, or as a regression task that directly regresses the target video span. In this work, we address the NLVL task with a span-based QA approach by treating the input video as a text passage. We propose a video span localizing network (VSLNet), built on top of the standard span-based QA framework, to address NLVL. VSLNet tackles the differences between NLVL and span-based QA through a simple yet effective query-guided highlighting (QGH) strategy: QGH guides VSLNet to search for the matching video span within a highlighted region. Through extensive experiments on three benchmark datasets, we show that VSLNet outperforms state-of-the-art methods, and that adopting a span-based QA framework is a promising direction for solving NLVL.
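The span-selection step described above can be sketched in plain Python. This is an illustrative toy, not the paper's released code: the function name, the additive start/end scoring, and the boolean highlight mask are all assumptions standing in for VSLNet's learned components, but it shows the core QGH idea of restricting the span search to a highlighted region.

```python
# Toy sketch of QGH-style span selection (hypothetical names/scoring,
# not the official VSLNet implementation).

def best_span(start_scores, end_scores, highlight_mask):
    """Pick the (start, end) frame pair with the highest combined score,
    considering only frames inside the highlighted region."""
    best, best_score = (0, 0), float("-inf")
    n = len(start_scores)
    for s in range(n):
        if not highlight_mask[s]:
            continue
        for e in range(s, n):
            if not highlight_mask[e]:
                continue
            score = start_scores[s] + end_scores[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

# Toy example: 6 video frames, highlight covers frames 2..4.
start = [0.1, 0.2, 0.9, 0.3, 0.1, 0.8]
end   = [0.1, 0.1, 0.2, 0.4, 0.9, 0.2]
mask  = [False, False, True, True, True, False]
print(best_span(start, end, mask))  # -> (2, 4)
```

Without the mask, every (start, end) pair in the video would compete; the highlight confines the search to the region the query is predicted to cover.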

Results

Task                        | Dataset        | Metric       | Value | Model
Video Understanding         | Ego4D-Goalstep | R@1, IoU=0.3 | 11.7  | VSLNet
Temporal Sentence Grounding | Ego4D-Goalstep | R@1, IoU=0.3 | 11.7  | VSLNet
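The reported metric, R@1 at IoU=0.3, counts a query as correct when the model's top-ranked span overlaps the ground-truth moment with a temporal intersection-over-union of at least 0.3. A minimal sketch of how that is computed (the spans in seconds are made-up examples, not Ego4D-Goalstep data):

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two time spans (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(preds, gts, threshold=0.3):
    """Fraction of queries whose top-1 span reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)

preds = [(2.0, 8.0), (10.0, 12.0)]  # top-1 predicted spans, one per query
gts   = [(4.0, 9.0), (30.0, 40.0)]  # ground-truth moments
print(recall_at_1(preds, gts))      # -> 0.5 (first hits, second misses)
```

So a value of 11.7 means the top-1 prediction cleared the 0.3 IoU bar on 11.7% of queries.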

Related Papers

DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos (2025-05-22)
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining (2025-05-10)
Contrast-Unity for Partially-Supervised Temporal Sentence Grounding (2025-02-18)
Diversified Augmentation with Domain Adaptation for Debiased Video Temporal Grounding (2025-01-12)
Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network (2024-12-20)
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models (2024-10-04)
Transformer with Controlled Attention for Synchronous Motion Captioning (2024-09-13)
Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding (2024-05-31)