Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos

Pilhyeon Lee, Hyeran Byun

Date: 2023-11-30
Tasks: Temporal Sentence Grounding, Moment Retrieval, Natural Language Moment Retrieval
Code (official): https://github.com/Pilhyeon/BAM-DETR

Abstract

Temporal sentence grounding aims to localize moments relevant to a language description. Recently, DETR-like approaches have achieved notable progress by predicting the center and length of a target moment. However, they suffer from center misalignment caused by the inherent ambiguity of moment centers, which leads to inaccurate predictions. To remedy this problem, we propose a novel boundary-oriented moment formulation. In our paradigm, the model no longer needs to find the precise center; instead, it suffices to predict any anchor point within the interval, from which the boundaries are directly estimated. Based on this idea, we design a boundary-aligned moment detection transformer equipped with a dual-pathway decoding process. Specifically, it refines the anchor and the boundaries in parallel pathways using global and boundary-focused attention, respectively. This decoupled design allows the model to focus on the relevant regions, enabling precise refinement of moment predictions. Further, we propose a quality-based ranking method that ensures proposals with high localization quality are prioritized over incomplete ones. Experiments on three benchmarks validate the effectiveness of the proposed methods. The code is available at https://github.com/Pilhyeon/BAM-DETR.
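To make the contrast between the two parameterizations concrete, here is a toy sketch in Python. It is not taken from the official repository: the class and function names (`CenterLengthMoment`, `AnchorBoundaryMoment`, `encode`, `Proposal`, `rank_by_quality`) are hypothetical, times are assumed to be normalized to [0, 1], and the ranking score `confidence * quality` is an assumed rule, since the abstract does not state how quality is combined with confidence. It only illustrates that any anchor inside a moment decodes to the same boundaries, and that a quality term can push a confident but incomplete proposal below a complete one.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end), assumed normalized to [0, 1]


@dataclass
class CenterLengthMoment:
    """DETR-style parameterization: the model must regress the exact center."""
    center: float
    length: float

    def to_interval(self) -> Interval:
        return (self.center - self.length / 2, self.center + self.length / 2)


@dataclass
class AnchorBoundaryMoment:
    """Boundary-oriented parameterization (hypothetical names): any anchor
    inside the moment plus its distances to the start and end boundaries."""
    anchor: float
    dist_to_start: float
    dist_to_end: float

    def to_interval(self) -> Interval:
        # Boundaries are read off directly from the anchor.
        return (self.anchor - self.dist_to_start, self.anchor + self.dist_to_end)


def encode(interval: Interval, anchor: float) -> AnchorBoundaryMoment:
    """Express a ground-truth interval relative to an anchor inside it."""
    start, end = interval
    if not start <= anchor <= end:
        raise ValueError("anchor must lie within the interval")
    return AnchorBoundaryMoment(anchor, anchor - start, end - anchor)


@dataclass
class Proposal:
    interval: Interval
    confidence: float  # e.g. a classification / matching score
    quality: float     # hypothetical predicted localization quality in [0, 1]


def rank_by_quality(proposals: List[Proposal]) -> List[Proposal]:
    """Assumed scoring rule: confidence * quality, highest first."""
    return sorted(proposals, key=lambda p: p.confidence * p.quality, reverse=True)


if __name__ == "__main__":
    gt = (0.25, 0.75)
    # Different anchors describe the same moment; there is no single center to hit.
    for a in (0.3125, 0.5, 0.6875):
        print(a, encode(gt, a).to_interval())

    ranked = rank_by_quality([
        Proposal((0.25, 0.5), confidence=0.9, quality=0.4),   # confident but incomplete
        Proposal((0.25, 0.75), confidence=0.7, quality=0.9),  # complete moment
    ])
    print([p.interval for p in ranked])
```

Under the center-length view only one point (the midpoint 0.5) is a correct target, whereas under the anchor-boundary view every anchor in [0.25, 0.75] reconstructs the same interval, which is the intuition the abstract gives for avoiding center misalignment.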

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Video | TACoS | R@1, IoU=0.3 | 56.69 | BAM-DETR
Video | TACoS | R@1, IoU=0.5 | 41.54 | BAM-DETR
Video | TACoS | R@1, IoU=0.7 | 26.77 | BAM-DETR
Video | TACoS | mIoU | 39.31 | BAM-DETR
Moment Retrieval | Charades-STA | R@1, IoU=0.5 | 59.95 | BAM-DETR
Moment Retrieval | Charades-STA | R@1, IoU=0.7 | 39.38 | BAM-DETR
Moment Retrieval | QVHighlights | R@1, IoU=0.5 | 64.07 | BAM-DETR (w/ audio)
Moment Retrieval | QVHighlights | R@1, IoU=0.7 | 48.12 | BAM-DETR (w/ audio)
Moment Retrieval | QVHighlights | mAP | 46.91 | BAM-DETR (w/ audio)
Moment Retrieval | QVHighlights | mAP@0.5 | 65.61 | BAM-DETR (w/ audio)
Moment Retrieval | QVHighlights | mAP@0.75 | 47.51 | BAM-DETR (w/ audio)
Moment Retrieval | QVHighlights | R@1, IoU=0.5 | 63.88 | BAM-DETR (w/ PT ASR Captions)
Moment Retrieval | QVHighlights | R@1, IoU=0.7 | 47.92 | BAM-DETR (w/ PT ASR Captions)
Moment Retrieval | QVHighlights | mAP | 46.67 | BAM-DETR (w/ PT ASR Captions)
Moment Retrieval | QVHighlights | mAP@0.5 | 66.33 | BAM-DETR (w/ PT ASR Captions)
Moment Retrieval | QVHighlights | mAP@0.75 | 48.22 | BAM-DETR (w/ PT ASR Captions)
Moment Retrieval | QVHighlights | R@1, IoU=0.5 | 62.71 | BAM-DETR
Moment Retrieval | QVHighlights | R@1, IoU=0.7 | 48.64 | BAM-DETR
Moment Retrieval | QVHighlights | mAP | 45.36 | BAM-DETR
Moment Retrieval | QVHighlights | mAP@0.5 | 64.57 | BAM-DETR
Moment Retrieval | QVHighlights | mAP@0.75 | 46.33 | BAM-DETR
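The R@1 and mIoU metrics reported for TACoS, Charades-STA and QVHighlights are based on temporal IoU between a predicted and a ground-truth interval. The sketch below shows the usual definitions under a simplifying assumption of exactly one ground-truth interval per query; the official evaluation scripts for each benchmark may differ in details (for example, QVHighlights queries can have several relevant windows), and the mAP columns are not covered. Function names are illustrative, and the toy intervals are made up, not benchmark data.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (start, end), e.g. in seconds


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1: Sequence[Interval], gts: Sequence[Interval], iou_thr: float) -> float:
    """R@1, IoU=thr: percentage of queries whose top-ranked moment reaches the threshold."""
    hits = sum(temporal_iou(p, g) >= iou_thr for p, g in zip(top1, gts))
    return 100.0 * hits / len(gts)


def mean_iou(top1: Sequence[Interval], gts: Sequence[Interval]) -> float:
    """mIoU: average IoU of the top-ranked moment over all queries, in percent."""
    return 100.0 * sum(temporal_iou(p, g) for p, g in zip(top1, gts)) / len(gts)


if __name__ == "__main__":
    # Toy predictions: exact match, partial overlap, and a complete miss.
    preds = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]
    gts = [(0.0, 10.0), (0.0, 10.0), (0.0, 10.0)]
    for thr in (0.3, 0.5, 0.7):
        print(f"R@1, IoU={thr}: {recall_at_1(preds, gts, thr):.2f}")
    print(f"mIoU: {mean_iou(preds, gts):.2f}")
```

In this toy case the partially overlapping prediction has IoU 1/3, so it counts as a hit at IoU=0.3 but not at 0.5 or 0.7, which is why R@1 drops as the threshold tightens, mirroring the pattern across the TACoS rows.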

Related Papers

DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding (2025-06-16)
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos (2025-05-22)
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining (2025-05-10)
Retrieval Augmented Generation Evaluation for Health Documents (2025-05-07)
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection (2025-04-20)
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking (2025-04-11)
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos (2025-03-09)
Contrast-Unity for Partially-Supervised Temporal Sentence Grounding (2025-02-18)