Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding

WonJun Moon, Sangeek Hyun, SuBeen Lee, Jae-Pil Heo

2023-11-15
Representation Learning, Highlight Detection, Moment Retrieval, Natural Language Moment Retrieval

Abstract

Temporal grounding aims to identify specific moments or highlights in a video that correspond to textual descriptions. Typical approaches treat all video clips equally during encoding, regardless of their semantic relevance to the text query. We therefore propose the Correlation-Guided DEtection TRansformer (CG-DETR), which provides clues about query-associated video clips within the cross-modal attention. First, we design an adaptive cross-attention with dummy tokens. Dummy tokens conditioned on the text query take a portion of the attention weights, preventing irrelevant video clips from being represented by the text query. Yet not all words equally inherit the text query's correlation to video clips. Thus, we further guide the cross-attention map by inferring the fine-grained correlation between video clips and words. We enable this by learning a joint embedding space for high-level concepts, i.e., at the moment and sentence level, and inferring the clip-word correlation. Lastly, we exploit moment-specific characteristics and combine them with the context of each video to form a moment-adaptive saliency detector. By exploiting the degree of text engagement in each video clip, it precisely measures the highlightness of each clip. CG-DETR achieves state-of-the-art results on various benchmarks for temporal grounding. Code is available at https://github.com/wjun0830/CGDETR.
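
The dummy-token idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention in which video clips act as queries, the softmax runs over word keys plus dummy keys, and only the word values are aggregated. The function name `cross_attention_with_dummies` and all inputs are hypothetical, and the conditioning of dummy tokens on the text query is omitted.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention_with_dummies(clip_queries, word_keys, word_values, dummy_keys):
    """Hypothetical sketch: dummy keys compete for attention but contribute no values.

    Returns (outputs, word_mass), where word_mass[i] is the share of clip i's
    attention that landed on real words rather than on dummy tokens.
    """
    scale = 1.0 / math.sqrt(len(word_keys[0]))
    n_words = len(word_keys)
    dim_out = len(word_values[0])
    outputs, word_mass = [], []
    for q in clip_queries:
        all_keys = list(word_keys) + list(dummy_keys)
        weights = softmax([dot(q, k) * scale for k in all_keys])
        w_words = weights[:n_words]  # attention on dummy tokens is discarded
        out = [sum(w * v[j] for w, v in zip(w_words, word_values))
               for j in range(dim_out)]
        outputs.append(out)
        word_mass.append(sum(w_words))
    return outputs, word_mass

# Toy example (assumed data): clip 0 aligns with the word key, clip 1 with the dummy key.
word_keys = [[4.0, 0.0]]
word_values = [[1.0, 2.0]]
dummy_keys = [[0.0, 4.0]]
clip_queries = [[4.0, 0.0], [0.0, 4.0]]

outputs, word_mass = cross_attention_with_dummies(
    clip_queries, word_keys, word_values, dummy_keys)
print(word_mass)  # first entry close to 1, second close to 0
```

In this toy setup the clip that matches the dummy key receives an output vector near zero, which is one way to read "preventing irrelevant video clips from being represented by the text query".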

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | TACoS | R@1,IoU=0.3 | 52.23 | CG-DETR |
| Video | TACoS | R@1,IoU=0.5 | 39.61 | CG-DETR |
| Video | TACoS | R@1,IoU=0.7 | 22.23 | CG-DETR |
| Video | TACoS | mIoU | 36.48 | CG-DETR |
| Moment Retrieval | Charades-STA | R@1 IoU=0.5 | 58.44 | CG-DETR |
| Moment Retrieval | Charades-STA | R@1 IoU=0.7 | 36.34 | CG-DETR |
| Moment Retrieval | QVHighlights | R@1 IoU=0.5 | 68.48 | CG-DETR (w/ PT) |
| Moment Retrieval | QVHighlights | R@1 IoU=0.7 | 53.11 | CG-DETR (w/ PT) |
| Moment Retrieval | QVHighlights | mAP | 47.97 | CG-DETR (w/ PT) |
| Moment Retrieval | QVHighlights | mAP@0.5 | 69.4 | CG-DETR (w/ PT) |
| Moment Retrieval | QVHighlights | mAP@0.75 | 49.12 | CG-DETR (w/ PT) |
| Moment Retrieval | QVHighlights | R@1 IoU=0.5 | 65.43 | CG-DETR |
| Moment Retrieval | QVHighlights | R@1 IoU=0.7 | 48.38 | CG-DETR |
| Moment Retrieval | QVHighlights | mAP | 42.86 | CG-DETR |
| Moment Retrieval | QVHighlights | mAP@0.5 | 64.51 | CG-DETR |
| Moment Retrieval | QVHighlights | mAP@0.75 | 42.77 | CG-DETR |
| Highlight Detection | TvSum | mAP | 86.8 | CG-DETR |
| Highlight Detection | YouTube Highlights | mAP | 75.9 | CG-DETR |
| Highlight Detection | QVHighlights | Hit@1 | 66.6 | CG-DETR (w/ PT) |
| Highlight Detection | QVHighlights | mAP | 40.71 | CG-DETR (w/ PT) |
| Highlight Detection | QVHighlights | Hit@1 | 66.21 | CG-DETR |
| Highlight Detection | QVHighlights | mAP | 40.33 | CG-DETR |
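
For readers unfamiliar with the moment retrieval metrics reported above, the following is a minimal sketch of how R@1 at an IoU threshold and mIoU are commonly computed. It is not taken from the CG-DETR evaluation code. It assumes one ground-truth span per query, spans given as `(start, end)` in seconds, and a hit defined as IoU greater than or equal to the threshold; the function names and the toy data are illustrative.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) time spans."""
    (ps, pe), (gs, ge) = pred, gt
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0

def recall_at_1(top1_preds, gts, iou_threshold):
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= iou_threshold
               for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)

def mean_iou(top1_preds, gts):
    """Average IoU of the top-1 predictions, as a percentage."""
    total = sum(temporal_iou(p, g) for p, g in zip(top1_preds, gts))
    return 100.0 * total / len(gts)

# Toy data (assumed): one exact match, one partial overlap, one miss.
preds = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]
gts = [(0.0, 10.0), (10.0, 20.0), (0.0, 5.0)]

for thr in (0.3, 0.5, 0.7):
    print(f"R@1 IoU={thr}: {recall_at_1(preds, gts, thr):.2f}")
print(f"mIoU: {mean_iou(preds, gts):.2f}")
```

Datasets with several ground-truth moments per query, and the mAP variants in the table, need a different procedure that this sketch does not cover.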
