Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

VRAG: Region Attention Graphs for Content-Based Video Retrieval

Kennard Ng, Ser-Nam Lim, Gim Hee Lee

2022-05-18 · Tasks: Video Retrieval, Retrieval

Abstract

Content-based Video Retrieval (CBVR) is used on media-sharing platforms for applications such as video recommendation and filtering. To manage databases that scale to billions of videos, video-level approaches that use fixed-size embeddings are preferred for their efficiency. In this paper, we introduce Video Region Attention Graph Networks (VRAG), which improve on the state of the art among video-level methods. We represent videos at a finer granularity via region-level features and encode video spatio-temporal dynamics through region-level relations. VRAG captures the relationships between regions based on their semantic content via self-attention and the permutation-invariant aggregation of graph convolution. In addition, we show that the performance gap between video-level and frame-level methods can be reduced by segmenting videos into shots and using shot embeddings for video retrieval. We evaluate VRAG on several video retrieval tasks and achieve a new state of the art for video-level retrieval. Furthermore, our shot-level VRAG achieves higher retrieval precision than existing video-level methods and performs closer to frame-level methods while running at faster evaluation speeds. Finally, our code will be made publicly available.
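The abstract describes region features that are linked by self-attention and aggregated by graph convolution into a fixed-size video embedding. The sketch below illustrates that general idea in dependency-free Python under clearly stated assumptions: region features are given as plain float vectors (no detector or CNN backbone), the learned projection weights of a real graph convolution are replaced by the identity, every region from every frame is a node of one fully connected graph, and mean pooling is the chosen permutation-invariant readout. The names `attention_graph_layer`, `video_embedding` and `shot_level_similarity`, and the max-then-mean shot scoring, are hypothetical illustrations, not the authors' VRAG architecture.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_graph_layer(regions: List[Vector]) -> List[Vector]:
    """One message-passing step over a fully connected region graph.

    Edge weights are scaled dot-product self-attention scores between
    region features; each node becomes the attention-weighted sum of all
    nodes, followed by a ReLU. Assumption: the learned weight matrix of a
    real graph convolution is omitted (identity).
    """
    d = len(regions[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in regions:
        weights = softmax([dot(q, k) * scale for k in regions])
        agg = [sum(w * r[j] for w, r in zip(weights, regions)) for j in range(d)]
        out.append([max(0.0, v) for v in agg])
    return out


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def video_embedding(frames: List[List[Vector]], layers: int = 2) -> Vector:
    """Fixed-size, unit-length embedding from per-frame region features.

    Mean pooling over nodes is permutation invariant, so the node order
    (and, in this sketch, the frame order) does not affect the result.
    """
    nodes = [r for frame in frames for r in frame]
    for _ in range(layers):
        nodes = attention_graph_layer(nodes)
    d = len(nodes[0])
    pooled = [sum(n[j] for n in nodes) / len(nodes) for j in range(d)]
    return l2_normalize(pooled)


def cosine(a: Vector, b: Vector) -> float:
    # Embeddings are already unit length, so the dot product is the cosine.
    return dot(a, b)


def shot_level_similarity(query_shots: List[Vector], db_shots: List[Vector]) -> float:
    """Hypothetical shot-level score: best-matching database shot for each
    query shot, averaged over the query shots."""
    return sum(max(cosine(q, s) for s in db_shots) for q in query_shots) / len(query_shots)


if __name__ == "__main__":
    random.seed(0)
    # Toy input: 3 frames x 4 regions x 8-dim random features (not real data).
    frames = [[[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)] for _ in range(3)]
    emb = video_embedding(frames)
    shuffled = [list(reversed(f)) for f in reversed(frames)]
    print("embedding size:", len(emb))
    print("order-invariant:", abs(cosine(emb, video_embedding(shuffled)) - 1.0) < 1e-9)
    # Treat each frame as a one-frame "shot" purely for illustration.
    shots = [video_embedding([f]) for f in frames]
    print("self shot similarity:", shot_level_similarity(shots, shots))
```

Because each layer is permutation equivariant and the readout is a mean, shuffling regions or frames changes the embedding only by floating-point rounding. Note that this sketch carries no explicit temporal position information; how the actual model encodes temporal dynamics is described in the paper, not here.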

Results

Results on FIVR-200K (mAP), listed under both the Video and Video Retrieval tasks:

Model | mAP (CSVR) | mAP (DSVR) | mAP (ISVR)
--- | --- | --- | ---
VRAG (CS) | 0.678 | 0.723 | 0.554
VRAG (video) | 0.47 | 0.484 | 0.399

CSVR, DSVR and ISVR are the Complementary Scene, Duplicate Scene and Incident Scene Video Retrieval tasks of FIVR-200K.
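The values above are mean average precision (mAP) scores. As a hedged illustration of how such a number is usually computed, the sketch below implements standard non-interpolated average precision per query and averages it over queries. The function names and toy data are hypothetical; which annotated videos count as relevant for each FIVR-200K task is fixed by the benchmark's own definitions, and the official evaluation script may differ in details, so the sketch simply takes the relevant set as input.

```python
from typing import Dict, Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """AP for one query: sum of precision@k at each rank k holding a
    relevant item, divided by the total number of relevant items
    (relevant items that are never retrieved contribute zero)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(rankings: Dict[str, Sequence[str]],
                           ground_truth: Dict[str, Set[str]]) -> float:
    """mAP: unweighted mean of per-query AP."""
    aps = [average_precision(rankings[q], ground_truth.get(q, set())) for q in rankings]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy rankings and labels, not taken from FIVR-200K.
    rankings = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y", "z"]}
    ground_truth = {"q1": {"a", "c"}, "q2": {"z"}}
    print(mean_average_precision(rankings, ground_truth))
```

Here q1 scores (1 + 2/3) / 2 and q2 scores 1/3, so the toy mAP is 7/12. Dividing by the full number of relevant items, rather than by the number retrieved, is what penalizes a system for missing relevant videos entirely.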

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
Context-Aware Search and Retrieval Over Erasure Channels (2025-07-16)
Seq vs Seq: An Open Suite of Paired Encoders and Decoders (2025-07-15)