Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Boundary-Denoising for Video Activity Localization

Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Pérez-Rúa, Bernard Ghanem

Date: 2023-04-06
Tasks: Action Detection, Denoising, Video Grounding, Moment Retrieval

Abstract

Video activity localization aims to understand the semantic content of long, untrimmed videos and to retrieve actions of interest. The retrieved actions, together with their start and end locations, can be used for highlight generation, temporal action detection, and similar tasks. Unfortunately, learning the exact boundary locations of activities is highly challenging because temporal activities are continuous in time and there are often no clear-cut transitions between actions. Moreover, the definition of the start and end of an event is subjective, which may confuse the model. To alleviate this boundary ambiguity, we propose to study the video activity localization problem from a denoising perspective. Specifically, we propose an encoder-decoder model named DenoiseLoc. During training, a set of action spans is randomly generated from the ground truth with a controlled noise scale. The model then learns to reverse this process through boundary denoising, which allows the localizer to predict activities with precise boundaries and leads to faster convergence. Experiments show that DenoiseLoc improves results on several video activity understanding tasks. For example, we observe gains over the baseline of +12.36% average mAP on the QV-Highlights dataset and +1.64% mAP@0.5 on the THUMOS'14 dataset. Moreover, DenoiseLoc achieves state-of-the-art performance on the TACoS and MAD datasets while making far fewer predictions than other current methods.
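The abstract describes the training-time noise injection only at a high level. As an illustration, here is one way to generate noisy spans from ground-truth spans with a controlled noise scale. The (center, width) parameterization, the Gaussian noise, the clipping rule, and the names `make_noisy_spans`, `noise_scale`, and `EPS` are assumptions made for this sketch; it is not DenoiseLoc's released code.

```python
import random
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end), normalized to [0, 1]

EPS = 1e-4  # assumed minimum span length after clipping


def make_noisy_spans(
    gt_spans: List[Span],
    num_per_gt: int,
    noise_scale: float,
    rng: random.Random,
) -> List[Tuple[Span, Span]]:
    """Hypothetical helper: returns (noisy_span, target_span) pairs.

    Each ground-truth span is converted to (center, width); both are
    perturbed with Gaussian noise whose standard deviation is
    noise_scale times the span width (before clipping), and the result
    is clipped back into a valid span inside [0, 1].
    """
    pairs: List[Tuple[Span, Span]] = []
    for start, end in gt_spans:
        center = (start + end) / 2
        width = end - start
        for _ in range(num_per_gt):
            # Shift the center by noise proportional to the span width.
            c = center + rng.gauss(0.0, noise_scale) * width
            # Rescale the width multiplicatively, keeping it positive.
            w = width * max(EPS, 1.0 + rng.gauss(0.0, noise_scale))
            # Clip so that 0 <= s < e <= 1 always holds.
            s = min(max(c - w / 2, 0.0), 1.0 - EPS)
            e = min(max(c + w / 2, s + EPS), 1.0)
            pairs.append(((s, e), (start, end)))
    return pairs
```

Under this assumed setup, a denoising decoder would be trained to map each noisy span back to its paired target. Setting `noise_scale = 0.0` reproduces the ground-truth spans exactly, which makes a convenient sanity check.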

Results

Task | Dataset | Metric | Value | Model
Video | MAD | R@1, IoU=0.1 | 11.59 | DenoiseLoc
Video | MAD | R@5, IoU=0.1 | 30.35 | DenoiseLoc
Video | MAD | R@10, IoU=0.1 | 41.44 | DenoiseLoc
Video | MAD | R@50, IoU=0.1 | 66.07 | DenoiseLoc
Video | MAD | R@100, IoU=0.1 | 73.62 | DenoiseLoc
Video Retrieval | MAD | R@1, IoU=0.1 | 11.59 | DenoiseLoc
Video Retrieval | MAD | R@5, IoU=0.1 | 30.35 | DenoiseLoc
Video Retrieval | MAD | R@10, IoU=0.1 | 41.44 | DenoiseLoc
Video Retrieval | MAD | R@50, IoU=0.1 | 66.07 | DenoiseLoc
Video Retrieval | MAD | R@100, IoU=0.1 | 73.62 | DenoiseLoc
Moment Retrieval | QVHighlights | R@1, IoU=0.5 | 59.27 | DenoiseLoc
Moment Retrieval | QVHighlights | R@1, IoU=0.7 | 45.07 | DenoiseLoc
Video Grounding | MAD | R@1, IoU=0.1 | 11.59 | DenoiseLoc
Video Grounding | MAD | R@5, IoU=0.1 | 30.35 | DenoiseLoc
Video Grounding | MAD | R@10, IoU=0.1 | 41.44 | DenoiseLoc
Video Grounding | MAD | R@50, IoU=0.1 | 66.07 | DenoiseLoc
Video Grounding | MAD | R@100, IoU=0.1 | 73.62 | DenoiseLoc
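The MAD and QVHighlights rows report R@k, IoU=θ: the percentage of queries for which at least one of the top-k predicted moments overlaps the ground-truth moment with a temporal IoU of at least θ. The following minimal sketch of this metric assumes one ground-truth moment per query and predictions already ranked by confidence. The names `temporal_iou` and `recall_at_k` are hypothetical, and the official MAD and QVHighlights evaluation scripts may differ in details such as using ≥ versus > for the threshold.

```python
from typing import Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds or normalized time


def temporal_iou(a: Span, b: Span) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    ranked_preds: Sequence[Sequence[Span]],
    gts: Sequence[Span],
    k: int,
    iou_thr: float,
) -> float:
    """Percentage of queries whose top-k predictions hit the ground truth.

    Assumes one ground-truth moment per query and predictions already
    sorted by descending confidence.
    """
    hits = 0
    for preds, gt in zip(ranked_preds, gts):
        if any(temporal_iou(p, gt) >= iou_thr for p in preds[:k]):
            hits += 1
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    preds = [[(0.0, 1.0), (2.0, 3.0)], [(5.0, 6.0), (1.0, 2.0)]]
    gts = [(2.0, 3.0), (1.0, 2.0)]
    print(recall_at_k(preds, gts, k=1, iou_thr=0.5))  # 0.0
    print(recall_at_k(preds, gts, k=2, iou_thr=0.5))  # 100.0
```

The union is computed as the sum of the two lengths minus the intersection, which is the standard 1-D IoU. Results are scaled to percentages to match the values in the table.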

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)
AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air (2025-07-15)
A statistical physics framework for optimal learning (2025-07-10)
LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models (2025-07-08)