Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection

Ayush Ghadiya, Purbayan Kar, Vishal Chudasama, Pankaj Wasnik

2024-12-29 | CVPR 2024 6 | Weakly-supervised Video Anomaly Detection | Video Anomaly Detection | Anomaly Detection | Graph Attention

Abstract

Recently, weakly supervised video anomaly detection (WS-VAD) has emerged as a contemporary research direction to identify anomaly events like violence and nudity in videos using only video-level labels. However, this task has substantial challenges, including addressing imbalanced modality information and consistently distinguishing between normal and abnormal features. In this paper, we address these challenges and propose a multi-modal WS-VAD framework to accurately detect anomalies such as violence and nudity. Within the proposed framework, we introduce a new fusion mechanism known as the Cross-modal Fusion Adapter (CFA), which dynamically selects and enhances highly relevant audio-visual features in relation to the visual modality. Additionally, we introduce a Hyperbolic Lorentzian Graph Attention (HLGAtt) to effectively capture the hierarchical relationships between normal and abnormal representations, thereby enhancing feature separation accuracy. Through extensive experiments, we demonstrate that the proposed model achieves state-of-the-art results on benchmark datasets of violence and nudity detection.
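
The abstract names a Hyperbolic Lorentzian Graph Attention module but does not give its formulation. As background only, the following is a minimal sketch of the Lorentz (hyperboloid) model of hyperbolic space that such a module would presumably build on, assuming curvature -1. The function names are hypothetical and this is not the authors' HLGAtt implementation.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def exp_map_origin(u):
    """Lift a Euclidean vector u onto the hyperboloid <x, x>_L = -1.

    This is the exponential map at the origin (1, 0, ..., 0),
    assuming curvature -1.
    """
    norm = math.sqrt(sum(a * a for a in u))
    if norm == 0.0:
        return [1.0] + [0.0] * len(u)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in u]


def lorentz_distance(x, y):
    """Geodesic distance between two points on the hyperboloid."""
    # Clamp to 1.0 so rounding error cannot push acosh out of its domain.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


origin = exp_map_origin([0.0, 0.0])
point = exp_map_origin([0.3, 0.4])
print(lorentz_inner(point, point))      # close to -1: the point lies on the hyperboloid
print(lorentz_distance(origin, point))  # close to 0.5, the Euclidean norm of [0.3, 0.4]
```

Distances in this model grow quickly away from the origin, which is the usual motivation for using hyperbolic space to embed hierarchical relationships.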

Results

Task | Dataset | Metric | Value | Model
Video Understanding | XD-Violence | AP | 86.34 | CFA-HLGAtt
Video | XD-Violence | AP | 86.34 | CFA-HLGAtt
Anomaly Detection | XD-Violence | AP | 86.34 | CFA-HLGAtt
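
The page reports AP (average precision) without describing how it is computed. The sketch below shows the standard ranking-based definition, assuming binary labels (1 = anomalous) and one anomaly score per item with no tied scores. The function name and the toy inputs are illustrative, and this is not the benchmark's official evaluation script.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of the positives.

    labels: list of 0/1 ground-truth values (1 = anomalous).
    scores: list of predicted anomaly scores, higher = more anomalous.
    Assumes scores are distinct; tied scores would need to be grouped.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positives")

    # Rank items from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_pos = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            precision_sum += true_pos / rank  # precision at this recall step
    return precision_sum / total_pos


# Toy example: positives land at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

A reported value such as 86.34 corresponds to this quantity expressed as a percentage.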

Related Papers

Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems (2025-07-21)
3DKeyAD: High-Resolution 3D Point Cloud Anomaly Detection via Keypoint-Guided Point Clustering (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
A Privacy-Preserving Framework for Advertising Personalization Incorporating Federated Learning and Differential Privacy (2025-07-16)
Catching Bid-rigging Cartels with Graph Attention Neural Networks (2025-07-16)
Bridge Feature Matching and Cross-Modal Alignment with Mutual-filtering for Zero-shot Anomaly Detection (2025-07-15)
Wavelet-Enhanced Neural ODE and Graph Attention for Interpretable Energy Forecasting (2025-07-14)
Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers (2025-07-12)