MTFL: Multi-Timescale Feature Learning for Weakly-Supervised Anomaly Detection in Surveillance Videos

Yiling Zhang, Erkut Akdag, Egor Bondarev, Peter H. N. de With

2024-10-08

Tasks: Anomaly Detection In Surveillance Videos, Video Anomaly Detection, Anomaly Detection, Supervised Anomaly Detection, Weakly-supervised Anomaly Detection

Abstract

Detecting anomalous events is relevant for public safety and requires combining fine-grained motion information with contextual events at variable time scales. To this end, we propose a Multi-Timescale Feature Learning (MTFL) method that enhances the representation of anomaly features. Short, medium, and long temporal tubelets are used to extract spatio-temporal video features with a Video Swin Transformer. Experimental results show that MTFL outperforms state-of-the-art (SotA) methods on the UCF-Crime dataset, achieving an anomaly detection performance of 89.78% AUC. Moreover, it achieves results complementary to the SotA, with 95.32% AUC on ShanghaiTech and 84.57% AP on XD-Violence. Furthermore, we build an extended version of the UCF-Crime dataset, the Video Anomaly Detection Dataset (VADD), for developing and evaluating methods on a wider range of anomalies; it comprises 2,591 videos in 18 classes with extensive coverage of realistic anomalies.
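The abstract states that short, medium, and long temporal tubelets are fed to a Video Swin Transformer. The sketch below shows one way to gather frame indices for three such tubelets around a shared center frame. It is a minimal illustration under stated assumptions: the tubelet lengths (8, 16, and 32 frames), the centering, the edge clamping, and the name `multiscale_windows` are hypothetical choices, not values or code taken from the paper.

```python
from typing import Dict, List, Mapping

# Hypothetical tubelet lengths in frames. The source only says "short,
# medium, and long"; these numbers are illustrative assumptions.
TUBELET_LENGTHS: Mapping[str, int] = {"short": 8, "medium": 16, "long": 32}


def multiscale_windows(
    num_frames: int,
    center: int,
    lengths: Mapping[str, int] = TUBELET_LENGTHS,
) -> Dict[str, List[int]]:
    """Return frame indices for each timescale, centered on `center`.

    Indices that fall outside [0, num_frames) are clamped to the nearest
    valid frame (edge padding), so every window has exactly `length` frames.
    """
    windows: Dict[str, List[int]] = {}
    for name, length in lengths.items():
        start = center - length // 2
        windows[name] = [
            min(max(start + i, 0), num_frames - 1) for i in range(length)
        ]
    return windows


if __name__ == "__main__":
    for name, idx in multiscale_windows(100, 50).items():
        print(name, idx[0], idx[-1], len(idx))
```

For example, `multiscale_windows(100, 50)["short"]` covers frames 46 through 53. Each index window would then be passed to the video backbone to obtain one feature per timescale; how MTFL fuses those features is not described in this abstract.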

Results

Tasks: Video Understanding; Video; Anomaly Detection (identical results are listed under each task)
VST: Video Swin Transformer

Dataset	Metric	Value (%)	Model
VADD	AUC-ROC	88.42	MTFL (VST, finetuned on VADD)
ShanghaiTech Weakly Supervised	AUC-ROC	95.7	MTFL (VST, finetuned on VADD)
ShanghaiTech Weakly Supervised	AUC-ROC	95.32	MTFL (VST)
UCF-Crime	AUC-ROC	89.78	MTFL (VST, finetuned on VADD)
UCF-Crime	AUC-ROC	87.16	MTFL (VST)
XD-Violence	AP	84.57	MTFL (VST)
XD-Violence	AP	79.4	MTFL (VST, finetuned on VADD)
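The results above are reported as ROC AUC (UCF-Crime, ShanghaiTech, VADD) and average precision (XD-Violence). As a hedged reference for what these numbers measure, here is a minimal pure-Python sketch of both metrics over binary labels and anomaly scores, assuming scores are evaluated at frame level, as is common on these benchmarks. The function names are illustrative. Ties in AP are broken by sort order, whereas scikit-learn's `average_precision_score` groups tied scores, so the two can differ when scores tie.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney U statistic: the probability that a
    randomly chosen positive (anomalous) sample scores higher than a
    randomly chosen negative one, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated AP: mean of the precision measured at the rank of
    each positive sample, after sorting by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    num_pos = sum(1 for y in labels if y == 1)
    if num_pos == 0:
        raise ValueError("AP needs at least one positive")
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this cutoff
    return total / num_pos
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, `roc_auc` returns 0.75 and `average_precision` returns about 0.833. The pairwise form is O(P·N) in the number of positives and negatives; for full test sets, a sorted-rank implementation or a library routine is preferable.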