Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection

Pengfei Cai, Yan Song, Kang Li, Haoyu Song, Ian McLoughlin

2024-08-16 | Sound Event Detection | Event Detection

Abstract

Sound event detection (SED) methods that leverage a large pre-trained Transformer encoder network have shown promising performance in recent DCASE challenges. However, they still rely on an RNN-based context network to model temporal dependencies, largely due to the scarcity of labeled data. In this work, we propose a pure Transformer-based SED model with masked-reconstruction based pre-training, termed MAT-SED. Specifically, a Transformer with relative positional encoding is first designed as the context network and pre-trained with the masked-reconstruction task on all available target data in a self-supervised way. The encoder and the context network are then jointly fine-tuned in a semi-supervised manner. Furthermore, a global-local feature fusion strategy is proposed to enhance the localization capability. On DCASE 2023 Task 4, MAT-SED surpasses state-of-the-art performance, achieving a PSDS1 of 0.587 and a PSDS2 of 0.896.
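
The masked-reconstruction idea mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not state the mask ratio, the block length, the loss function, or which features are reconstructed, so `mask_ratio`, `block_len`, the zero `mask_token`, the mean-squared-error loss, and the helper names below are all assumptions chosen for illustration. The `reconstruct` callable is a hypothetical stand-in for the context network.

```python
import numpy as np


def make_block_mask(num_frames, mask_ratio, block_len, rng):
    """Boolean mask of shape (num_frames,); True marks a masked frame.

    Assumption: frames are masked in contiguous blocks. Blocks may overlap,
    so blocks are added until at least mask_ratio * num_frames are masked.
    """
    mask = np.zeros(num_frames, dtype=bool)
    target = int(round(mask_ratio * num_frames))
    while mask.sum() < target:
        start = rng.integers(0, num_frames - block_len + 1)
        mask[start:start + block_len] = True
    return mask


def masked_reconstruction_loss(features, reconstruct, mask_token, mask):
    """Mean squared error on masked frames only (assumed loss form)."""
    corrupted = features.copy()
    corrupted[mask] = mask_token          # hide the masked frames
    predicted = reconstruct(corrupted)    # stand-in for the context network
    diff = predicted[mask] - features[mask]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
features = rng.normal(size=(100, 16))     # (frames, feature_dim), synthetic
mask_token = np.zeros(16)                 # hypothetical mask embedding
mask = make_block_mask(100, mask_ratio=0.5, block_len=5, rng=rng)

# A network that copies its input cannot recover the hidden frames ...
identity_loss = masked_reconstruction_loss(
    features, lambda x: x, mask_token, mask)
# ... while a perfect reconstruction gives zero loss.
oracle_loss = masked_reconstruction_loss(
    features, lambda x: features, mask_token, mask)
```

Because the loss is computed only on masked frames, a model cannot lower it by copying its input. It has to infer the hidden frames from the surrounding context, which is the temporal modelling the context network is meant to learn without labels.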

Results

Task | Dataset | Metric | Value | Model
Sound Event Detection | DESED | PSDS1 | 0.587 | MAT-SED
Sound Event Detection | DESED | PSDS2 | 0.896 | MAT-SED

Related Papers

Frequency Dynamic Convolutions for Sound Event Detection (2025-06-15)
DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning (2025-06-05)
Towards real-time assessment of infrasound event detection capability using deep learning-based transmission loss estimation (2025-06-03)
DIAMOND: An LLM-Driven Agent for Context-Aware Baseball Highlight Summarization (2025-06-03)
Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection (2025-05-27)
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training (2025-05-23)
Exploring the Potential of SSL Models for Sound Event Detection (2025-05-17)
Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs (2025-05-16)