Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection

Han Yin, Yang Xiao, Jisheng Bai, Rohan Kumar Das

Published: 2024-11-02
Tasks: Audio Source Separation, Sound Event Detection, Event Detection

Abstract

Sound Event Detection (SED) is challenging in noisy environments where overlapping sounds obscure target events. Language-queried audio source separation (LASS) aims to isolate the target sound events from a noisy clip. However, this approach can fail when the exact target sound is unknown, particularly in noisy test sets, leading to reduced performance. To address this issue, we leverage the capabilities of large language models (LLMs) to analyze and summarize acoustic data. By using LLMs to identify and select specific noise types, we implement a noise augmentation method for noise-robust fine-tuning. The fine-tuned model then produces clip-wise event predictions, which serve as text queries for the LASS model. Our studies demonstrate that the proposed method improves SED performance in noisy environments. This work represents an early application of LLMs in noise-robust SED and suggests a promising direction for handling overlapping events in SED. Code and pretrained models are available at https://github.com/apple-yinhan/Noise-robust-SED.
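
The abstract mentions noise augmentation for fine-tuning but does not say how noise is mixed into the training clips. A common way to do this is to scale a noise clip so that the mixture reaches a chosen signal-to-noise ratio (SNR). The sketch below illustrates that generic technique only; it is not taken from the authors' repository, and the function names (`power`, `mix_at_snr`, `measured_snr_db`) are hypothetical. It uses plain Python lists to stay dependency-free.

```python
import math
from typing import List, Sequence


def power(signal: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add `noise` to `clean` after scaling it to the requested SNR in dB.

    Assumption: both signals have the same length and sample rate.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise clip is silent")
    # SNR_dB = 10 * log10(P_clean / P_noise)  =>  P_noise = P_clean / 10^(SNR_dB / 10)
    target_noise_power = power(clean) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean: Sequence[float], mixture: Sequence[float]) -> float:
    """Recover the SNR of a mixture given the clean reference."""
    residual = [m - c for m, c in zip(mixture, clean)]
    return 10.0 * math.log10(power(clean) / power(residual))


# Toy signals standing in for real audio: a sine wave and an alternating noise pattern.
clean = [math.sin(2.0 * math.pi * 5.0 * t / 200.0) for t in range(200)]
noise = [0.3 if t % 2 == 0 else -0.3 for t in range(200)]
mixture = mix_at_snr(clean, noise, snr_db=-5.0)
```

In a real pipeline the noise clips would be drawn from whichever noise categories were selected for augmentation, and the SNR would typically be sampled per clip rather than fixed.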

Results

| Task | Dataset | Metric | Value | Model |
| Sound Event Detection | WildDESED | PSDS1 (-5dB) | 0.134 | CRNN (with BEATs + Separation) |
| Sound Event Detection | WildDESED | PSDS1 (0dB) | 0.219 | CRNN (with BEATs + Separation) |
| Sound Event Detection | WildDESED | PSDS1 (10dB) | 0.356 | CRNN (with BEATs + Separation) |
| Sound Event Detection | WildDESED | PSDS1 (5dB) | 0.291 | CRNN (with BEATs + Separation) |
| Sound Event Detection | WildDESED | PSDS1 (Clean) | 0.44 | CRNN (with BEATs + Separation) |
| Sound Event Detection | WildDESED | PSDS1 (-5dB) | 0.065 | CRNN (with BEATs) |
| Sound Event Detection | WildDESED | PSDS1 (0dB) | 0.138 | CRNN (with BEATs) |
| Sound Event Detection | WildDESED | PSDS1 (10dB) | 0.329 | CRNN (with BEATs) |
| Sound Event Detection | WildDESED | PSDS1 (5dB) | 0.236 | CRNN (with BEATs) |
| Sound Event Detection | WildDESED | PSDS1 (Clean) | 0.5 | CRNN (with BEATs) |
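
To make the comparison between the two models easier to read, the short sketch below computes the absolute PSDS1 difference per condition from the values listed above. The dictionary layout and the name `psds1_gain` are illustrative choices, not part of the paper or this page.

```python
# PSDS1 on WildDESED per condition: (CRNN with BEATs, CRNN with BEATs + Separation).
# Values are copied from the results listed above.
PSDS1 = {
    "-5dB": (0.065, 0.134),
    "0dB": (0.138, 0.219),
    "5dB": (0.236, 0.291),
    "10dB": (0.329, 0.356),
    "Clean": (0.5, 0.44),
}


def psds1_gain(results):
    """Return the absolute PSDS1 change from adding separation, per condition."""
    return {cond: round(sep - base, 3) for cond, (base, sep) in results.items()}


gains = psds1_gain(PSDS1)
for condition, gain in gains.items():
    print(f"{condition:>5}: {gain:+.3f}")
```

With these numbers, the separation variant scores higher in every noisy condition, with a shrinking margin as the noise level drops (+0.069, +0.081, +0.055, +0.027), and scores 0.06 lower on the clean set.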

Related Papers

Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models (2025-07-15)
Frequency Dynamic Convolutions for Sound Event Detection (2025-06-15)
DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning (2025-06-05)
DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization (2025-06-03)
Towards real-time assessment of infrasound event detection capability using deep learning-based transmission loss estimation (2025-06-03)
DIAMOND: An LLM-Driven Agent for Context-Aware Baseball Highlight Summarization (2025-06-03)
ZeroSep: Separate Anything in Audio with Zero Training (2025-05-29)
Text-Queried Audio Source Separation via Hierarchical Modeling (2025-05-27)