Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection

Han Yin, Yang Xiao, Jisheng Bai, Rohan Kumar Das

2024-11-02 · Audio Source Separation · Sound Event Detection · Event Detection

Abstract

Sound Event Detection (SED) is challenging in noisy environments where overlapping sounds obscure target events. Language-queried audio source separation (LASS) aims to isolate target sound events from a noisy clip. However, this approach can fail when the exact target sound is unknown, particularly in noisy test sets, leading to reduced performance. To address this issue, we leverage the capabilities of large language models (LLMs) to analyze and summarize acoustic data. Using LLMs to identify and select specific noise types, we implement a noise augmentation method for noise-robust fine-tuning. The fine-tuned model then produces clip-wise event predictions, which serve as text queries for the LASS model. Our studies demonstrate that the proposed method improves SED performance in noisy environments. This work represents an early application of LLMs in noise-robust SED and suggests a promising direction for handling overlapping events in SED. Code and pretrained models are available at https://github.com/apple-yinhan/Noise-robust-SED.
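The two mechanical steps mentioned in the abstract, mixing noise into a clip and turning clip-wise predictions into a text query, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names `mix_at_snr` and `build_text_query`, the 0.5 threshold, and the comma-joined query format are all assumptions made for illustration.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` so that the resulting signal-to-noise ratio is `snr_db`.

    Hypothetical helper: the paper's augmentation pipeline may differ.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so it covers the whole clean clip.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise clip is silent; cannot scale it to a target SNR")

    # SNR_dB = 10 * log10(P_clean / (gain^2 * P_noise)), solved for gain.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise


def build_text_query(clip_probs, class_names, threshold=0.5):
    """Join the names of classes whose clip-wise probability passes `threshold`.

    Assumed query format (comma-separated labels); the real LASS prompt may differ.
    """
    active = [name for name, p in zip(class_names, clip_probs) if p >= threshold]
    return ", ".join(active)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 16000, endpoint=False)
    clean = np.sin(2 * np.pi * 440.0 * t)      # stand-in for a target event
    noise = rng.standard_normal(8000)          # stand-in for a background noise clip
    noisy = mix_at_snr(clean, noise, snr_db=0.0)
    print(noisy.shape)
    print(build_text_query([0.9, 0.1, 0.7], ["Speech", "Dog", "Dishes"]))
```

The gain is derived from the SNR definition directly, so the same helper covers each noise level reported in the results (-5, 0, 5 and 10 dB) by changing `snr_db`.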

Results

Task	Dataset	Metric	Value	Model
Sound Event Detection	WildDESED	PSDS1 (-5dB)	0.134	CRNN (with BEATs + Separation)
Sound Event Detection	WildDESED	PSDS1 (0dB)	0.219	CRNN (with BEATs + Separation)
Sound Event Detection	WildDESED	PSDS1 (5dB)	0.291	CRNN (with BEATs + Separation)
Sound Event Detection	WildDESED	PSDS1 (10dB)	0.356	CRNN (with BEATs + Separation)
Sound Event Detection	WildDESED	PSDS1 (Clean)	0.44	CRNN (with BEATs + Separation)
Sound Event Detection	WildDESED	PSDS1 (-5dB)	0.065	CRNN (with BEATs)
Sound Event Detection	WildDESED	PSDS1 (0dB)	0.138	CRNN (with BEATs)
Sound Event Detection	WildDESED	PSDS1 (5dB)	0.236	CRNN (with BEATs)
Sound Event Detection	WildDESED	PSDS1 (10dB)	0.329	CRNN (with BEATs)
Sound Event Detection	WildDESED	PSDS1 (Clean)	0.5	CRNN (with BEATs)
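To make the comparison between the two models easier to read, the per-condition PSDS1 difference can be computed from the table. The snippet below is only arithmetic on the values listed above; the dictionary names are illustrative.

```python
# PSDS1 values copied from the results table (WildDESED).
with_separation = {"-5dB": 0.134, "0dB": 0.219, "5dB": 0.291, "10dB": 0.356, "Clean": 0.44}
without_separation = {"-5dB": 0.065, "0dB": 0.138, "5dB": 0.236, "10dB": 0.329, "Clean": 0.5}

# Positive delta means the separation-based model scores higher.
delta = {
    cond: round(with_separation[cond] - without_separation[cond], 3)
    for cond in with_separation
}

for cond, diff in delta.items():
    print(f"{cond:>5}: {diff:+.3f}")
```

On these numbers the separation-based model scores higher at every noisy condition, with the largest absolute gains at 0 dB and -5 dB, and lower on the clean set.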
