BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification

June-Woo Kim, Miika Toikkanen, Yera Choi, Seoung-Eun Moon, Ho-Young Jung

2024-06-10

Sound Classification, Audio Classification

Abstract

Respiratory sound classification (RSC) is challenging because acoustic signatures vary widely, primarily with patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes the metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata, which includes the gender and age of patients, the type of recording device, and the recording location on the patient's body. Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%. This result validates the effectiveness of combining metadata with respiratory sound samples to enhance RSC performance. Additionally, we investigate model performance when metadata is partially unavailable, which may occur in real-world clinical settings.
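As a rough illustration of how metadata could be turned into a free-text description, the sketch below builds one sentence from the four fields named in the abstract and skips any field that is missing. The function name `describe_recording`, the sentence template, and the age threshold `ADULT_AGE` are all hypothetical choices made for this example; the abstract does not specify the exact wording or grouping used by the authors.

```python
from typing import Optional

# Assumption for illustration only: patients aged 18 or over are called "adult".
ADULT_AGE = 18


def describe_recording(
    age: Optional[int] = None,
    sex: Optional[str] = None,
    location: Optional[str] = None,
    device: Optional[str] = None,
) -> str:
    """Build a one-sentence description from whichever metadata fields are present."""
    who = []
    if age is not None:
        who.append("adult" if age >= ADULT_AGE else "pediatric")
    if sex:
        who.append(sex)
    subject = " ".join(who + ["patient"])
    article = "an" if subject[0] in "aeiou" else "a"

    sentence = f"This sound was recorded from {article} {subject}"
    if location:
        sentence += f" at the {location}"
    if device:
        sentence += f" with the {device} recording device"
    return sentence + "."


# Full metadata.
full = describe_recording(age=65, sex="male",
                          location="left anterior chest", device="Meditron")
# Partially unavailable metadata: only the recording location is known.
partial = describe_recording(location="trachea")
print(full)
print(partial)
```

Dropping absent fields from the sentence is one simple way to handle the partially unavailable metadata case mentioned in the abstract; other strategies, such as inserting a placeholder token, would be equally plausible.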

Results

Task	Dataset	Metric	Value	Model
Audio Classification	ICBHI Respiratory Sound Database	ICBHI Score	63.54	BTS
Audio Classification	ICBHI Respiratory Sound Database	Sensitivity	45.67	BTS
Audio Classification	ICBHI Respiratory Sound Database	Specificity	81.40	BTS
Audio Classification	ICBHI Respiratory Sound Database	ICBHI Score	62.56	Audio-CLAP
Audio Classification	ICBHI Respiratory Sound Database	Sensitivity	44.67	Audio-CLAP
Audio Classification	ICBHI Respiratory Sound Database	Specificity	80.85	Audio-CLAP
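
The ICBHI Score in the table is conventionally the arithmetic mean of sensitivity and specificity. The short sketch below checks the BTS row against that definition; the function name `icbhi_score` is a hypothetical helper, and the inputs are the percentages listed above.

```python
def icbhi_score(sensitivity: float, specificity: float) -> float:
    """Return the mean of sensitivity and specificity (both in percent)."""
    return (sensitivity + specificity) / 2.0


# BTS row from the table above: sensitivity 45.67, specificity 81.4.
bts = icbhi_score(45.67, 81.4)
print(f"BTS ICBHI Score: {bts:.3f}")
```

The unrounded value lies within rounding distance of the 63.54 reported for BTS.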
