Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification

June-Woo Kim, Miika Toikkanen, Yera Choi, Seoung-Eun Moon, Ho-Young Jung

2024-06-10 · Tasks: Sound Classification, Audio Classification
Paper · PDF · Code (official)

Abstract

Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, which are primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that leverages the metadata of respiratory sounds, which provides complementary information useful for RSC. Specifically, we fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata, which includes the patient's gender and age, the type of recording device, and the recording location on the patient's body. Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%. This result validates the effectiveness of combining metadata with respiratory sound samples to improve RSC performance. Additionally, we investigate model performance when metadata is partially unavailable, as may occur in real-world clinical settings.
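The abstract does not spell out the exact wording used to turn metadata into text. As a minimal sketch, assuming a hypothetical sentence template (the wording, the `metadata_to_text` name, and the example field values are illustrative, not the authors' prompts), each available field can be slotted into one description and any missing field simply skipped. This is one straightforward way to handle the partially-missing-metadata case the abstract mentions.

```python
from typing import Optional


def metadata_to_text(
    age: Optional[float] = None,
    sex: Optional[str] = None,
    device: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Build a free-text description from whichever metadata fields exist.

    Hypothetical template for illustration only: fields passed as None are
    left out, so partially available metadata still yields valid text.
    """
    # Patient details go in parentheses, e.g. "a patient (age 63, male)".
    details = []
    if age is not None:
        details.append(f"age {age:g}")  # :g prints 63 as "63", 0.5 as "0.5"
    if sex is not None:
        details.append(sex)
    patient = "a patient" + (f" ({', '.join(details)})" if details else "")

    text = f"Respiratory sound of {patient}"
    # Recording clause is added only if a device or location is known.
    if device is not None or location is not None:
        text += ", recorded"
        if device is not None:
            text += f" with {device}"
        if location is not None:
            text += f" at the {location}"
    return text + "."
```

For example, `metadata_to_text(63, "male", "an electronic stethoscope", "trachea")` yields "Respiratory sound of a patient (age 63, male), recorded with an electronic stethoscope at the trachea.", while calling it with no arguments still yields a well-formed "Respiratory sound of a patient."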

Results

| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Audio Classification | ICBHI Respiratory Sound Database | ICBHI Score | 63.54 | BTS |
| Audio Classification | ICBHI Respiratory Sound Database | Sensitivity | 45.67 | BTS |
| Audio Classification | ICBHI Respiratory Sound Database | Specificity | 81.4 | BTS |
| Audio Classification | ICBHI Respiratory Sound Database | ICBHI Score | 62.56 | Audio-CLAP |
| Audio Classification | ICBHI Respiratory Sound Database | Sensitivity | 44.67 | Audio-CLAP |
| Audio Classification | ICBHI Respiratory Sound Database | Specificity | 80.85 | Audio-CLAP |
| Classification | ICBHI Respiratory Sound Database | ICBHI Score | 63.54 | BTS |
| Classification | ICBHI Respiratory Sound Database | Sensitivity | 45.67 | BTS |
| Classification | ICBHI Respiratory Sound Database | Specificity | 81.4 | BTS |
| Classification | ICBHI Respiratory Sound Database | ICBHI Score | 62.56 | Audio-CLAP |
| Classification | ICBHI Respiratory Sound Database | Sensitivity | 44.67 | Audio-CLAP |
| Classification | ICBHI Respiratory Sound Database | Specificity | 80.85 | Audio-CLAP |
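Under the commonly used ICBHI 2017 challenge definition, the ICBHI Score is the mean of specificity (share of normal cycles predicted as normal) and sensitivity (share of abnormal cycles, i.e. crackle, wheeze, or both, assigned their exact class). The sketch below follows that definition and assumes integer labels 0 = normal, 1 = crackle, 2 = wheeze, 3 = both; the label encoding and the `icbhi_metrics` name are illustrative assumptions, not taken from the BTS code.

```python
from typing import Sequence, Tuple

NORMAL = 0  # assumed label encoding: 0 normal, 1 crackle, 2 wheeze, 3 both


def icbhi_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (specificity, sensitivity, ICBHI score), all in percent."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    n_normal = correct_normal = 0
    n_abnormal = correct_abnormal = 0
    for true, pred in zip(y_true, y_pred):
        if true == NORMAL:
            n_normal += 1
            correct_normal += int(pred == NORMAL)
        else:
            # An abnormal cycle counts as correct only on an exact class match.
            n_abnormal += 1
            correct_abnormal += int(pred == true)

    # Assumes both normal and abnormal cycles are present (no zero division).
    specificity = 100.0 * correct_normal / n_normal
    sensitivity = 100.0 * correct_abnormal / n_abnormal
    return specificity, sensitivity, (specificity + sensitivity) / 2
```

As a check against the table, averaging BTS's sensitivity (45.67) and specificity (81.4) gives 63.535, which rounds to the listed ICBHI Score of 63.54.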

Related Papers

Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons (2025-06-24)
USAD: Universal Speech and Audio Representation via Distillation (2025-06-23)
Fully Few-shot Class-incremental Audio Classification Using Multi-level Embedding Extractor and Ridge Regression Classifier (2025-06-23)
Acoustic scattering AI for non-invasive object classifications: A case study on hair assessment (2025-06-17)
Disentangling Dual-Encoder Masked Autoencoder for Respiratory Sound Classification (2025-06-12)
MUDAS: Mote-scale Unsupervised Domain Adaptation in Multi-label Sound Classification (2025-06-12)