TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/DASS: Distilled Audio State Space Models Are Stronger and ...

DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners

Saurabhchand Bhati, Yuan Gong, Leonid Karlinsky, Hilde Kuehne, Rogerio Feris, James Glass

2024-07-04Audio ClassificationAudio TaggingKnowledge Distillation
PaperPDFCode(official)

Abstract

State-space models (SSMs) have emerged as an alternative to Transformers for audio modeling due to their high computational efficiency with long inputs. While recent efforts on Audio SSMs have reported encouraging results, two main limitations remain: First, in 10-second short audio tagging tasks, Audio SSMs still underperform compared to Transformer-based models such as Audio Spectrogram Transformer (AST). Second, although Audio SSMs theoretically support long audio inputs, their actual performance with long audio has not been thoroughly evaluated. To address these limitations, in this paper, 1) We applied knowledge distillation in audio space model training, resulting in a model called Knowledge Distilled Audio SSM (DASS). To the best of our knowledge, it is the first SSM that outperforms the Transformers on AudioSet and achieves an mAP of 48.9; and 2) We designed a new test called Audio Needle In A Haystack (Audio NIAH). We find that DASS, trained with only 10-second audio clips, can retrieve sound events in audio recordings up to 2.5 hours long, while the AST model fails when the input is just 50 seconds, demonstrating SSMs are indeed more duration scalable. Code: https://github.com/Saurabhbhati/DASS, https://huggingface.co/saurabhati/DASS_small_AudioSet_48.9

Results

TaskDatasetMetricValueModel
Audio ClassificationAudioSetTest mAP0.476DASS-Medium (Audio-only, single)
Audio ClassificationAudioSetTest mAP0.472DASS-Small (Audio-only, single)
ClassificationAudioSetTest mAP0.476DASS-Medium (Audio-only, single)
ClassificationAudioSetTest mAP0.472DASS-Small (Audio-only, single)

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine2025-07-17MUPAX: Multidimensional Problem Agnostic eXplainable AI2025-07-17Uncertainty-Aware Cross-Modal Knowledge Distillation with Prototype Learning for Multimodal Brain-Computer Interfaces2025-07-17DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition2025-07-16HanjaBridge: Resolving Semantic Ambiguity in Korean LLMs via Hanja-Augmented Pre-Training2025-07-15Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning2025-07-14KAT-V1: Kwai-AutoThink Technical Report2025-07-11