TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Mixer is more than just a model

Mixer is more than just a model

Qingfeng Ji, Yuxin Wang, Letong Sun

2024-02-28Environmental Sound ClassificationAudio ClassificationSpeech Emotion Recognition
PaperPDF

Abstract

Recently, MLP structures have regained popularity, with MLP-Mixer standing out as a prominent example. In the field of computer vision, MLP-Mixer is noted for its ability to extract data information from both channel and token perspectives, effectively acting as a fusion of channel and token information. Indeed, Mixer represents a paradigm for information extraction that amalgamates channel and token information. The essence of Mixer lies in its ability to blend information from diverse perspectives, epitomizing the true concept of "mixing" in the realm of neural network architectures. Beyond channel and token considerations, it is possible to create more tailored mixers from various perspectives to better suit specific task requirements. This study focuses on the domain of audio recognition, introducing a novel model named Audio Spectrogram Mixer with Roll-Time and Hermit FFT (ASM-RH) that incorporates insights from both time and frequency domains. Experimental results demonstrate that ASM-RH is particularly well-suited for audio data and yields promising outcomes across multiple classification tasks. The models and optimal weights files will be published.

Results

TaskDatasetMetricValueModel
Audio ClassificationSpeech CommandsAccuracy96.51ASM-RH
Audio ClassificationRAVDESSTop-1 Accuracy75.4ASM-RH-A
ClassificationSpeech CommandsAccuracy96.51ASM-RH
ClassificationRAVDESSTop-1 Accuracy75.4ASM-RH-A

Related Papers

Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine2025-07-17MUPAX: Multidimensional Problem Agnostic eXplainable AI2025-07-17Dynamic Parameter Memory: Temporary LoRA-Enhanced LLM for Long-Sequence Emotion Recognition in Conversation2025-07-11Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons2025-06-24MATER: Multi-level Acoustic and Textual Emotion Representation for Interpretable Speech Emotion Recognition2025-06-24Fully Few-shot Class-incremental Audio Classification Using Multi-level Embedding Extractor and Ridge Regression Classifier2025-06-23Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction2025-06-12MEDUSA: A Multimodal Deep Fusion Multi-Stage Training Framework for Speech Emotion Recognition in Naturalistic Conditions2025-06-11