LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging

Shubhr Singh, Emmanouil Benetos, Huy Phan, Dan Stowell

2025-01-07 · Audio Classification

Abstract

Transformers have set new benchmarks in audio processing tasks, leveraging self-attention mechanisms to capture complex patterns and dependencies within audio data. However, their focus on pairwise interactions limits their ability to process the higher-order relations essential for identifying distinct audio objects. To address this limitation, this work introduces the Local-Higher Order Graph Neural Network (LHGNN), a graph-based model that enhances feature understanding by integrating local neighbourhood information with higher-order information from Fuzzy C-Means clusters, thereby capturing a broader spectrum of audio relationships. Evaluation of the model on three publicly available audio datasets shows that it outperforms Transformer-based models across all benchmarks while operating with substantially fewer parameters. Moreover, LHGNN demonstrates a distinct advantage in scenarios lacking ImageNet pretraining, establishing its effectiveness and efficiency in environments where extensive pretraining data is unavailable.
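
The two ingredients named in the abstract, local neighbourhoods and Fuzzy C-Means clusters, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `fuzzy_c_means` and `knn_indices`, the use of k-nearest neighbours for the "local" part, the fuzzifier `m = 2`, and the random toy features are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(x, n_clusters, m=2.0, n_iter=50, seed=0):
    """Standard Fuzzy C-Means on points x of shape (n, d).

    Returns (centers, u), where u[i, j] is the soft membership of
    point i in cluster j and every row of u sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised per point.
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers: membership-weighted means of the points.
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        # Point-to-center distances, shape (n, n_clusters).
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ij proportional to d_ij^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u

def knn_indices(x, k):
    """Indices of the k nearest neighbours of each point (self excluded)."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]

# Toy "patch features": two well-separated groups of 20 points in 8-D.
rng = np.random.default_rng(1)
feats = np.concatenate([
    rng.normal(0.0, 0.1, size=(20, 8)),
    rng.normal(3.0, 0.1, size=(20, 8)),
])

centers, memberships = fuzzy_c_means(feats, n_clusters=2)
neighbours = knn_indices(feats, k=3)
```

Here `neighbours` gives each feature a small set of pairwise local links, while `memberships` ties each feature softly to every cluster, so a single cluster can relate many features at once. How LHGNN actually combines the two is described in the paper itself, not in this sketch.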

Results

Task	Dataset	Metric	Value	Model
Audio Classification	ESC-50	Top-1 Accuracy	96.2	LHGNN
Audio Classification	AudioSet	Mean AP	46.6	LHGNN
Audio Classification	FSD50K	Mean AP	59	LHGNN

Related Papers

Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons (2025-06-24)
Fully Few-shot Class-incremental Audio Classification Using Multi-level Embedding Extractor and Ridge Regression Classifier (2025-06-23)
Adaptive Differential Denoising for Respiratory Sounds Classification (2025-06-03)
Spectrotemporal Modulation: Efficient and Interpretable Feature Representation for Classifying Speech, Music, and Environmental Sounds (2025-05-29)
Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses (2025-05-28)
4,500 Seconds: Small Data Training Approaches for Deep UAV Audio Classification (2025-05-21)