Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning

Aurian Quelennec, Pierre Chouteau, Geoffroy Peeters, Slim Essid

Date: 2025-02-17
Venue: ICASSP 2025
Tasks: Environmental Sound Classification, TAG, Representation Learning, Audio Classification, Self-Supervised Learning, Music Genre Classification, Self-Supervised Audio Classification, Audio Tagging, Music Auto-Tagging, Prediction, Classification, Music Tagging, Instrument Recognition

Abstract

Recently, self-supervised learning methods based on masked latent prediction have proven to encode input data into powerful representations. However, during training, the learned latent space can be further transformed to extract higher-level information that could be better suited to downstream classification tasks. Therefore, we propose a new method: MAsked latenT Prediction And Classification (MATPAC), which is trained with two pretext tasks solved jointly. As in previous work, the first pretext task is a masked latent prediction task, ensuring a robust input representation in the latent space. The second one is unsupervised classification, which utilises the latent representations of the first pretext task to match probability distributions between a teacher and a student. We validate the MATPAC method by comparing it to other state-of-the-art proposals and conducting ablation studies. MATPAC reaches state-of-the-art self-supervised learning results on reference audio classification datasets such as OpenMIC, GTZAN, ESC-50 and US8K, and outperforms comparable supervised methods for music auto-tagging on MagnaTagATune.
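The abstract describes two pretext tasks trained jointly. The following is a minimal sketch of how such a joint objective could be combined, not the authors' implementation. Several things here are assumptions that do not come from the abstract: the use of mean squared error for the prediction task, temperature-scaled softmax cross-entropy for the classification task, the temperature values, the weighting parameter `alpha`, and all function names. The encoders and the classification heads that would produce the latents and logits are omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_loss(student_latents, teacher_latents, masked):
    """Mean squared error between student and teacher latents,
    averaged over masked positions only (assumed loss form)."""
    total, count = 0.0, 0
    for s, t, is_masked in zip(student_latents, teacher_latents, masked):
        if not is_masked:
            continue
        total += sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
        count += 1
    return total / count if count else 0.0


def classification_loss(student_logits, teacher_logits,
                        student_temp=0.1, teacher_temp=0.05):
    """Cross-entropy between teacher and student distributions,
    averaged over positions. Temperatures are hypothetical values."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        p_teacher = softmax(t, teacher_temp)
        p_student = softmax(s, student_temp)
        total -= sum(pt * math.log(ps + 1e-12)
                     for pt, ps in zip(p_teacher, p_student))
    return total / len(student_logits)


def joint_loss(student_latents, teacher_latents, masked,
               student_logits, teacher_logits, alpha=0.5):
    """Weighted sum of both pretext losses; alpha is a hypothetical weight."""
    pred = prediction_loss(student_latents, teacher_latents, masked)
    cls = classification_loss(student_logits, teacher_logits)
    return alpha * cls + (1.0 - alpha) * pred


# Toy usage: two positions, the first one masked.
student = [[0.0, 1.0], [1.0, 1.0]]
teacher = [[0.0, 0.0], [1.0, 1.0]]
mask = [True, False]
s_logits = [[2.0, 0.0], [0.0, 2.0]]
t_logits = [[2.0, 0.0], [0.0, 2.0]]
print(joint_loss(student, teacher, mask, s_logits, t_logits))
```

In this sketch the teacher outputs are treated as fixed targets. In teacher-student self-supervised setups the teacher is commonly an exponential moving average of the student, but the abstract does not state how the teacher is obtained here.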

Results

Task                                Dataset        Metric                  Value  Model
Music Auto-Tagging                  MagnaTagATune  PR-AUC                  41.1   MATPAC (SSL, linear eval)
Music Auto-Tagging                  MagnaTagATune  ROC AUC                 91.6   MATPAC (SSL, linear eval)
Audio Classification                ESC-50         Accuracy (5-fold)       93.5   MATPAC (SSL model, linear eval)
Audio Classification                ESC-50         Top-1 Accuracy          93.5   MATPAC (SSL model, linear eval)
Audio Classification                FSD50K         mAP                     55.2   MATPAC (SSL Model)
Audio Classification                UrbanSound8K   Accuracy                89.4   MATPAC (SSL, linear eval)
Environmental Sound Classification  UrbanSound8K   Accuracy                89.4   MATPAC (SSL, linear eval)
Classification                      ESC-50         Accuracy (5-fold)       93.5   MATPAC (SSL model, linear eval)
Classification                      ESC-50         Top-1 Accuracy          93.5   MATPAC (SSL model, linear eval)
Classification                      FSD50K         mAP                     55.2   MATPAC (SSL Model)
Classification                      UrbanSound8K   Accuracy                89.4   MATPAC (SSL, linear eval)
Instrument Recognition              OpenMIC-2018   mean average precision  0.854  MATPAC (SSL Model, linear eval)
Instrument Recognition              NSynth         Accuracy                74.6   MATPAC (SSL, linear eval)
