Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ATST: Audio Representation Learning with Teacher-Student Transformer

Xian Li, Xiaofei Li

2022-04-26
Speaker Identification, Representation Learning, Audio Classification, Self-Supervised Learning, Self-Supervised Audio Classification, Instrument Recognition

Abstract

Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data and then transfers that knowledge to a specific problem with a limited amount of labeled data. SSL has achieved promising results in various domains. This work addresses the problem of segment-level general audio SSL and proposes a new transformer-based teacher-student SSL model, named ATST. A transformer encoder is developed on a recently emerged teacher-student baseline scheme, which largely improves the modeling capability of pre-training. In addition, a new strategy for positive pair creation is designed to fully leverage the capability of the transformer. Extensive experiments have been conducted, and the proposed model achieves new state-of-the-art results on almost all of the downstream tasks.

Results

Task                        Dataset             Metric     Value  Model
Speaker Identification      VoxCeleb1           Accuracy   94.3   ATST Base (ours)
Speaker Identification      VoxCeleb1           Top-1 (%)  94.3   ATST Base (ours)
Audio Classification        Balanced Audio Set  Mean AP    37.4   Base (ours)
Classification              Balanced Audio Set  Mean AP    37.4   Base (ours)
Spoken Command Recognition  Speech Command v2   Accuracy   98     Base (ours)

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization? (2025-07-16)