Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

2019-04-06 · Distant Speech Recognition

Abstract

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some recent works, however, have shown that it is possible to derive useful speech representations by employing a self-supervised encoder-discriminator approach. This paper proposes an improved self-supervised method, where a single neural encoder is followed by multiple workers that jointly solve different self-supervised tasks. The consensus needed across the different tasks naturally imposes meaningful constraints on the encoder, helping it discover general representations and minimizing the risk of learning superficial ones. Experiments show that the proposed approach can learn transferable, robust, and problem-agnostic features that carry relevant information from the speech signal, such as speaker identity, phonemes, and even higher-level features such as emotional cues. In addition, a number of design choices make the encoder easily exportable, facilitating its direct use or adaptation to different problems.
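The encoder-plus-workers idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear encoder, the two worker names (`energy`, `pitch`), the squared-error loss, and the choice to average worker losses are all assumptions made only to show how several tasks can share one set of features.

```python
def encode(frame, enc_weights):
    """Shared encoder (assumed here to be one linear layer): frame -> features."""
    return [sum(w * x for w, x in zip(row, frame)) for row in enc_weights]


def regression_worker(features, target, head):
    """Hypothetical worker: predict a scalar target from the shared features
    and return the squared error."""
    pred = sum(h * f for h, f in zip(head, features))
    return (pred - target) ** 2


def total_loss(frame, targets, enc_weights, heads):
    """Run every worker on the SAME encoder output and average their losses
    (averaging is an assumption of this sketch)."""
    features = encode(frame, enc_weights)
    losses = {
        name: regression_worker(features, targets[name], head)
        for name, head in heads.items()
    }
    return sum(losses.values()) / len(losses), losses


# Toy inputs: a 2-sample "frame", an identity encoder, two hypothetical workers.
frame = [1.0, 2.0]
enc_weights = [[1.0, 0.0], [0.0, 1.0]]
heads = {"energy": [1.0, 1.0], "pitch": [0.5, 0.0]}
targets = {"energy": 3.0, "pitch": 1.5}

loss, per_worker = total_loss(frame, targets, enc_weights, heads)
```

Because every worker reads the same `features`, a gradient step on the combined loss would push the encoder toward features that serve all tasks at once, which is the "consensus" constraint the abstract describes.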

Results

Task                Dataset            Metric                 Value  Model
Speech Recognition  DIRHA English WSJ  Word Error Rate (WER)  29.8   PASE-FineTuned
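
For readers unfamiliar with the metric, word error rate is commonly computed as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a generic illustration under that definition; the function name and example sentences are made up, and it is not the scoring tool used to produce the result above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

WER is conventionally reported as a percentage, so if the 29.8 above follows that convention it would correspond to a return value of about 0.298 from a function like this one.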

Related Papers

The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization (2024-07-23)
Neural Blind Source Separation and Diarization for Distant Speech Recognition (2024-06-12)
Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments (2022-07-15)
Impact of Microphone position Measurement Error on Multi Channel Distant Speech Recognition & Intelligibility (2021-12-01)
MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points For Evaluating Sound Field Analysis and Synthesis Methods (2021-06-21)
Learning to Rank Microphones for Distant Speech Recognition (2021-04-06)
Quaternion Neural Networks for Multi-channel Distant Speech Recognition (2020-05-18)
Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition (2019-10-15)