Lightweight Adapter Tuning for Multilingual Speech Translation

Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

Published: 2021-06-02 (ACL 2021)
Tasks: Speech Recognition, Machine Translation, Automatic Speech Recognition (ASR), Speech-to-Text Translation, Translation


Abstract

Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists of freezing the pre-trained parameters of a model and injecting lightweight modules between layers, which adds only a small number of task-specific trainable parameters. While adapter tuning has been investigated for multilingual neural machine translation, this paper presents a comprehensive analysis of adapters for multilingual speech translation (ST). Starting from different pre-trained models (a multilingual ST model trained on parallel data, or a multilingual BART (mBART) model trained on non-parallel multilingual data), we show that adapters can be used to: (a) efficiently specialize ST to specific language pairs at a low extra cost in terms of parameters, and (b) transfer from an automatic speech recognition (ASR) task and an mBART pre-trained model to a multilingual ST task. Experiments show that adapter tuning offers results competitive with full fine-tuning while being much more parameter-efficient.
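
To make the "freeze and inject" idea concrete, here is a minimal sketch of a single bottleneck adapter in plain Python. It is not the authors' implementation. It assumes the commonly used design (layer normalization, down-projection, ReLU, up-projection, residual connection). The class name `BottleneckAdapter`, the dimensions, and the zero initialization of the up-projection are illustrative assumptions, and the layer norm omits learnable gain and bias for brevity.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learnable gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class BottleneckAdapter:
    """Hypothetical adapter applied to the output h of a frozen layer."""

    def __init__(self, d_model, d_bottleneck, seed=0):
        rng = random.Random(seed)
        # Down-projection: d_model -> d_bottleneck (small random init).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(d_bottleneck)]
        self.b_down = [0.0] * d_bottleneck
        # Up-projection: d_bottleneck -> d_model. Zero init is an assumption
        # made here so that the adapter starts out as the identity function.
        self.w_up = [[0.0] * d_bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def num_params(self):
        """Count the adapter's weights and biases (the only trainable parameters)."""
        d_b, d_m = len(self.w_down), len(self.w_up)
        return 2 * d_m * d_b + d_b + d_m

    def __call__(self, h):
        z = matvec(self.w_down, layer_norm(h))
        z = [max(0.0, zi + bi) for zi, bi in zip(z, self.b_down)]  # ReLU
        up = matvec(self.w_up, z)
        # Residual connection: the frozen layer's output passes through unchanged,
        # plus a small learned correction.
        return [hi + ui + bi for hi, ui, bi in zip(h, up, self.b_up)]


# Usage: the hidden state h would come from a frozen pre-trained layer.
adapter = BottleneckAdapter(d_model=8, d_bottleneck=2)
h = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 1.0]
out = adapter(h)
```

In this sketch only the adapter's weights would be updated during training, while the surrounding pre-trained layers stay fixed. With the assumed sizes `d_model=512` and `d_bottleneck=64`, one adapter holds 2 × 512 × 64 + 64 + 512 = 66,112 parameters, which illustrates why the added cost per language pair can stay small.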

Results

Task	Dataset	Metric	Value	Model
Speech-to-Text Translation	MuST-C EN->DE	Case-sensitive sacreBLEU	24.63	Transformer with Adapters
Speech-to-Text Translation	MuST-C	SacreBLEU	26.61	Transformer with Adapters
Speech-to-Text Translation	MuST-C EN->ES	Case-sensitive sacreBLEU	28.73	Transformer with Adapters
