Hierarchical Pre-training for Sequence Labelling in Spoken Dialog

Emile Chapuis, Pierre Colombo, Matteo Manica, Matthieu Labeau, Chloe Clavel

2020-09-23

Findings of the Association for Computational Linguistics 2020

Text Classification, Emotion Recognition in Conversation, Dialogue Act Classification

Abstract

Sequence labelling tasks like Dialog Act and Emotion/Sentiment identification are key components of spoken dialog systems. In this work, we propose a new approach to learn generic representations adapted to spoken dialog, which we evaluate on a new benchmark we call the Sequence labellIng evaLuatIon benChmark fOr spoken laNguagE (SILICONE). SILICONE is model-agnostic and contains 10 different datasets of various sizes. We obtain our representations with a hierarchical encoder based on transformer architectures, for which we extend two well-known pre-training objectives. Pre-training is performed on OpenSubtitles, a large corpus of spoken dialog containing over 2.3 billion tokens. We demonstrate that hierarchical encoders achieve competitive results with consistently fewer parameters than state-of-the-art models, and we show their importance for both pre-training and fine-tuning.

Results

Task	Dataset	Metric	Value	Model
Dialogue	Switchboard corpus	Accuracy	79.2	Pretrained Hierarchical Transformer
Dialogue	ICSI Meeting Recorder Dialog Act (MRDA) corpus	Accuracy	92.4	Pretrained Hierarchical Transformer
Emotion Recognition	SEMAINE	MAE (Arousal)	0.16	Pretrained Hierarchical Transformer
Emotion Recognition	SEMAINE	MAE (Expectancy)	0.16	Pretrained Hierarchical Transformer
Emotion Recognition	SEMAINE	MAE (Power)	7.7	Pretrained Hierarchical Transformer
Emotion Recognition	SEMAINE	MAE (Valence)	0.16	Pretrained Hierarchical Transformer
Emotion Recognition	MELD	Weighted-F1	61.9	Pretrained Hierarchical Transformer
Emotion Recognition	DailyDialog	Micro-F1	60.14	Pretrained Hierarchical Transformer
Emotion Recognition	IEMOCAP	Accuracy	66.05	Pretrained Hierarchical Transformer
Emotion Recognition	IEMOCAP	Weighted-F1	65.37	Pretrained Hierarchical Transformer
Text Classification	SILICONE Benchmark	1:1 Accuracy	71.25	Pretrained Hierarchical Transformer
Classification	SILICONE Benchmark	1:1 Accuracy	71.25	Pretrained Hierarchical Transformer