
Multilingual Constituency Parsing with Self-Attention and Pre-Training

Nikita Kitaev, Steven Cao, Dan Klein

2018-12-31 · ACL 2019
Tasks: Unsupervised Pre-training · Constituency Parsing
Paper · PDF · Code (official)

Abstract

We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions. We first compare the benefits of no pre-training, fastText, ELMo, and BERT for English and find that BERT outperforms ELMo, in large part due to increased model capacity, whereas ELMo in turn outperforms the non-contextual fastText embeddings. We also find that pre-training is beneficial across all 11 languages tested; however, large model sizes (more than 100 million parameters) make it computationally expensive to train separate models for each language. To address this shortcoming, we show that joint multilingual pre-training and fine-tuning allows sharing all but a small number of parameters between ten languages in the final model. The 10x reduction in model size compared to fine-tuning one model per language causes only a 3.2% relative error increase in aggregate. We further explore the idea of joint fine-tuning and show that it gives low-resource languages a way to benefit from the larger datasets of other languages. Finally, we demonstrate new state-of-the-art results for 11 languages, including English (95.8 F1) and Chinese (91.8 F1).
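The "3.2% relative error increase" reads most naturally against parsing error, i.e. 100 − F1. The snippet below works through that arithmetic as a quick sanity check; the 95.0 per-language baseline F1 is an assumed placeholder (this page reports no per-language baseline), and only the 3.2% figure comes from the abstract.

```python
# Worked arithmetic for the "3.2% relative error increase" claim.
# ASSUMPTION: the 95.0 baseline F1 is an illustrative placeholder;
# only the 3.2% relative increase is taken from the abstract.
per_language_f1 = 95.0                          # one fine-tuned model per language
per_language_error = 100.0 - per_language_f1    # parsing error = 100 - F1 = 5.0
shared_error = per_language_error * 1.032       # 3.2% relative error increase
shared_f1 = 100.0 - shared_error                # joint model sharing most parameters
print(f"shared-model F1 ~ {shared_f1:.2f}")     # -> shared-model F1 ~ 94.84
```

Under this reading, the 10x smaller joint model costs only a fraction of an F1 point, which is why sharing all but a small number of parameters across ten languages is attractive.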

Results

Task                  Dataset  Metric    Value  Model
Constituency Parsing  CTB5     F1 score  91.75  Kitaev et al. 2019
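To reproduce parses like the one scored above, the paper's official code is distributed as the benepar package (github.com/nikitakit/self-attentive-parser). The sketch below shows minimal usage; the package and pretrained-model names ("benepar", "benepar_en3") are taken from that repository's documentation rather than from this page, so treat them as assumptions.

```python
# Minimal usage sketch for benepar, the parser released with this paper.
# ASSUMPTION: package name "benepar" and model name "benepar_en3" follow
# the nikitakit/self-attentive-parser README; they are not stated on this page.
import benepar

benepar.download("benepar_en3")          # one-time download of the pretrained English model
parser = benepar.Parser("benepar_en3")   # load the self-attentive parser
tree = parser.parse("Constituency parsing benefits from unsupervised pre-training.")
print(tree)                              # an nltk.Tree in bracketed form, e.g. (S (NP ...) ...)
```

Analogous pretrained models are published for the other languages the paper covers (e.g. a Chinese model for CTB-style parsing), per the repository's model list.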

Related Papers

- Automatic Extraction of Clausal Embedding Based on Large-Scale English Text Data (2025-06-16)
- Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models (2025-06-05)
- SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model (2025-06-02)
- Foundation Model for Wireless Technology Recognition Using IQ Timeseries (2025-05-26)
- Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation (2025-05-12)
- The Efficiency of Pre-training with Objective Masking in Pseudo Labeling for Semi-Supervised Text Classification (2025-05-10)
- Latte: Transfering LLMs' Latent-level Knowledge for Few-shot Tabular Learning (2025-05-08)
- Risk Assessment Framework for Code LLMs via Leveraging Internal States (2025-04-20)