
Multilingual Constituency Parsing with Self-Attention and Pre-Training

Nikita Kitaev, Steven Cao, Dan Klein

2018-12-31 · ACL 2019
Tasks: Unsupervised Pre-training · Constituency Parsing
Paper · PDF · Code (official)

Abstract

We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions. We first compare the benefits of no pre-training, fastText, ELMo, and BERT for English and find that BERT outperforms ELMo, in large part due to increased model capacity, whereas ELMo in turn outperforms the non-contextual fastText embeddings. We also find that pre-training is beneficial across all 11 languages tested; however, large model sizes (more than 100 million parameters) make it computationally expensive to train separate models for each language. To address this shortcoming, we show that joint multilingual pre-training and fine-tuning allows sharing all but a small number of parameters between ten languages in the final model. The 10x reduction in model size compared to fine-tuning one model per language causes only a 3.2% relative error increase in aggregate. We further explore the idea of joint fine-tuning and show that it gives low-resource languages a way to benefit from the larger datasets of other languages. Finally, we demonstrate new state-of-the-art results for 11 languages, including English (95.8 F1) and Chinese (91.8 F1).
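The "3.2% relative error increase" reads most naturally against parsing error, i.e. 100 − F1. The snippet below works through that arithmetic as a quick sanity check; the 95.0 per-language baseline F1 is an assumed placeholder (this page reports no per-language baseline), and only the 3.2% figure comes from the abstract.

```python
# Worked arithmetic for the "3.2% relative error increase" claim.
# ASSUMPTION: the 95.0 baseline F1 is an illustrative placeholder;
# only the 3.2% relative increase is taken from the abstract.
per_language_f1 = 95.0                          # one fine-tuned model per language
per_language_error = 100.0 - per_language_f1    # parsing error = 100 - F1 = 5.0
shared_error = per_language_error * 1.032       # 3.2% relative error increase
shared_f1 = 100.0 - shared_error                # joint model sharing most parameters
print(f"shared-model F1 ~ {shared_f1:.2f}")     # -> shared-model F1 ~ 94.84
```

Under this reading, the 10x smaller joint model costs only a fraction of an F1 point, which is why sharing all but a small number of parameters across ten languages is attractive.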

Results

Task                  Dataset  Metric    Value  Model
Constituency Parsing  CTB5     F1 score  91.75  Kitaev et al. 2019
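To reproduce parses like the one scored above, the paper's official code is distributed as the benepar package (github.com/nikitakit/self-attentive-parser). The sketch below shows minimal usage; the package and pretrained-model names ("benepar", "benepar_en3") are taken from that repository's documentation rather than from this page, so treat them as assumptions.

```python
# Minimal usage sketch for benepar, the parser released with this paper.
# ASSUMPTION: package name "benepar" and model name "benepar_en3" follow
# the nikitakit/self-attentive-parser README; they are not stated on this page.
import benepar

benepar.download("benepar_en3")          # one-time download of the pretrained English model
parser = benepar.Parser("benepar_en3")   # load the self-attentive parser
tree = parser.parse("Constituency parsing benefits from unsupervised pre-training.")
print(tree)                              # an nltk.Tree in bracketed form, e.g. (S (NP ...) ...)
```

Analogous pretrained models are published for the other languages the paper covers (e.g. a Chinese model for CTB-style parsing), per the repository's model list.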

Related Papers

- Automatic Extraction of Clausal Embedding Based on Large-Scale English Text Data (2025-06-16)
- Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models (2025-06-05)
- SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model (2025-06-02)
- Foundation Model for Wireless Technology Recognition Using IQ Timeseries (2025-05-26)
- Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation (2025-05-12)
- The Efficiency of Pre-training with Objective Masking in Pseudo Labeling for Semi-Supervised Text Classification (2025-05-10)
- Latte: Transfering LLMs' Latent-level Knowledge for Few-shot Tabular Learning (2025-05-08)
- Risk Assessment Framework for Code LLMs via Leveraging Internal States (2025-04-20)