An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction
Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui
Abstract
The incorporation of pseudo data in the training of grammatical error correction models has been one of the main factors in improving the performance of such models. However, consensus is lacking on experimental configurations, namely, choosing how the pseudo data should be generated or used. In this study, these choices are investigated through extensive experiments, and state-of-the-art performance is achieved on the CoNLL-2014 test set ($F_{0.5}=65.0$) and the official test set of the BEA-2019 shared task ($F_{0.5}=70.2$) without making any modifications to the model architecture.
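The abstract does not spell out the generation procedures compared in the paper, but one common family of approaches injects synthetic noise into clean monolingual text to produce (noisy source, clean target) training pairs. The sketch below is purely illustrative; the function name, probabilities, and noise operations (token drop and adjacent swap) are assumptions for demonstration, not the paper's method.

```python
import random

def noise_sentence(tokens, p_drop=0.1, p_swap=0.1, rng=None):
    """Illustrative pseudo-data generator: corrupt a clean token
    sequence by randomly dropping tokens or swapping adjacent pairs."""
    rng = rng or random.Random(0)
    out = []
    i = 0
    while i < len(tokens):
        if rng.random() < p_drop:
            i += 1  # drop this token
            continue
        if i + 1 < len(tokens) and rng.random() < p_swap:
            out.extend([tokens[i + 1], tokens[i]])  # swap adjacent pair
            i += 2
            continue
        out.append(tokens[i])
        i += 1
    return out

clean = "the cat sat on the mat".split()
noisy = noise_sentence(clean)  # serves as the "ungrammatical" source side
```

The corrupted sentence becomes the model's input and the original clean sentence its target, yielding unlimited synthetic training pairs for pre-training.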
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Grammatical Error Correction | CoNLL-2014 Shared Task | F0.5 | 65.0 | Transformer + Pre-train with Pseudo Data |
| Grammatical Error Correction | BEA-2019 (test) | F0.5 | 70.2 | Transformer + Pre-train with Pseudo Data |
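Both results are reported as $F_{0.5}$, the standard GEC metric, which weights precision twice as heavily as recall (a correction system should avoid introducing new errors). As a quick reference, a minimal sketch of the $F_\beta$ formula; the example precision and recall values are arbitrary:

```python
def f_beta(precision, recall, beta=0.5):
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R);
    # beta < 1 emphasizes precision over recall.
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Arbitrary example values for illustration:
print(round(f_beta(0.70, 0.50), 3))  # → 0.648
```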