An Empirical Study of Incorporating Pseudo Data into Grammatical Error Correction
Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, Kentaro Inui
Abstract
The incorporation of pseudo data in the training of grammatical error correction models has been one of the main factors in improving the performance of such models. However, consensus is lacking on experimental configurations, namely, choosing how the pseudo data should be generated or used. In this study, these choices are investigated through extensive experiments, and state-of-the-art performance is achieved on the CoNLL-2014 test set ($F_{0.5}=65.0$) and the official test set of the BEA-2019 shared task ($F_{0.5}=70.2$) without making any modifications to the model architecture.
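The abstract does not spell out the generation procedures compared in the paper, but one common family of approaches injects synthetic noise into clean monolingual text to produce (noisy source, clean target) training pairs. The sketch below is purely illustrative; the function name, probabilities, and noise operations (token drop and adjacent swap) are assumptions for demonstration, not the paper's method.

```python
import random

def noise_sentence(tokens, p_drop=0.1, p_swap=0.1, rng=None):
    """Illustrative pseudo-data generator: corrupt a clean token
    sequence by randomly dropping tokens or swapping adjacent pairs."""
    rng = rng or random.Random(0)
    out = []
    i = 0
    while i < len(tokens):
        if rng.random() < p_drop:
            i += 1  # drop this token
            continue
        if i + 1 < len(tokens) and rng.random() < p_swap:
            out.extend([tokens[i + 1], tokens[i]])  # swap adjacent pair
            i += 2
            continue
        out.append(tokens[i])
        i += 1
    return out

clean = "the cat sat on the mat".split()
noisy = noise_sentence(clean)  # serves as the "ungrammatical" source side
```

The corrupted sentence becomes the model's input and the original clean sentence its target, yielding unlimited synthetic training pairs for pre-training.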
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Grammatical Error Correction | CoNLL-2014 Shared Task | F0.5 | 65.0 | Transformer + Pre-train with Pseudo Data |
| Grammatical Error Correction | BEA-2019 (test) | F0.5 | 70.2 | Transformer + Pre-train with Pseudo Data |
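Both results are reported as $F_{0.5}$, the standard GEC metric, which weights precision twice as heavily as recall (a correction system should avoid introducing new errors). As a quick reference, a minimal sketch of the $F_\beta$ formula; the example precision and recall values are arbitrary:

```python
def f_beta(precision, recall, beta=0.5):
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R);
    # beta < 1 emphasizes precision over recall.
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Arbitrary example values for illustration:
print(round(f_beta(0.70, 0.50), 3))  # → 0.648
```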