Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A Simple Recipe for Multilingual Grammatical Error Correction

Sascha Rothe, Jonathan Mallinson, Eric Malmi, Sebastian Krause, Aliaksei Severyn

2021-06-07 | ACL 2021 5 | Grammatical Error Correction

Abstract

This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once fine-tuned on language-specific supervised sets we surpass the previous state-of-the-art results on GEC benchmarks in four languages: English, Czech, German and Russian. Having established a new set of baselines for GEC, we make our results easily reproducible and accessible by releasing a cLang-8 dataset. It is produced by using our best model, which we call gT5, to clean the targets of a widely used yet noisy lang-8 dataset. cLang-8 greatly simplifies typical GEC training pipelines composed of multiple fine-tuning stages -- we demonstrate that performing a single fine-tuning step on cLang-8 with the off-the-shelf language models yields further accuracy improvements over an already top-performing gT5 model for English.
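The abstract mentions a language-agnostic way of generating synthetic training examples but does not say which operations are used. As a rough illustration only, the sketch below assumes a few simple token-level and character-level perturbations (dropping or swapping tokens, dropping or inserting characters, flipping case). The function names `corrupt` and `make_pair` and the choice of operations are hypothetical and are not taken from the paper.

```python
import random


def corrupt(sentence: str, rng: random.Random) -> str:
    """Apply one randomly chosen perturbation to a whitespace-tokenized sentence.

    The operations here are illustrative assumptions, not the paper's recipe.
    """
    tokens = sentence.split()
    if not tokens:
        return sentence  # nothing to corrupt

    op = rng.choice(
        ["drop_token", "swap_tokens", "drop_char", "insert_char", "flip_case"]
    )
    i = rng.randrange(len(tokens))

    if op == "drop_token" and len(tokens) > 1:
        del tokens[i]
    elif op == "swap_tokens" and len(tokens) > 1:
        j = min(i, len(tokens) - 2)  # keep j + 1 inside the list
        tokens[j], tokens[j + 1] = tokens[j + 1], tokens[j]
    elif op == "drop_char" and len(tokens[i]) > 1:
        k = rng.randrange(len(tokens[i]))
        tokens[i] = tokens[i][:k] + tokens[i][k + 1:]
    elif op == "insert_char":
        # Draw the inserted character from the sentence itself, so no
        # language-specific alphabet is needed.
        alphabet = "".join(tokens)
        k = rng.randrange(len(tokens[i]) + 1)
        tokens[i] = tokens[i][:k] + rng.choice(alphabet) + tokens[i][k:]
    else:
        # "flip_case", and the fallback when the chosen operation
        # does not apply (for example, a one-token sentence).
        tokens[i] = tokens[i].swapcase()

    return " ".join(tokens)


def make_pair(clean: str, seed: int) -> tuple[str, str]:
    """Return a (corrupted source, clean target) training pair."""
    return corrupt(clean, random.Random(seed)), clean
```

Because the perturbations rely only on whitespace tokenization and on characters already present in the input, the same function can be applied to text in any language that separates words with spaces. A model trained on such pairs would learn to map the corrupted source back to the clean target.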

Results

Task                          Dataset                 Metric  Value  Model
Grammatical Error Correction  CoNLL-2014 Shared Task  F0.5    68.87  T5
Grammatical Error Correction  Falko-MERLIN            F0.5    75.96  gT5 xxl
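The results are reported as F0.5, the F-beta score with beta = 0.5, which weights precision more heavily than recall. The following is a minimal sketch of the standard formula; the function name `f_beta` is chosen here for illustration, and the official benchmark scorers compute precision and recall over edits before applying it.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denom = beta**2 * precision + recall
    if denom == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta**2) * precision * recall / denom
```

For a system with precision 0.8 and recall 0.4, `f_beta(0.8, 0.4)` gives about 0.667, while the balanced F1 (`beta=1.0`) gives about 0.533, which shows how F0.5 rewards a system that makes fewer but more reliable corrections.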

Related Papers

End-to-End Spoken Grammatical Error Correction (2025-06-23)
IMPARA-GED: Grammatical Error Detection is Boosting Reference-free Grammatical Error Quality Estimator (2025-06-03)
Scaling and Prompting for Improved End-to-End Spoken Grammatical Error Correction (2025-05-27)
gec-metrics: A Unified Library for Grammatical Error Correction Evaluation (2025-05-26)
Exploring the Feasibility of Multilingual Grammatical Error Correction with a Single LLM up to 9B parameters: A Comparative Study of 17 Models (2025-05-09)
Enriching the Korean Learner Corpus with Multi-reference Annotations and Rubric-Based Scoring (2025-05-01)
Deep Learning Model Deployment in Multiple Cloud Providers: an Exploratory Study Using Low Computing Power Environments (2025-03-31)
Enhancing Text Editing for Grammatical Error Correction: Arabic as a Case Study (2025-03-02)