Kostiantyn Omelianchuk, Andrii Liubonko, Oleksandr Skurzhanskyi, Artem Chernodub, Oleksandr Korniienko, Igor Samokhin
In this paper, we carry out experimental research on Grammatical Error Correction (GEC), delving into the nuances of single-model systems, comparing the efficiency of ensembling and ranking methods, and exploring the application of large language models to GEC as single-model systems, as parts of ensembles, and as ranking methods. We set new state-of-the-art performance with F_0.5 scores of 72.8 on CoNLL-2014-test and 81.4 on BEA-test. To support further advancements in GEC and ensure the reproducibility of our research, we make our code, trained models, and systems' outputs publicly available.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Grammatical Error Correction | CoNLL-2014 Shared Task | F0.5 | 72.8 | Ensemble of best 7 models + GRECO + GPT-rerank |
| Grammatical Error Correction | CoNLL-2014 Shared Task | Precision | 83.9 | Ensemble of best 7 models + GRECO + GPT-rerank |
| Grammatical Error Correction | CoNLL-2014 Shared Task | Recall | 47.5 | Ensemble of best 7 models + GRECO + GPT-rerank |
| Grammatical Error Correction | CoNLL-2014 Shared Task | F0.5 | 71.8 | Majority-voting ensemble of best 7 models |
| Grammatical Error Correction | CoNLL-2014 Shared Task | Precision | 83.7 | Majority-voting ensemble of best 7 models |
| Grammatical Error Correction | CoNLL-2014 Shared Task | Recall | 45.7 | Majority-voting ensemble of best 7 models |
| Grammatical Error Correction | BEA-2019 (test) | F0.5 | 81.4 | Majority-voting ensemble of best 7 models |
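The majority-voting ensemble in the table can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each system's output is reduced to a set of hashable edit spans (e.g. `(start, end, replacement)` tuples such as those produced by an alignment tool like ERRANT), and keeps only edits proposed by a strict majority of systems.

```python
from collections import Counter

def majority_vote(edit_sets, min_votes=None):
    """Keep only edits proposed by at least `min_votes` systems.

    Each element of `edit_sets` is an iterable of hashable edits,
    e.g. (start, end, replacement) span tuples. By default an edit
    must be proposed by a strict majority of the systems to survive.
    """
    if min_votes is None:
        min_votes = len(edit_sets) // 2 + 1  # strict majority
    counts = Counter(e for edits in edit_sets for e in set(edits))
    return {e for e, n in counts.items() if n >= min_votes}

# Hypothetical edits from three systems on one sentence:
systems = [
    {(0, 1, "The"), (3, 4, "went")},
    {(0, 1, "The"), (3, 4, "goes")},
    {(0, 1, "The")},
]
print(majority_vote(systems))  # only the unanimous edit survives: {(0, 1, 'The')}
```

Because only widely agreed-upon edits are kept, this style of ensembling trades recall for precision, which is consistent with the high-precision/lower-recall profile in the table above.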