Cleaned_Lang8
Texts
Introduced 2024-01-30
Lang-8 Preprocessed Dataset (for GED):
- Dataset: Lang-8, a publicly available dataset of user-generated text, written primarily by second-language learners, with a focus on writing errors.
- Task: Grammatical Error Detection (GED).
- Size: 200,000 sentences, each labeled '0' if it is an incorrect sentence or '1' if it is the corrected version.
- Preprocessing: The dataset has been cleaned and transformed to improve model performance. This includes removing noise, handling inconsistent annotations, and preparing the data for training.
- Usage: Suited to training and evaluating models for GED and for Grammatical Error Correction (GEC).
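
The labeling scheme above can be illustrated with a minimal loading sketch. It assumes a CSV layout with `sentence` and `label` columns, which is a guess rather than the documented file format, and the sample rows are made up for illustration, not taken from the dataset. The helper name `load_examples` is likewise hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in an ASSUMED CSV layout (columns: sentence,label).
# The real Cleaned_Lang8 files may use different names or a different format.
SAMPLE_CSV = """sentence,label
I goes to school yesterday .,0
I went to school yesterday .,1
She have two cats .,0
She has two cats .,1
"""


def load_examples(handle):
    """Read (sentence, label) pairs; 0 = incorrect, 1 = corrected version."""
    examples = []
    for row in csv.DictReader(handle):
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label}")
        examples.append((row["sentence"], label))
    return examples


# Parse the in-memory sample and count how many sentences carry each label.
examples = load_examples(io.StringIO(SAMPLE_CSV))
label_counts = Counter(label for _, label in examples)
print(label_counts)
```

To run this against a real file, `io.StringIO(SAMPLE_CSV)` could be replaced with `open(path, newline="", encoding="utf-8")`, once the column names have been checked against the actual data.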