CSCD-IME

Texts · MIT License · Introduced 2022-11-16

The Chinese Spelling Correction Dataset for errors generated by pinyin IME (CSCD-IME) contains 40,000 annotated sentences taken from real posts by official media accounts on Sina Weibo. It is intended for the task of detecting and correcting spelling errors in Chinese text.
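To make the detection and correction task concrete, here is a minimal Python sketch that compares an erroneous sentence with its corrected form and lists the characters that differ. The function name `find_spelling_errors`, the equal-length assumption, and the example sentence pair are all illustrative assumptions; the sentence is made up to show a homophone slip (在 and 再 are both typed as "zai") and is not taken from the dataset, and the dataset's actual file format is not described here.

```python
def find_spelling_errors(source: str, target: str) -> list[tuple[int, str, str]]:
    """Return (position, wrong_char, correct_char) for each differing character.

    Assumption: the erroneous and corrected sentences have the same length,
    so errors are plain character substitutions.
    """
    if len(source) != len(target):
        raise ValueError("this sketch assumes source and target have equal length")
    return [
        (i, wrong, right)
        for i, (wrong, right) in enumerate(zip(source, target))
        if wrong != right
    ]


# Hypothetical sentence pair (not from the dataset): a homophone slip.
source = "我们明天在见"  # as typed, with the wrong character
target = "我们明天再见"  # corrected sentence

print(find_spelling_errors(source, target))
# → [(4, '在', '再')]
```

Detection corresponds to reporting the positions alone, while correction also requires the right-hand character for each position.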

Source: CSCD-IME: Correcting Spelling Errors Generated by Pinyin IME

Image Source: https://github.com/nghuyong/cscd-ime