MCSCSet

Medical · Texts · Apache-2.0 license · Introduced 2022-10-21

MCSCSet is a large-scale, specialist-annotated dataset of about 200k samples, designed for the task of Medical-domain Chinese Spelling Correction. It consists of: i) real-world medical queries collected from Tencent Yidian, and ii) the corresponding misspelled sentences, manually annotated by medical specialists.

Source: MCSCSet: A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction

Image Source: https://arxiv.org/pdf/2210.11720v1.pdf