Dataset for Post-OCR text correction in Sanskrit
Modalities: Images, Texts
Homepage: https://github.com/ayushbits/pe-ocr-sanskrit
Introduced: 2022-11-15
This dataset is designed for post-OCR text correction in Sanskrit. It contains around 218K sentences (about 1.5 million words) drawn from 30 different books.
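As a rough illustration of how a post-OCR correction output might be scored against ground truth, the sketch below computes a character error rate (CER) from Levenshtein edit distance. This is only an assumption about a plausible evaluation, not the metric or code used by the dataset authors. The function names `edit_distance` and `char_error_rate` and the example strings are hypothetical, and the dataset's actual file format is not assumed here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted over Unicode code points."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance normalised by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(hypothesis, reference) / len(reference)


if __name__ == "__main__":
    # Hypothetical pair in romanised form: OCR output vs. corrected text.
    ocr_output = "dhamma"
    ground_truth = "dharma"
    print(char_error_rate(ocr_output, ground_truth))
```

Note that this sketch compares code points. For Devanagari text, where a visible character can span several code points, one might prefer to segment into grapheme clusters before comparing.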
Source: A Benchmark and Dataset for Post-OCR text correction in Sanskrit
Image Source: https://arxiv.org/pdf/2211.07980v1.pdf