Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

KOHTD: Kazakh Offline Handwritten Text Dataset

Nazgul Toiganbayeva, Mahmoud Kasem, Galymzhan Abdimanap, Kairat Bostanbekov, Abdelrahman Abdallah, Anel Alimova, Daniyar Nurseitov

Published: 2021-09-22
Tasks: Handwriting Recognition, Handwritten Text Recognition, HTR

Abstract

Despite the transition to digital information exchange, many documents, such as invoices, tax forms, memos, questionnaires, historical data, and answers to exam questions, still require handwritten input. This creates a need for Handwritten Text Recognition (HTR), the automatic transcription of handwritten records by computer. Handwriting recognition is challenging because of the virtually infinite number of ways a person can write the same message. Research on Kazakh handwritten text recognition requires a comprehensive dataset of Kazakh handwritten text, yet no such dataset has been available. In this paper, we present our extensive Kazakh Offline Handwritten Text Dataset (KOHTD), which contains 3,000 handwritten exam papers, more than 140,335 segmented images, and approximately 922,010 symbols. It can serve researchers applying machine learning and deep learning to handwriting recognition tasks. We evaluated a variety of popular text recognition methods for word and line recognition, including CTC-based and attention-based methods. The findings demonstrate KOHTD's diversity. We also propose a Genetic Algorithm (GA) for line and word segmentation based on random enumeration of a parameter. The dataset and GA code are available at https://github.com/abdoelsayed2016/KOHTD.
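The abstract mentions CTC-based recognition methods. As a rough illustration of what a CTC model's output looks like before it becomes text, here is a minimal sketch of standard greedy CTC decoding: merge consecutive repeated labels, then drop the blank label. It is not taken from the KOHTD code; the function name `ctc_greedy_decode`, the integer label encoding, and the choice of `0` as the blank index are assumptions made for this example.

```python
from itertools import groupby


def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence into an output label sequence.

    frame_labels: the most likely label index at each time step
                  (hypothetical input; a real model would produce it via argmax).
    blank:        index of the CTC blank label (assumed to be 0 here).
    """
    # Step 1: merge runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks, which only separate genuine repeated characters.
    return [label for label in collapsed if label != blank]


# The blank between the two 1s keeps them as separate characters.
decoded = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0])
print(decoded)
```

Greedy decoding is the simplest option; beam search or a language model can be layered on top, and the paper's evaluated models may use different decoding strategies.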

Results

Task                                  Dataset  Metric  Value  Model
Optical Character Recognition (OCR)   KOHTD    CER     6.52   Flor
Optical Character Recognition (OCR)   KOHTD    CER     8.01   Puigcerver
Optical Character Recognition (OCR)   KOHTD    CER     8.22   Abdallah
Optical Character Recognition (OCR)   KOHTD    CER     8.36   Bluche
Handwriting Recognition               KOHTD    CER     6.52   Flor
Handwriting Recognition               KOHTD    CER     8.01   Puigcerver
Handwriting Recognition               KOHTD    CER     8.22   Abdallah
Handwriting Recognition               KOHTD    CER     8.36   Bluche
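
The results above are reported as CER (character error rate). For readers unfamiliar with the metric, the sketch below shows the usual definition: the total character-level edit distance between predictions and references, divided by the total number of reference characters. It is an illustrative implementation, not the evaluation script behind the listed numbers; the function names and the choice to report a percentage are assumptions, and the page does not state the unit of the Value column.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level character error rate, as a percentage (assumed convention)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_chars


# One substitution (қ read as к) across 9 reference characters.
score = cer(["қазақ", "тілі"], ["казақ", "тілі"])
print(f"CER = {score:.2f}%")
```

Lower is better: a CER of 0 means every character matched, and the value can exceed 100 when the hypothesis contains many spurious insertions.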

Related Papers

Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques (2025-07-08)
A Transformer Based Handwriting Recognition System Jointly Using Online and Offline Features (2025-06-25)
Learning to Align: Addressing Character Frequency Distribution Shifts in Handwritten Text Recognition (2025-06-11)
Creating a Historical Migration Dataset from Finnish Church Records, 1800-1920 (2025-06-09)
MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning (2025-05-26)
Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition (2025-04-11)
Meta-DAN: towards an efficient prediction strategy for page-level handwritten text recognition (2025-04-04)
TRIDIS: A Comprehensive Medieval and Early Modern Corpus for HTR and NER (2025-03-25)