Handwritten Text Recognition from Crowdsourced Annotations

Solène Tarride, Tristan Faine, Mélodie Boillet, Harold Mouchère, Christopher Kermorvant

Date: 2023-06-19
Venue: International Workshop on Historical Document Imaging and Processing 2023
Task: Handwritten Text Recognition

Abstract

In this paper, we explore different ways of training a model for handwritten text recognition when multiple imperfect or noisy transcriptions are available. We consider various training configurations, such as selecting a single transcription, retaining all transcriptions, or computing an aggregated transcription from all available annotations. In addition, we evaluate the impact of quality-based data selection, where samples with low agreement are removed from the training set. Our experiments are carried out on municipal registers of the city of Belfort (France) written between 1790 and 1946. The results show that computing a consensus transcription or training on multiple transcriptions are good alternatives. However, selecting training samples based on the degree of agreement between annotators introduces a bias in the training data and does not improve the results. Our dataset is publicly available on Zenodo: https://zenodo.org/record/8041668.
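The training configurations named in the abstract can be illustrated with a small sketch. It is not the authors' code: the function names, the `min_agreement` threshold, and the agreement score (one minus the normalized character edit distance, averaged over annotator pairs) are assumptions made for illustration. The consensus step uses a simple medoid (the transcription closest to all the others) as a stand-in, not the ROVER aggregation listed in the results.

```python
import random
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def agreement(transcriptions):
    """Mean pairwise similarity in [0, 1]; 1.0 means all annotators agree.

    Assumption: similarity = 1 - normalized character edit distance.
    """
    pairs = list(combinations(transcriptions, 2))
    if not pairs:
        return 1.0  # a single transcription trivially agrees with itself
    sims = [1 - edit_distance(a, b) / max(len(a), len(b), 1) for a, b in pairs]
    return sum(sims) / len(sims)


def medoid(transcriptions):
    """Stand-in consensus: the transcription closest to all the others."""
    return min(transcriptions,
               key=lambda t: sum(edit_distance(t, o) for o in transcriptions))


def build_training_set(samples, mode, min_agreement=0.0, seed=0):
    """Build (image_id, text) pairs from {image_id: [transcriptions]}."""
    rng = random.Random(seed)
    pairs = []
    for image_id, texts in samples.items():
        if agreement(texts) < min_agreement:
            continue  # quality-based selection: drop low-agreement samples
        if mode == "single":
            pairs.append((image_id, rng.choice(texts)))
        elif mode == "all":
            pairs.extend((image_id, t) for t in texts)
        elif mode == "consensus":
            pairs.append((image_id, medoid(texts)))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return pairs
```

Setting `min_agreement` above zero discards exactly the lines on which annotators disagree most, which are often the hardest ones. That is one plausible reading of the bias the abstract attributes to agreement-based selection.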

Results

Dataset: Belfort. The same values are listed under both the Optical Character Recognition (OCR) and Handwritten Text Recognition tasks.

| Model | CER (%) | WER (%) |
|---|---|---|
| PyLaia (all transcriptions + agreement-based split) | 4.34 | 15.14 |
| PyLaia (ROVER consensus + agreement-based split) | 4.95 | 17.08 |
| PyLaia (human transcriptions + agreement-based split) | 5.57 | 19.12 |
| PyLaia (human transcriptions + random split) | 10.54 | 28.11 |
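For readers unfamiliar with the two metrics, the sketch below shows how character error rate (CER) and word error rate (WER) are commonly computed: the edit distance between reference and prediction, summed over the corpus and divided by the total reference length. The corpus-level aggregation and whitespace tokenization are assumptions; the page does not state which text normalization or tooling produced the reported values.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Character error rate in percent, aggregated over the whole corpus."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 100.0 * errors / total


def wer(references, hypotheses):
    """Word error rate in percent, using whitespace tokenization (assumption)."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return 100.0 * errors / total


# Made-up example line: one wrong character out of 19, one wrong word out of 4.
refs = ["le maire de belfort"]
hyps = ["le maire de belfart"]
print(f"CER = {cer(refs, hyps):.2f}%  WER = {wer(refs, hyps):.2f}%")
```

A single character error counts as a full word error, which is why WER is consistently several times higher than CER in the table.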
