
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


HTR-VT: Handwritten Text Recognition with Vision Transformer

Yuting Li, Dexiong Chen, Tinglong Tang, Xi Shen

2024-09-13 | Handwritten Text Recognition | HTR

Abstract

We explore the application of Vision Transformer (ViT) for handwritten text recognition. The limited availability of labeled data in this domain makes it difficult to achieve high performance when relying solely on ViT. Previous transformer-based models required external data or extensive pre-training on large datasets to excel. To address this limitation, we introduce a data-efficient ViT method that uses only the encoder of the standard transformer. We find that two changes yield notable improvements: using a Convolutional Neural Network (CNN) for feature extraction in place of the original patch embedding, and employing the Sharpness-Aware Minimization (SAM) optimizer so that the model converges towards flatter minima. Furthermore, the span mask technique we introduce, which masks interconnected features in the feature map, acts as an effective regularizer. Empirically, our approach competes favorably with traditional CNN-based models on small datasets like IAM and READ2016. Additionally, it establishes a new benchmark on the LAM dataset, currently the largest dataset with 19,830 training text lines. The code is publicly available at: https://github.com/YutingLi0606/HTR-VT.
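To make the span mask idea concrete, here is a minimal sketch in plain Python. It is not taken from the official repository: the function names `span_mask` and `apply_span_mask`, the default `mask_ratio` and `span_len` values, the random sampling of span starts, and the use of a zero vector as the mask token are all assumptions made for illustration. The abstract only states that interconnected (contiguous) features in the feature map are masked.

```python
import random


def span_mask(seq_len, mask_ratio=0.4, span_len=8, rng=None):
    """Return a list of booleans, True where a feature position is masked.

    Hypothetical sampling scheme: pick random span starts and mask up to
    `span_len` consecutive positions until roughly `mask_ratio` of the
    sequence is covered.
    """
    rng = rng or random.Random()
    mask = [False] * seq_len
    target = min(seq_len, int(seq_len * mask_ratio))
    if target <= 0 or span_len <= 0:
        return mask
    masked = 0
    while masked < target:
        start = rng.randrange(seq_len)
        # Mask a contiguous run, stopping at the sequence end or the budget.
        for i in range(start, min(start + span_len, seq_len)):
            if masked >= target:
                break
            if not mask[i]:
                mask[i] = True
                masked += 1
    return mask


def apply_span_mask(features, mask, mask_token=None):
    """Replace masked feature vectors with a mask token (zeros by default)."""
    dim = len(features[0]) if features else 0
    token = mask_token if mask_token is not None else [0.0] * dim
    return [list(token) if m else list(f) for f, m in zip(features, mask)]
```

In a real model the features would be the CNN output sequence fed to the transformer encoder, and the mask token would more likely be a learned parameter than a fixed zero vector. Masking would be applied during training only.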

Results

Task | Dataset | Metric | Value | Model
Optical Character Recognition (OCR) | READ 2016 | CER (%) | 3.9 | HTR-VT(line-level)
Optical Character Recognition (OCR) | READ 2016 | WER (%) | 16.5 | HTR-VT(line-level)
Optical Character Recognition (OCR) | IAM(line-level) | Test CER | 4.7 | HTR-VT
Optical Character Recognition (OCR) | IAM(line-level) | Test WER | 14.9 | HTR-VT
Optical Character Recognition (OCR) | IAM | CER | 4.7 | HTR-VT(line-level)
Optical Character Recognition (OCR) | IAM | WER | 14.9 | HTR-VT(line-level)
Optical Character Recognition (OCR) | READ2016(line-level) | Test CER | 3.9 | HTR-VT
Optical Character Recognition (OCR) | READ2016(line-level) | Test WER | 16.5 | HTR-VT
Optical Character Recognition (OCR) | LAM(line-level) | Test CER | 2.8 | HTR-VT
Optical Character Recognition (OCR) | LAM(line-level) | Test WER | 7.4 | HTR-VT
Handwritten Text Recognition | READ 2016 | CER (%) | 3.9 | HTR-VT(line-level)
Handwritten Text Recognition | READ 2016 | WER (%) | 16.5 | HTR-VT(line-level)
Handwritten Text Recognition | IAM(line-level) | Test CER | 4.7 | HTR-VT
Handwritten Text Recognition | IAM(line-level) | Test WER | 14.9 | HTR-VT
Handwritten Text Recognition | IAM | CER | 4.7 | HTR-VT(line-level)
Handwritten Text Recognition | IAM | WER | 14.9 | HTR-VT(line-level)
Handwritten Text Recognition | READ2016(line-level) | Test CER | 3.9 | HTR-VT
Handwritten Text Recognition | READ2016(line-level) | Test WER | 16.5 | HTR-VT
Handwritten Text Recognition | LAM(line-level) | Test CER | 2.8 | HTR-VT
Handwritten Text Recognition | LAM(line-level) | Test WER | 7.4 | HTR-VT
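
For readers unfamiliar with the metrics in the table, the sketch below shows how character error rate (CER) and word error rate (WER) are commonly computed: Levenshtein edit distance between prediction and reference, divided by the reference length, expressed as a percentage. The helper names are illustrative. Aggregating errors over the whole corpus (rather than averaging per line) and splitting words on whitespace are assumptions, since the page does not state the exact evaluation protocol.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def cer(references, hypotheses):
    """Character error rate in percent, aggregated over all lines."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 100.0 * errors / total


def wer(references, hypotheses):
    """Word error rate in percent, using whitespace tokenization."""
    ref_words = [r.split() for r in references]
    hyp_words = [h.split() for h in hypotheses]
    errors = sum(edit_distance(r, h) for r, h in zip(ref_words, hyp_words))
    total = sum(len(r) for r in ref_words)
    return 100.0 * errors / total
```

For example, with the reference "hello world" and the prediction "hallo world", one character out of eleven and one word out of two are wrong, so WER is far higher than CER. The same effect is visible in the table, where WER is consistently several times the CER.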

Related Papers

Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques (2025-07-08)
Learning to Align: Addressing Character Frequency Distribution Shifts in Handwritten Text Recognition (2025-06-11)
MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning (2025-05-26)
Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition (2025-04-11)
Meta-DAN: towards an efficient prediction strategy for page-level handwritten text recognition (2025-04-04)
TRIDIS: A Comprehensive Medieval and Early Modern Corpus for HTR and NER (2025-03-25)
Benchmarking Large Language Models for Handwritten Text Recognition (2025-03-19)
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription (2025-02-27)