StackMix and Blot Augmentations for Handwritten Text Recognition

Alex Shonenkov, Denis Karachev, Maxim Novopoltsev, Mark Potanin, Denis Dimitrov

2021-08-26 | Text Generation, Handwritten Text Recognition, Data Augmentation, HTR

Abstract

This paper proposes a handwritten text recognition (HTR) system that outperforms current state-of-the-art methods. The comparison was carried out on three of the datasets most frequently used for the HTR task, namely Bentham, IAM, and Saint Gall. In addition, results are provided on two recently presented datasets: Peter the Great's manuscripts and the HKR dataset.

The paper describes the architecture of the neural network and two ways of increasing the volume of training data: an augmentation that simulates strikethrough text (HandWritten Blots) and a new text generation method (StackMix), which proved to be very effective in HTR tasks. StackMix can also be applied to the standalone task of generating handwritten text from printed text.
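
The abstract names the two augmentations without detailing them. As a rough illustration only, the sketch below shows the general idea of each under simplifying assumptions that are not taken from the paper. The function `add_strikethrough` (hypothetical) draws a wavy stroke across a grayscale line image to mimic crossed-out text, and `stack_glyphs` (hypothetical) builds a new line image for a printed string by concatenating randomly chosen character crops, assuming a `glyph_bank` of equal-height crops already exists. The authors' HandWritten Blots and StackMix procedures may differ substantially.

```python
import numpy as np


def add_strikethrough(image: np.ndarray, rng: np.random.Generator,
                      thickness: int = 2, ink: int = 0) -> np.ndarray:
    """Hypothetical blot-style augmentation (not the authors' method): draw a
    slightly wavy horizontal stroke over part of a grayscale text-line image
    whose background is white (255). Returns a new array."""
    h, w = image.shape
    out = image.copy()
    # Place the stroke roughly through the middle third of the line height.
    y0 = rng.integers(h // 3, max(h // 3 + 1, 2 * h // 3))
    # Random start/end columns so the stroke may cover only part of the line.
    x_start = int(rng.integers(0, max(1, w // 2)))
    x_end = int(rng.integers(x_start + 1, w + 1))
    # A small sinusoidal wobble imitates a hand-drawn line.
    amplitude = rng.uniform(0.0, h / 10)
    phase = rng.uniform(0.0, 2 * np.pi)
    for x in range(x_start, x_end):
        y = int(round(y0 + amplitude * np.sin(phase + x / 15)))
        top = max(0, y - thickness // 2)
        bottom = min(h, top + thickness)
        out[top:bottom, x] = ink
    return out


def stack_glyphs(text: str, glyph_bank: dict, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical, heavily simplified StackMix-style generation: for each
    character of `text`, pick a random crop from `glyph_bank[char]` (a list of
    2-D arrays) and concatenate the crops left to right."""
    pieces = []
    for ch in text:
        crops = glyph_bank[ch]  # KeyError if the character has no crops
        pieces.append(crops[rng.integers(len(crops))])
    if len({p.shape[0] for p in pieces}) > 1:
        raise ValueError("all crops must share the same height")
    return np.hstack(pieces)
```

Keeping both functions pure (returning new arrays) makes them easy to chain inside a data-loading pipeline, e.g. generating a line with `stack_glyphs` and then optionally passing it through `add_strikethrough`.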

Results

Task	Dataset	Metric	Value (%)	Model
Optical Character Recognition (OCR)	Saint Gall	CER	3.65	StackMix+Blots
Optical Character Recognition (OCR)	Bentham	CER	1.73	StackMix+Blots
Optical Character Recognition (OCR)	HKR	CER	3.49	StackMix+Blots
Optical Character Recognition (OCR)	Digital Peter	CER	2.5	StackMix+Blots
Optical Character Recognition (OCR)	IAM-D	CER	3.01	StackMix+Blots
Optical Character Recognition (OCR)	IAM-B	CER	3.77	StackMix+Blots
Handwritten Text Recognition	Saint Gall	CER	3.65	StackMix+Blots
Handwritten Text Recognition	Bentham	CER	1.73	StackMix+Blots
Handwritten Text Recognition	HKR	CER	3.49	StackMix+Blots
Handwritten Text Recognition	Digital Peter	CER	2.5	StackMix+Blots
Handwritten Text Recognition	IAM-D	CER	3.01	StackMix+Blots
Handwritten Text Recognition	IAM-B	CER	3.77	StackMix+Blots
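
All results above are reported as character error rate (CER). The sketch below assumes the common corpus-level definition: total Levenshtein edit distance between predicted and reference strings divided by the total number of reference characters, expressed as a percentage. Whether the paper aggregates over the whole corpus or averages per line is not stated here, and `levenshtein` and `cer` are illustrative helpers, not the authors' evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `a` into `b` (classic two-row dynamic programme)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def cer(references: list, hypotheses: list) -> float:
    """Corpus-level CER in percent (assumed definition)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    errors = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 100.0 * errors / total
```

For example, `cer(["abcd"], ["abed"])` gives `25.0`: one substitution over four reference characters.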
