DTrOCR: Decoder-only Transformer for Optical Character Recognition

Masato Fujitake

2023-08-30

Tasks: Handwritten Text Recognition, Scene Text Recognition, Language Modelling, Optical Character Recognition (OCR)

Abstract

Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these features. In this study, we propose a simpler and more effective method for text recognition, known as the Decoder-only Transformer for Optical Character Recognition (DTrOCR). This method uses a decoder-only Transformer to take advantage of a generative language model that is pre-trained on a large corpus. We examined whether a generative language model that has been successful in natural language processing can also be effective for text recognition in computer vision. Our experiments demonstrated that DTrOCR outperforms current state-of-the-art methods by a large margin in the recognition of printed, handwritten, and scene text in both English and Chinese.
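
To make the contrast with an encoder-decoder pipeline concrete, the following is a minimal sketch of how a decoder-only model could consume an image and its transcription as one sequence. The abstract does not describe the input format, so everything here is an assumption made for illustration: the patch-based image prefix, the `[SEP]` and `[EOS]` markers, and the helper names `image_to_patches`, `build_sequence`, and `causal_mask` are all hypothetical.

```python
from typing import List


def image_to_patches(image: List[List[float]], patch: int) -> List[List[float]]:
    """Split an H x W grayscale image into flattened, non-overlapping tiles.

    Assumption: the image is fed to the decoder as a sequence of patches.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            patches.append(tile)
    return patches


def build_sequence(num_patches: int, text: str) -> List[str]:
    """Lay out one sequence: image slots, a separator, then text characters.

    The "[SEP]" and "[EOS]" markers are hypothetical placeholders.
    """
    image_slots = [f"img_{i}" for i in range(num_patches)]
    return image_slots + ["[SEP]"] + list(text) + ["[EOS]"]


def causal_mask(length: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


# A toy 4 x 8 image cut into 4 x 4 patches gives a two-slot image prefix.
toy_image = [[0.0] * 8 for _ in range(4)]
patches = image_to_patches(toy_image, patch=4)
sequence = build_sequence(len(patches), "hi")
mask = causal_mask(len(sequence))
```

In this layout there is no separate encoder: the same stack of decoder layers sees the image slots as a prefix and predicts the text positions one at a time, which is the property that would let a pre-trained generative language model be reused.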

Results

Task	Dataset	Metric	Value	Model
Optical Character Recognition (OCR)	Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study	Accuracy (%)	89.6	DTrOCR 105M
Optical Character Recognition (OCR)	IAM	CER	2.38	DTrOCR 105M
Handwritten Text Recognition	IAM	CER	2.38	DTrOCR 105M
Scene Text Recognition	SVT	Accuracy	98.9	DTrOCR 105M
Scene Text Recognition	SVTP	Accuracy	98.6	DTrOCR 105M
Scene Text Recognition	CUTE80	Accuracy	99.1	DTrOCR 105M
Scene Text Recognition	ICDAR2015	Accuracy	93.5	DTrOCR 105M
Scene Text Recognition	IIIT5k	Accuracy	99.6	DTrOCR 105M
Scene Text Recognition	ICDAR2013	Accuracy	99.4	DTrOCR 105M
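
For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how they are commonly computed. It assumes that CER is the character-level Levenshtein distance divided by the reference length, expressed as a percentage, and that Accuracy is exact-match accuracy over whole words. The listing does not state either convention, and the function names are illustrative rather than taken from the paper's evaluation code.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Character error rate in percent, pooled over all reference characters."""
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_edits / total_chars


def word_accuracy(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Percentage of predictions that match their reference exactly."""
    if not refs:
        raise ValueError("no references given")
    correct = sum(r == h for r, h in zip(refs, hyps))
    return 100.0 * correct / len(refs)
```

Under these assumptions lower is better for CER and higher is better for Accuracy. For example, `cer(["hello"], ["hallo"])` counts one substitution over five reference characters.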
