Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models

Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei

Date: 2021-09-21
Tasks: Text Generation, Handwritten Text Recognition, Scene Text Recognition, Language Modelling, Optical Character Recognition (OCR)

Abstract

Text recognition is a long-standing research problem for document digitalization. Existing approaches are usually built based on CNN for image understanding and RNN for char-level text generation. In addition, another language model is usually needed to improve the overall accuracy as a post-processing step. In this paper, we propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments show that the TrOCR model outperforms the current state-of-the-art models on the printed, handwritten and scene text recognition tasks. The TrOCR models and code are publicly available at https://aka.ms/trocr.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Optical Character Recognition (OCR) | IAM (line-level) | Test CER | 3.4 | TrOCR |
| Optical Character Recognition (OCR) | IAM | CER | 2.89 | TrOCR-large 558M |
| Optical Character Recognition (OCR) | IAM | CER | 3.42 | TrOCR-base 334M |
| Optical Character Recognition (OCR) | IAM | CER | 4.22 | TrOCR-small 62M |
| Optical Character Recognition (OCR) | LAM (line-level) | Test CER | 3.6 | TrOCR |
| Optical Character Recognition (OCR) | LAM (line-level) | Test WER | 11.6 | TrOCR |
| Handwritten Text Recognition | IAM (line-level) | Test CER | 3.4 | TrOCR |
| Handwritten Text Recognition | IAM | CER | 2.89 | TrOCR-large 558M |
| Handwritten Text Recognition | IAM | CER | 3.42 | TrOCR-base 334M |
| Handwritten Text Recognition | IAM | CER | 4.22 | TrOCR-small 62M |
| Handwritten Text Recognition | LAM (line-level) | Test CER | 3.6 | TrOCR |
| Handwritten Text Recognition | LAM (line-level) | Test WER | 11.6 | TrOCR |
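
The CER and WER values above are character and word error rates, conventionally reported as percentages. As a rough illustration of how such a metric is commonly computed, the sketch below divides the Levenshtein edit distance between a reference and a predicted transcription by the reference length. The function names are hypothetical, and this page does not state the text normalization (case, punctuation, whitespace handling) used by the benchmarks, so this sketch assumes none and may not reproduce the reported numbers exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate in percent. Assumption: no text normalization."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate in percent. Assumption: words are split on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    reference = "hello world"
    prediction = "hallo world"
    print(f"CER: {cer(reference, prediction):.2f}%")
    print(f"WER: {wer(reference, prediction):.2f}%")
```

Because a single wrong character makes the whole word count as an error, WER is typically much higher than CER on the same predictions, which is consistent with the LAM rows above (3.6 CER against 11.6 WER).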

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment (2025-07-17)
Mitigating Object Hallucinations via Sentence-Level Early Intervention (2025-07-16)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)