Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

An Empirical Study of Scaling Law for OCR

Miao Rang, Zhenni Bi, Chuanjian Liu, Yunhe Wang, Kai Han

2023-12-29 · Scene Text Recognition · Optical Character Recognition (OCR)

Abstract

The laws of model size, data volume, computation and model performance have been extensively studied in the field of Natural Language Processing (NLP). However, the scaling laws in Optical Character Recognition (OCR) have not yet been investigated. To address this, we conducted comprehensive studies that involved examining the correlation between performance and the scale of models, data volume and computation in the field of text recognition. Conclusively, the study demonstrates smooth power laws between performance and model size, as well as training data volume, when other influencing factors are held constant. Additionally, we have constructed a large-scale dataset called REBU-Syn, which comprises 6 million real samples and 18 million synthetic samples. Based on our scaling law and new dataset, we have successfully trained a scene text recognition model, achieving a new state-of-the-art on 6 common test benchmarks with a top-1 average accuracy of 97.42%. The models and dataset are publicly available at https://github.com/large-ocr-model/large-ocr-model.github.io.
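As a rough illustration of what a "smooth power law" between error and model size means, the sketch below fits error ≈ a · N^(-alpha) by ordinary least squares in log-log space. This is not the authors' fitting procedure: the function name `fit_power_law`, the parameter values, and the data points are all hypothetical, and the data is synthetic, generated from a known power law rather than taken from the paper.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-alpha) via least squares on log-transformed data.

    Hypothetical helper for illustration; returns (a, alpha).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line y = intercept + slope * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(error) = log(a) - alpha * log(size), so alpha is the negated slope.
    return math.exp(intercept), -slope


# Synthetic data only (NOT results from the paper): errors generated from
# an assumed power law with a = 50.0 and alpha = 0.2.
true_a, true_alpha = 50.0, 0.2
sizes = [1e7, 5e7, 1e8, 5e8, 1e9]  # assumed parameter counts
errors = [true_a * s ** (-true_alpha) for s in sizes]

a, alpha = fit_power_law(sizes, errors)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

On noise-free synthetic data the fit recovers the generating parameters; with real measurements the points would scatter around the fitted line, and the same log-log regression could be applied to training data volume in place of model size.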

Results

Task                      Dataset    Metric    Value  Model
Scene Parsing             SVT        Accuracy  98.76  CLIP4STR-B*
Scene Parsing             SVTP       Accuracy  98.13  CLIP4STR-L*
Scene Parsing             CUTE80     Accuracy  99.65  CLIP4STR-B*
Scene Parsing             ICDAR2015  Accuracy  92.6   CLIP4STR-L*
Scene Parsing             ICDAR2013  Accuracy  99.42  CLIP4STR-L*
2D Semantic Segmentation  SVT        Accuracy  98.76  CLIP4STR-B*
2D Semantic Segmentation  SVTP       Accuracy  98.13  CLIP4STR-L*
2D Semantic Segmentation  CUTE80     Accuracy  99.65  CLIP4STR-B*
2D Semantic Segmentation  ICDAR2015  Accuracy  92.6   CLIP4STR-L*
2D Semantic Segmentation  ICDAR2013  Accuracy  99.42  CLIP4STR-L*
Scene Text Recognition    SVT        Accuracy  98.76  CLIP4STR-B*
Scene Text Recognition    SVTP       Accuracy  98.13  CLIP4STR-L*
Scene Text Recognition    CUTE80     Accuracy  99.65  CLIP4STR-B*
Scene Text Recognition    ICDAR2015  Accuracy  92.6   CLIP4STR-L*
Scene Text Recognition    ICDAR2013  Accuracy  99.42  CLIP4STR-L*

Related Papers

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment (2025-07-17)
Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis (2025-07-15)
A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends (2025-07-14)
Design and Implementation of an OCR-Powered Pipeline for Table Extraction from Invoices (2025-07-09)
Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning (2025-07-09)
TextPixs: Glyph-Conditioned Diffusion with Character-Aware Attention and OCR-Guided Supervision (2025-07-08)
PaddleOCR 3.0 Technical Report (2025-07-08)