Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang

2018-11-02
Scene Text Recognition, Irregular Text Recognition, Optical Character Recognition (OCR)

Abstract

Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion. Most existing approaches rely heavily on sophisticated model designs and/or extra fine-grained annotations, which, to some extent, increase the difficulty in algorithm implementation and data collection. In this work, we propose an easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and only word-level annotations. It is composed of a 31-layer ResNet, an LSTM-based encoder-decoder framework and a 2-dimensional attention module. Despite its simplicity, the proposed method is robust and achieves state-of-the-art performance on both regular and irregular scene text recognition benchmarks. Code is available at: https://tinyurl.com/ShowAttendRead
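
The abstract names a 2-dimensional attention module but does not spell out its equations here. The following is a minimal sketch of what attention over a 2D feature map can look like, assuming a generic additive (tanh) scoring function. The function name `attend_2d`, the parameter names `w_feat`, `w_hid`, `w_score`, and the toy sizes are all hypothetical and are not taken from the authors' code. The point it illustrates is that the softmax runs jointly over every cell of the height x width grid, so the decoder can focus on characters wherever they lie in a curved or rotated word, instead of first collapsing the image to a 1D sequence.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def attend_2d(feature_map, hidden, w_feat, w_hid, w_score):
    """Additive attention over every cell of a 2D feature map (illustrative only).

    feature_map: H x W grid of C-dimensional feature vectors (nested lists)
    hidden:      D-dimensional decoder hidden state
    w_feat:      A x C projection for features   (hypothetical parameter)
    w_hid:       A x D projection for the state  (hypothetical parameter)
    w_score:     A-dimensional scoring vector    (hypothetical parameter)
    Returns (weights, glimpse): an H x W grid of weights summing to 1 and
    the C-dimensional weighted sum of the features.
    """
    hid_proj = matvec(w_hid, hidden)

    # Score each spatial cell against the current decoder state.
    scores = []
    for row in feature_map:
        score_row = []
        for cell in row:
            feat_proj = matvec(w_feat, cell)
            act = [math.tanh(f + h) for f, h in zip(feat_proj, hid_proj)]
            score_row.append(sum(w * a for w, a in zip(w_score, act)))
        scores.append(score_row)

    # Softmax over all H * W cells jointly (shifted by the max for stability).
    peak = max(max(r) for r in scores)
    exps = [[math.exp(s - peak) for s in r] for r in scores]
    total = sum(sum(r) for r in exps)
    weights = [[e / total for e in r] for r in exps]

    # Glimpse: attention-weighted sum of the feature vectors.
    channels = len(feature_map[0][0])
    glimpse = [0.0] * channels
    for w_row, f_row in zip(weights, feature_map):
        for w, cell in zip(w_row, f_row):
            for c in range(channels):
                glimpse[c] += w * cell[c]
    return weights, glimpse


if __name__ == "__main__":
    rng = random.Random(0)
    H, W, C, D, A = 3, 5, 4, 6, 8  # toy sizes, chosen arbitrarily

    def rand_matrix(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    fmap = [rand_matrix(W, C) for _ in range(H)]
    state = [rng.uniform(-1, 1) for _ in range(D)]
    weights, glimpse = attend_2d(
        fmap, state, rand_matrix(A, C), rand_matrix(A, D),
        [rng.uniform(-1, 1) for _ in range(A)],
    )
    print("weights sum:", sum(sum(r) for r in weights))
    print("glimpse length:", len(glimpse))
```

In a full recognizer of the kind the abstract describes, the feature map would come from the ResNet and the hidden state from the LSTM decoder at each output step; the glimpse would then feed the character prediction. Those connections are omitted here to keep the sketch self-contained.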

Results

Task	Dataset	Metric	Value	Model
Scene Text Recognition	SVT	Accuracy	84.5	SAR
Scene Text Recognition	ICDAR2015	Accuracy	69.2	SAR
Scene Text Recognition	ICDAR2013	Accuracy	91	SAR
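
The table reports an Accuracy metric without defining it. In scene text recognition benchmarks this is commonly word-level accuracy: the percentage of cropped word images whose predicted string matches the ground truth exactly. The sketch below shows that computation under the assumption of case-insensitive, alphanumeric-only matching; the exact normalization used for the numbers above is not stated on this page, and the function names `normalize` and `word_accuracy` are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and keep only ASCII letters and digits (assumed protocol)."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(predictions, ground_truths):
    """Percentage of predictions that exactly match their ground truth."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not ground_truths:
        return 0.0
    correct = sum(
        normalize(p) == normalize(g) for p, g in zip(predictions, ground_truths)
    )
    return 100.0 * correct / len(ground_truths)


if __name__ == "__main__":
    preds = ["Hello", "w0rld", "cafe"]
    truth = ["hello", "world", "CAFE!"]
    print(round(word_accuracy(preds, truth), 1))
```

Here two of the three predictions match after normalization, so a single wrong character ("w0rld" versus "world") costs the whole word.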
