
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang

2018-11-02 | Scene Text Recognition | Irregular Text Recognition | Optical Character Recognition (OCR)

Abstract

Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion. Most existing approaches rely heavily on sophisticated model designs and/or extra fine-grained annotations, which, to some extent, increase the difficulty in algorithm implementation and data collection. In this work, we propose an easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and only word-level annotations. It is composed of a 31-layer ResNet, an LSTM-based encoder-decoder framework and a 2-dimensional attention module. Despite its simplicity, the proposed method is robust and achieves state-of-the-art performance on both regular and irregular scene text recognition benchmarks. Code is available at: https://tinyurl.com/ShowAttendRead
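
The abstract names a 2-dimensional attention module but does not spell out its equations. As a rough illustration of the general idea only, the sketch below applies plain additive attention over every position of an H x W feature map, conditioned on a decoder hidden state. It is not the authors' implementation: the scoring function, the helper names (matvec, attention_2d), the parameter names (w_f, w_h, v) and the toy sizes are all assumptions made for this example, and the published model may score positions differently.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def attention_2d(features, hidden, w_f, w_h, v):
    """Additive attention over a 2D feature map (illustrative assumption).

    features: H x W x C nested lists (e.g. a CNN feature map)
    hidden:   length-D decoder state
    w_f:      A x C projection for features (hypothetical parameter)
    w_h:      A x D projection for the hidden state (hypothetical parameter)
    v:        length-A scoring vector (hypothetical parameter)
    Returns (weights, glimpse): H x W attention weights that sum to 1,
    and the length-C weighted sum of the feature vectors.
    """
    height, width = len(features), len(features[0])
    channels = len(features[0][0])
    proj_h = matvec(w_h, hidden)  # shared across all positions

    # One scalar score per spatial position.
    scores = []
    for row in features:
        score_row = []
        for feat in row:
            proj_f = matvec(w_f, feat)
            score_row.append(
                sum(vk * math.tanh(pf + ph)
                    for vk, pf, ph in zip(v, proj_f, proj_h))
            )
        scores.append(score_row)

    # Softmax jointly over all H*W positions (not row by row).
    peak = max(max(r) for r in scores)
    exps = [[math.exp(s - peak) for s in r] for r in scores]
    total = sum(sum(r) for r in exps)
    weights = [[e / total for e in r] for r in exps]

    # Glimpse: attention-weighted sum of feature vectors.
    glimpse = [
        sum(weights[i][j] * features[i][j][c]
            for i in range(height) for j in range(width))
        for c in range(channels)
    ]
    return weights, glimpse


# Toy sizes chosen for illustration only.
rng = random.Random(0)
H, W, C, D, A = 2, 3, 4, 5, 6
features = [[[rng.uniform(-1, 1) for _ in range(C)] for _ in range(W)]
            for _ in range(H)]
hidden = [rng.uniform(-1, 1) for _ in range(D)]
w_f = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(A)]
w_h = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(A)]
v = [rng.uniform(-1, 1) for _ in range(A)]

weights, glimpse = attention_2d(features, hidden, w_f, w_h, v)
print(len(weights), len(weights[0]), len(glimpse))  # prints: 2 3 4
```

The point of attending over both spatial axes is that the decoder can pick out a character anywhere on the feature map, which is what makes curved or rotated text tractable without first rectifying the image. In a full recognizer, the glimpse would typically be combined with the decoder state to predict the next character.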

Results

Task                      Dataset    Metric    Value  Model
Scene Parsing             SVT        Accuracy  84.5   SAR
Scene Parsing             ICDAR2015  Accuracy  69.2   SAR
Scene Parsing             ICDAR2013  Accuracy  91     SAR
2D Semantic Segmentation  SVT        Accuracy  84.5   SAR
2D Semantic Segmentation  ICDAR2015  Accuracy  69.2   SAR
2D Semantic Segmentation  ICDAR2013  Accuracy  91     SAR
Scene Text Recognition    SVT        Accuracy  84.5   SAR
Scene Text Recognition    ICDAR2015  Accuracy  69.2   SAR
Scene Text Recognition    ICDAR2013  Accuracy  91     SAR

Related Papers

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment (2025-07-17)
Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis (2025-07-15)
A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends (2025-07-14)
Design and Implementation of an OCR-Powered Pipeline for Table Extraction from Invoices (2025-07-09)
Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning (2025-07-09)
TextPixs: Glyph-Conditioned Diffusion with Character-Aware Attention and OCR-Guided Supervision (2025-07-08)
PaddleOCR 3.0 Technical Report (2025-07-08)