On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention

Junyeop Lee, Sungrae Park, Jeonghun Baek, Seong Joon Oh, Seonghyeon Kim, Hwalsuk Lee

2019-10-10 | Scene Text Recognition

Abstract

Scene text recognition (STR) is the task of recognizing character sequences in natural scenes. While there have been great advances in STR methods, current methods still fail to recognize texts in arbitrary shapes, such as heavily curved or rotated texts, which are abundant in daily life (e.g. restaurant signs, product labels, and company logos). This paper introduces a novel architecture for recognizing texts of arbitrary shapes, named Self-Attention Text Recognition Network (SATRN), which is inspired by the Transformer. SATRN uses the self-attention mechanism to describe two-dimensional (2D) spatial dependencies of characters in a scene text image. By exploiting the full-graph propagation of self-attention, SATRN can recognize texts with arbitrary arrangements and large inter-character spacing. As a result, SATRN outperforms existing STR models by a large margin of 5.7 pp on average on "irregular text" benchmarks. We provide empirical analyses that illustrate the inner mechanisms and the extent to which the model is applicable (e.g. rotated and multi-line text). We will open-source the code.
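The idea of self-attention over a 2D feature map can be sketched as follows. This is a generic single-head scaled dot-product attention written with NumPy, not SATRN's actual implementation: the function name `self_attention_2d`, the feature-map size, and the random projection matrices are all assumptions made for illustration, and the sketch omits the positional encoding that a real model would need. It shows only how every spatial position can attend to every other position regardless of distance or direction.

```python
import numpy as np


def self_attention_2d(features, w_q, w_k, w_v):
    """Single-head self-attention over all positions of an (H, W, C) feature map.

    Hypothetical helper for illustration; not taken from the SATRN code base.
    """
    h, w, c = features.shape
    tokens = features.reshape(h * w, c)        # one token per spatial position

    q = tokens @ w_q                           # queries, shape (H*W, D)
    k = tokens @ w_k                           # keys,    shape (H*W, D)
    v = tokens @ w_v                           # values,  shape (H*W, D)

    # Every position scores every other position: a full (H*W, H*W) graph.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all positions

    out = weights @ v
    # Reshape so attention can be read as "position (i, j) looks at (k, l)".
    return out.reshape(h, w, -1), weights.reshape(h, w, h, w)


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 25, 16))        # assumed toy feature map
w_q, w_k, w_v = (rng.normal(size=(16, 16)) * 0.1 for _ in range(3))

out, attn = self_attention_2d(features, w_q, w_k, w_v)
print(out.shape)   # (8, 25, 16)
print(attn.shape)  # (8, 25, 8, 25)
```

Because the attention weights span the whole height-by-width grid, a character at one corner of the image can draw on features from any other location, which is the property the abstract attributes to full-graph propagation.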

Results

Task                      Dataset     Metric    Value (%)  Model
Scene Parsing             SVT         Accuracy  91.3       SATRN
Scene Parsing             ICDAR2015   Accuracy  79         SATRN
Scene Parsing             ICDAR 2003  Accuracy  96.7       SATRN
Scene Parsing             ICDAR2013   Accuracy  94.1       SATRN
2D Semantic Segmentation  SVT         Accuracy  91.3       SATRN
2D Semantic Segmentation  ICDAR2015   Accuracy  79         SATRN
2D Semantic Segmentation  ICDAR 2003  Accuracy  96.7       SATRN
2D Semantic Segmentation  ICDAR2013   Accuracy  94.1       SATRN
Scene Text Recognition    SVT         Accuracy  91.3       SATRN
Scene Text Recognition    ICDAR2015   Accuracy  79         SATRN
Scene Text Recognition    ICDAR 2003  Accuracy  96.7       SATRN
Scene Text Recognition    ICDAR2013   Accuracy  94.1       SATRN
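The accuracy values above are percentages of correctly recognized word images. This page does not state the exact evaluation protocol, so the following is only a minimal sketch under an assumed convention: a prediction counts as correct when it matches the ground truth exactly after lowercasing and dropping non-alphanumeric characters. The helper names `normalize` and `word_accuracy` and the sample strings are hypothetical.

```python
def normalize(text):
    """Lowercase and keep only alphanumeric characters (assumed protocol)."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def word_accuracy(predictions, ground_truths):
    """Return the percentage of predictions that match their ground truth."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not predictions:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(truth)
        for pred, truth in zip(predictions, ground_truths)
    )
    return 100.0 * correct / len(predictions)


# Made-up sample data, not taken from any benchmark.
predictions = ["Hello", "w0rld", "cafe", "exit"]
ground_truths = ["hello", "world", "CAFE", "Exit!"]

accuracy = word_accuracy(predictions, ground_truths)
print(accuracy)  # 75.0
```

Under this convention a single wrong character makes the whole word count as an error, so the metric is stricter than a per-character score.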

Related Papers

Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition (2025-03-24)
Efficient and Accurate Scene Text Recognition with Cascaded-Transformers (2025-03-24)
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation (2025-03-20)
A Context-Driven Training-Free Network for Lightweight Scene Text Segmentation and Recognition (2025-03-19)
EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition (2025-02-13)
Billet Number Recognition Based on Test-Time Adaptation (2025-02-13)
Ocean-OCR: Towards General OCR Application via a Vision-Language Model (2025-01-26)
Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance (2024-12-13)