Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Self-supervised Implicit Glyph Attention for Text Recognition

Tongkun Guan, Chaochen Gu, Jingzheng Tu, Xue Yang, Qi Feng, Yudi Zhao, Xiaokang Yang, Wei Shen

2022-03-07 · CVPR 2023 · Scene Text Recognition · Text Segmentation
Paper · PDF · Code (official)

Abstract

The attention mechanism has become the de facto module in scene text recognition (STR) methods, owing to its capability of extracting character-level representations. These methods can be grouped into implicit-attention-based and supervised-attention-based, depending on how the attention is computed: implicit attention and supervised attention are learned from sequence-level text annotations and character-level bounding-box annotations, respectively. Implicit attention, which may extract coarse or even incorrect spatial regions as character attention, is prone to an alignment-drift problem. Supervised attention can alleviate this issue, but it is character-category-specific: it requires extra, laborious character-level bounding-box annotations and would be memory-intensive when handling languages with large character sets. To address these issues, we propose a novel attention mechanism for STR, self-supervised implicit glyph attention (SIGA). SIGA delineates the glyph structures of text images by jointly self-supervised text segmentation and implicit attention alignment, which serve as supervision to improve attention correctness without extra character-level annotations. Experimental results demonstrate that SIGA performs consistently and significantly better than previous attention-based STR methods, in terms of both attention correctness and final recognition performance, on publicly available context benchmarks and on our contributed contextless benchmarks.
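The implicit attention the abstract contrasts with supervised attention can be illustrated with a minimal sketch: per-character queries attend over a flattened visual feature map to produce character-level glimpses, with no character-level box supervision involved. This is a generic scaled-dot-product illustration, not the paper's actual SIGA implementation; all shapes and names (H, W, C, T, `features`, `queries`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: an H x W feature map with C channels, T decoding steps.
rng = np.random.default_rng(0)
H, W, C, T = 8, 32, 64, 5
features = rng.standard_normal((H * W, C))   # flattened visual features
queries = rng.standard_normal((T, C))        # one learned query per character slot

# Implicit attention: spatial scores come only from queries vs. features,
# learned end-to-end from sequence-level text labels — no character boxes.
scores = queries @ features.T / np.sqrt(C)   # (T, H*W) attention logits
alpha = softmax(scores, axis=-1)             # per-step spatial attention maps
glimpses = alpha @ features                  # (T, C) character-level features

print(alpha.shape, glimpses.shape)
```

Because nothing constrains `alpha` to land on the correct glyph regions, the attention maps can drift to coarse or wrong locations — the alignment-drift problem that SIGA's self-supervised glyph supervision is designed to correct.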

Results

Task | Dataset | Metric | Value | Model
Scene Text Recognition | SVT | Accuracy | 95.1 | SIGA_T
Scene Text Recognition | SVTP | Accuracy | 90.5 | SIGA_T
Scene Text Recognition | CUTE80 | Accuracy | 93.1 | SIGA_T
Scene Text Recognition | ICDAR2015 | Accuracy | 87.6 | SIGA_S
Scene Text Recognition | ICDAR2003 | Accuracy | 97.0 | SIGA_T
Scene Text Recognition | IIIT5k | Accuracy | 96.9 | SIGA_S
Scene Text Recognition | ICDAR2013 | Accuracy | 97.8 | SIGA_T

Related Papers

The impact of fine tuning in LLaMA on hallucinations for named entity extraction in legal documentation (2025-06-10)
BP-Seg: A graphical model approach to unsupervised and non-contiguous text segmentation using belief propagation (2025-05-22)
BR-TaxQA-R: A Dataset for Question Answering with References for Brazilian Personal Income Tax Law, including case law (2025-05-21)
TSAL: Few-shot Text Segmentation Based on Attribute Learning (2025-04-15)
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition (2025-03-24)
Efficient and Accurate Scene Text Recognition with Cascaded-Transformers (2025-03-24)
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation (2025-03-20)
A Context-Driven Training-Free Network for Lightweight Scene Text Segmentation and Recognition (2025-03-19)