Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

Bao Hieu Tran, Thanh Le-Cong, Huu Manh Nguyen, Duc Anh Le, Thanh Hung Nguyen, Phi Le Nguyen

2022-01-01 | Scene Text Recognition | Optical Character Recognition (OCR)

Abstract

In recent decades, scene text recognition has gained worldwide attention from both the academic community and actual users due to its importance in a wide range of applications. Despite achievements in optical character recognition, scene text recognition remains challenging due to inherent problems such as distortions and irregular layouts. Most existing approaches rely on recurrent or convolutional neural networks. However, recurrent neural networks (RNNs) usually suffer from slow training due to sequential computation and encounter problems such as vanishing gradients or bottlenecks, while convolutional neural networks (CNNs) face a trade-off between complexity and performance. In this paper, we introduce SAFL, a self-attention-based neural network model with focal loss for scene text recognition, to overcome the limitations of existing approaches. Using focal loss instead of negative log-likelihood helps the model focus more on low-frequency samples during training. Moreover, to deal with distorted and irregular text, we use a Spatial Transformer Network (STN) to rectify the text before passing it to the recognition network. We perform experiments to evaluate the proposed model on seven benchmarks. The numerical results show that our model achieves the best performance.
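
To illustrate why focal loss shifts attention toward hard or rare samples, here is a minimal sketch that compares it with negative log-likelihood for a single prediction. It uses the standard focal loss form, FL(p) = -(1 - p)^gamma * log(p), where p is the probability the model assigns to the correct character. The function names and the value gamma = 2.0 are assumptions chosen for illustration; the abstract does not state the setting used in SAFL.

```python
import math


def negative_log_likelihood(p_true: float) -> float:
    """Standard NLL for the probability assigned to the correct class."""
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0) -> float:
    """Focal loss: NLL scaled by (1 - p_true) ** gamma.

    gamma = 2.0 is an illustrative assumption, not a value taken from
    the SAFL abstract. With gamma = 0 this reduces to plain NLL.
    """
    return ((1.0 - p_true) ** gamma) * negative_log_likelihood(p_true)


# Compare the two losses for an easy, a medium, and a hard prediction.
for p in (0.9, 0.5, 0.1):
    nll = negative_log_likelihood(p)
    fl = focal_loss(p)
    print(f"p={p:.1f}  NLL={nll:.4f}  focal={fl:.4f}  ratio={fl / nll:.2f}")
```

The ratio column equals (1 - p)^gamma: a confidently correct prediction (p = 0.9) keeps only about 1% of its NLL, while a poorly predicted one (p = 0.1) keeps about 81%. Well-learned, frequent characters therefore contribute little to the total loss, and the remaining loss is dominated by the samples the model still gets wrong.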

Results

Task                      Dataset      Metric    Value  Model
Scene Parsing             SVT          Accuracy  88.6   SAFL
Scene Parsing             ICDAR2015    Accuracy  77.5   SAFL
Scene Parsing             ICDAR 2003   Accuracy  95     SAFL
Scene Parsing             ICDAR2013    Accuracy  92.8   SAFL
2D Semantic Segmentation  SVT          Accuracy  88.6   SAFL
2D Semantic Segmentation  ICDAR2015    Accuracy  77.5   SAFL
2D Semantic Segmentation  ICDAR 2003   Accuracy  95     SAFL
2D Semantic Segmentation  ICDAR2013    Accuracy  92.8   SAFL
Scene Text Recognition    SVT          Accuracy  88.6   SAFL
Scene Text Recognition    ICDAR2015    Accuracy  77.5   SAFL
Scene Text Recognition    ICDAR 2003   Accuracy  95     SAFL
Scene Text Recognition    ICDAR2013    Accuracy  92.8   SAFL