SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

Bao Hieu Tran, Thanh Le-Cong, Huu Manh Nguyen, Duc Anh Le, Thanh Hung Nguyen, Phi Le Nguyen

2022-01-01

Scene Text Recognition · Optical Character Recognition (OCR)

Abstract

In recent decades, scene text recognition has attracted worldwide attention from both the academic community and practitioners because of its importance in a wide range of applications. Despite the achievements of optical character recognition, scene text recognition remains challenging due to inherent problems such as distortion and irregular layouts. Most existing approaches rely mainly on recurrent or convolutional neural networks. However, recurrent neural networks (RNNs) usually train slowly because of their sequential computation and suffer from problems such as vanishing gradients and bottlenecks, while convolutional neural networks (CNNs) face a trade-off between complexity and performance. In this paper, we introduce SAFL, a self-attention-based neural network model trained with focal loss for scene text recognition, to overcome these limitations of existing approaches. Using focal loss instead of negative log-likelihood helps the model focus more on training with low-frequency samples. Moreover, to deal with distorted and irregular text, we use a Spatial Transformer Network (STN) to rectify the text before passing it to the recognition network. We perform experiments comparing the performance of the proposed model on seven benchmarks, and the numerical results show that our model achieves the best performance.
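
The abstract does not spell out SAFL's exact loss formulation. The following is a minimal sketch, assuming the standard focal loss of Lin et al., FL(p_t) = -α(1 - p_t)^γ log(p_t), applied independently to each decoded character. The function names and the defaults γ = 2, α = 1 are illustrative assumptions, not the authors' settings. With γ = 0 and α = 1 the loss reduces exactly to negative log-likelihood, so the only difference is the modulating factor (1 - p_t)^γ, which shrinks the loss on characters the model already predicts confidently and leaves relatively more of the training signal for hard (often rare) ones.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one decoding step's character logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def nll_loss(logits_seq: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood over the characters of one text label."""
    losses = [-log_softmax(logits)[t] for logits, t in zip(logits_seq, targets)]
    return sum(losses) / len(losses)


def focal_loss(
    logits_seq: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 2.0,  # assumed focusing parameter (common default, not SAFL's value)
    alpha: float = 1.0,  # assumed balancing weight
) -> float:
    """Mean focal loss -alpha * (1 - p_t)**gamma * log(p_t) over the characters.

    With gamma=0 and alpha=1 this equals nll_loss.
    """
    losses = []
    for logits, t in zip(logits_seq, targets):
        log_p_t = log_softmax(logits)[t]  # log-probability of the ground-truth character
        p_t = math.exp(log_p_t)
        losses.append(-alpha * (1.0 - p_t) ** gamma * log_p_t)
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Two hypothetical decoding steps over a toy 3-symbol alphabet:
    # an "easy" step (confident and correct) and a "hard" step (target has low probability).
    logits_seq = [[4.0, 0.0, 0.0], [0.5, 0.0, 0.0]]
    targets = [0, 1]
    print("NLL:  ", nll_loss(logits_seq, targets))
    print("Focal:", focal_loss(logits_seq, targets))
```

In this toy example the easy step has p_t ≈ 0.96, so its loss is scaled by (1 - 0.96)^2 ≈ 0.001, while the hard step has p_t ≈ 0.27 and keeps about half of its negative log-likelihood.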

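SAFL replaces recurrence with self-attention, but the abstract does not give the layer configuration. The following is a minimal sketch of single-head scaled dot-product self-attention as defined for the Transformer (Vaswani et al.), Attention(Q, K, V) = softmax(QK^T / √d_k) V, written in plain Python for readability. The projection matrices w_q, w_k, w_v and all helper names are hypothetical; a real model would learn the projections and typically use several heads. Unlike an RNN, every output position is computed from all input positions in one matrix product with no step-by-step dependency, which is what allows parallel training.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """(n x d) @ (d x m) -> (n x m) using plain lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Sequence[Sequence[float]]) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: Sequence[float]) -> List[float]:
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (output, weights); weights[i][j] is how much position i attends to position j."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # (n x n) similarity of every query with every key
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Self-attention: queries, keys and values are all projections of the same sequence x."""
    return scaled_dot_product_attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))


if __name__ == "__main__":
    # Hypothetical 3-step feature sequence (e.g. columns of a rectified text image), d_model = 2,
    # with identity projections purely for illustration.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    out, weights = self_attention(x, identity, identity, identity)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of the weight matrix is a probability distribution over all positions, so every output vector is a weighted mix of the whole input sequence.
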
Results

Task	Dataset	Metric	Value (%)	Model
Scene Text Recognition	SVT	Accuracy	88.6	SAFL
Scene Text Recognition	ICDAR2015	Accuracy	77.5	SAFL
Scene Text Recognition	ICDAR2003	Accuracy	95	SAFL
Scene Text Recognition	ICDAR2013	Accuracy	92.8	SAFL

SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

Abstract

Results

Related Papers

SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

Abstract

Results

Related Papers