SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

Zhi Qiao, Yu Zhou, Dongbao Yang, Yucan Zhou, Weiping Wang

2020-05-22 | CVPR 2020 | Scene Text Recognition | Optical Character Recognition (OCR)

Abstract

Scene text recognition is an active research topic in computer vision. Many recent recognition methods are built on the encoder-decoder framework, and they can handle scene text with perspective distortion and curved shapes. Nevertheless, they still struggle with challenges such as image blur, uneven illumination, and incomplete characters. We argue that most encoder-decoder methods rely on local visual features without explicit global semantic information. In this work, we propose a semantics enhanced encoder-decoder framework to robustly recognize low-quality scene text. The semantic information is used both in the encoder module, as a source of supervision, and in the decoder module, to initialize it. In particular, the state-of-the-art ASTER method is integrated into the proposed framework as an exemplar. Extensive experiments demonstrate that the proposed framework is more robust on low-quality text images and achieves state-of-the-art results on several benchmark datasets.
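The abstract states that semantic information supervises the encoder and initializes the decoder. The sketch below shows one way that wiring could look; it is an illustration, not the authors' code. Assumptions (not stated in this abstract): a small semantic module maps flattened encoder features to a semantic vector through two linear layers (the ReLU between them is hypothetical); that vector is supervised against a pre-trained word embedding of the ground-truth word with a cosine embedding loss; the total loss is the recognition loss plus a weight `lam` (hypothetical default 1.0) times the semantic loss; and a linear map turns the semantic vector into the decoder's initial state. All function names are illustrative, and plain lists stand in for tensors so the block runs without a deep-learning framework.

```python
import math

def linear(x, W, b):
    """Dense layer: returns W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def predict_semantics(features, W1, b1, W2, b2):
    """Hypothetical semantic module: flatten T x C encoder features, then two linear layers."""
    flat = [v for step in features for v in step]
    hidden = [max(0.0, h) for h in linear(flat, W1, b1)]  # assumed ReLU between layers
    return linear(hidden, W2, b2)

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def semantic_loss(pred_sem, word_emb):
    """Cosine embedding loss: 0 when the predicted semantics align with the word embedding."""
    return 1.0 - cosine_similarity(pred_sem, word_emb)

def total_loss(rec_loss, pred_sem, word_emb, lam=1.0):
    """Recognition loss plus the weighted semantic supervision term (lam is an assumed weight)."""
    return rec_loss + lam * semantic_loss(pred_sem, word_emb)

def init_decoder_state(pred_sem, W, b):
    """Map the semantic vector to the decoder's initial hidden state with a linear layer."""
    return linear(pred_sem, W, b)
```

Because the decoder starts from a vector that summarizes the whole word, rather than from zeros, it has global context before it attends to any local visual feature. That is the intuition the abstract gives for robustness to blur and missing characters.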

Results

Task	Dataset	Metric	Value	Model
Optical Character Recognition (OCR)	Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study	Accuracy (%)	61.2	SEED
Scene Text Recognition	SVT	Accuracy (%)	89.6	SEED
Scene Text Recognition	ICDAR2015	Accuracy (%)	80.0	SEED
Scene Text Recognition	ICDAR2013	Accuracy (%)	92.8	SEED
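The accuracy figures above are word-level recognition rates: the percentage of test images whose predicted word exactly matches the ground truth. The page does not spell out the matching rules, so the sketch below assumes the convention commonly used on the English benchmarks (SVT, ICDAR2013, ICDAR2015): comparison is case-insensitive and ignores characters other than ASCII letters and digits. That protocol, and the function names, are assumptions for illustration; the Chinese benchmark row may be scored differently.

```python
import re

def normalize(word):
    """Assumed protocol: lowercase, then drop everything except ASCII letters and digits."""
    return re.sub(r"[^0-9a-z]", "", word.lower())

def word_accuracy(preds, gts):
    """Percentage of predictions whose normalized form equals the normalized ground truth."""
    if len(preds) != len(gts):
        raise ValueError("prediction and ground-truth lists differ in length")
    if not gts:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in zip(preds, gts))
    return 100.0 * correct / len(gts)

# "Hello!" matches "hello" after normalization, "world" does not match "word",
# and "SEED" matches "seed", so 2 of 3 words are correct (about 66.7%).
print(word_accuracy(["Hello!", "world", "SEED"], ["hello", "word", "seed"]))
```

Under this protocol a single wrong character makes the whole word count as an error, which is why blurred or partially occluded text weighs heavily on these scores.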
