AON: Towards Arbitrarily-Oriented Text Recognition

Zhanzhan Cheng, Yangliu Xu, Fan Bai, Yi Niu, ShiLiang Pu, Shuigeng Zhou

2017-11-12 · CVPR 2018 · Scene Text Recognition · Optical Character Recognition (OCR)

Abstract

Recognizing text in natural images is an active research topic in computer vision because of its many applications. Despite decades of research on optical character recognition (OCR), recognizing text in natural images remains challenging. Scene text is often arranged irregularly (e.g., curved, arbitrarily oriented, or severely distorted), and the literature has not yet addressed this case well. Existing text recognition methods mainly handle regular (horizontal and frontal) text and cannot be trivially generalized to irregular text. In this paper, we develop the arbitrary orientation network (AON) to directly capture deep features of irregular text. These features are fed into an attention-based decoder that generates the character sequence. The whole network can be trained end-to-end using only images and word-level annotations. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed AON-based method achieves state-of-the-art performance on irregular datasets and is comparable to major existing methods on regular datasets.
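The abstract says only that the extracted features go into an "attention-based decoder." The sketch below shows one step of a common form of such a decoder, additive (Bahdanau-style) attention. The decoder state scores each feature vector, the scores are normalized with a softmax, and the resulting weights form a context vector. Several details here are assumptions: the choice of additive attention, the parameter names `W`, `U`, `v`, and the helper `attention_step`. They are illustrative only and are not taken from the AON paper, whose decoder may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(state: Vector, features: List[Vector],
                   W: Matrix, U: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """One additive-attention step (hypothetical parameters W, U, v).

    state:    previous decoder hidden state s_{t-1}
    features: encoder feature sequence h_1..h_T (all the same length)
    Returns (alpha, context): attention weights over the T features and
    the weighted sum of the features.
    """
    ws = matvec(W, state)  # project the decoder state once per step
    scores = []
    for h in features:
        uh = matvec(U, h)  # project each feature vector
        hidden = [math.tanh(a + b) for a, b in zip(ws, uh)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))  # e_{t,i}
    alpha = softmax(scores)  # weights sum to 1
    dim = len(features[0])
    context = [sum(a * h[j] for a, h in zip(alpha, features))
               for j in range(dim)]
    return alpha, context


# Toy usage: 2-d state, three 2-d features, identity projections.
I2 = [[1.0, 0.0], [0.0, 1.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
alpha, context = attention_step([0.0, 0.0], feats, I2, I2, [1.0, 1.0])
```

In a full decoder, the context vector and the previously emitted character would update a recurrent state, and that state would predict the next character. This loop repeats until an end-of-sequence symbol is produced.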

Results

Task	Dataset	Metric	Value	Model
Scene Text Recognition	ICDAR 2015	Accuracy (%)	73	AON
Scene Text Recognition	ICDAR 2003	Accuracy (%)	91.5	AON
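The page does not state how "Accuracy" is computed. Scene text recognition benchmarks are commonly scored by word accuracy: the percentage of cropped word images whose predicted string exactly matches the ground-truth label. The sketch below assumes a case-insensitive comparison. That choice, the `case_sensitive` flag, and the sample words are illustrative assumptions. Some protocols also strip non-alphanumeric characters before comparing.

```python
from typing import Sequence


def word_accuracy(predictions: Sequence[str], labels: Sequence[str],
                  case_sensitive: bool = False) -> float:
    """Percentage of samples whose prediction exactly matches the label.

    Comparison is case-insensitive by default (an assumption about the
    evaluation protocol, not something stated on this page).
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    normalize = (lambda s: s) if case_sensitive else str.lower
    correct = sum(normalize(p) == normalize(g)
                  for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical predictions: three of four words match ignoring case.
preds = ["HOTEL", "coffee", "stree1", "Bank"]
gts = ["hotel", "Coffee", "street", "BANK"]
print(word_accuracy(preds, gts))  # 75.0
```

Under this metric, a single wrong character makes the whole word count as incorrect. That is why the reported values are word-level rather than character-level.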
