Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MANGO: A Mask Attention Guided One-Stage Scene Text Spotter

Liang Qiao, Ying Chen, Zhanzhan Cheng, Yunlu Xu, Yi Niu, ShiLiang Pu, Fei Wu

Published: 2020-12-08 · Task: Text Spotting
Links: Paper · PDF · Code (official)

Abstract

Recently, end-to-end scene text spotting has become a popular research topic due to its advantages of global optimization and high maintainability in real applications. Most methods attempt to develop various region-of-interest (RoI) operations to concatenate the detection part and the sequence recognition part into a two-stage text spotting framework. However, in such a framework, the recognition part is highly sensitive to the detected results (e.g., the compactness of text contours). To address this problem, we propose a novel Mask AttentioN Guided One-stage text spotting framework named MANGO, in which character sequences can be directly recognized without RoI operations. Concretely, a position-aware mask attention module is developed to generate attention weights for each text instance and its characters. It allows different text instances in an image to be allocated to different feature-map channels, which are further grouped as a batch of instance features. Finally, a lightweight sequence decoder is applied to generate the character sequences. It is worth noting that MANGO inherently adapts to arbitrary-shaped text spotting and can be trained end-to-end with only coarse position information (e.g., rectangular bounding boxes) and text annotations. Experimental results show that the proposed method achieves competitive and even new state-of-the-art performance on both regular and irregular text spotting benchmarks, i.e., ICDAR 2013, ICDAR 2015, Total-Text, and SCUT-CTW1500.
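The core idea of the attention module described above — each text instance gets its own spatial attention map, which pools the shared feature map into one feature vector per instance, with no RoI cropping — can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `mask_attention_pool`, the tensor shapes, and the use of a plain spatial softmax are all simplifying assumptions.

```python
import numpy as np

def mask_attention_pool(feats, attn_logits):
    """Pool per-instance features via spatial attention.

    feats:       (C, H, W) shared backbone feature map.
    attn_logits: (S, H, W) one attention map per instance slot
                 (illustrative stand-in for MANGO's position-aware masks).
    Returns:     (S, C) pooled feature vector per instance.
    """
    # Spatial softmax over each instance's attention map.
    attn = np.exp(attn_logits - attn_logits.max(axis=(1, 2), keepdims=True))
    attn /= attn.sum(axis=(1, 2), keepdims=True)
    # Attention-weighted sum over all spatial positions: every instance
    # reads directly from the full feature map, so no RoI crop is needed.
    return np.einsum('shw,chw->sc', attn, feats)

rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 32, 32))        # C=64 channels
attn_logits = rng.standard_normal((5, 32, 32))   # S=5 instance slots
inst_feats = mask_attention_pool(feats, attn_logits)
print(inst_feats.shape)  # (5, 64)
```

The resulting batch of instance features would then feed the lightweight sequence decoder; in the full method each instance additionally carries per-character attention maps, which this sketch omits for brevity.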

Results

Task          | Dataset      | Metric                          | Value | Model
Text Spotting | Total-Text   | F-measure (%) - Full Lexicon    | 83.6  | MANGO
Text Spotting | Total-Text   | F-measure (%) - No Lexicon      | 72.9  | MANGO
Text Spotting | SCUT-CTW1500 | F-measure (%) - Full Lexicon    | 78.7  | MANGO
Text Spotting | SCUT-CTW1500 | F-measure (%) - No Lexicon      | 58.9  | MANGO
Text Spotting | ICDAR 2015   | F-measure (%) - Generic Lexicon | 67.3  | MANGO
Text Spotting | ICDAR 2015   | F-measure (%) - Strong Lexicon  | 81.8  | MANGO
Text Spotting | ICDAR 2015   | F-measure (%) - Weak Lexicon    | 78.9  | MANGO

Related Papers

Text-Aware Image Restoration with Diffusion Models (2025-06-11)
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking (2025-05-28)
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting (2025-04-14)
TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification (2025-03-09)
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models (2025-02-22)
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR (2025-01-01)
Hear the Scene: Audio-Enhanced Text Spotting (2024-12-27)
InstructOCR: Instruction Boosting Scene Text Spotting (2024-12-20)