Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting

Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, Xiang Bai

2020-07-18 · ECCV 2020 · Tasks: Region Proposal, Text Spotting
Paper · PDF · Code (official)

Abstract

Recent end-to-end trainable methods for scene text spotting, integrating detection and recognition, showed much progress. However, most of the current arbitrary-shape scene text spotters use region proposal networks (RPN) to produce proposals. RPN relies heavily on manually designed anchors and its proposals are represented with axis-aligned rectangles. The former presents difficulties in handling text instances of extreme aspect ratios or irregular shapes, and the latter often includes multiple neighboring instances into a single proposal, in cases of densely oriented text. To tackle these problems, we propose Mask TextSpotter v3, an end-to-end trainable scene text spotter that adopts a Segmentation Proposal Network (SPN) instead of an RPN. Our SPN is anchor-free and gives accurate representations of arbitrary-shape proposals. It is therefore superior to RPN in detecting text instances of extreme aspect ratios or irregular shapes. Furthermore, the accurate proposals produced by SPN allow masked RoI features to be used for decoupling neighboring text instances. As a result, our Mask TextSpotter v3 can handle text instances of extreme aspect ratios or irregular shapes, and its recognition accuracy won't be affected by nearby text or background noise. Specifically, we outperform state-of-the-art methods by 21.9 percent on the Rotated ICDAR 2013 dataset (rotation robustness), 5.9 percent on the Total-Text dataset (shape robustness), and achieve state-of-the-art performance on the MSRA-TD500 dataset (aspect ratio robustness). Code is available at: https://github.com/MhLiao/MaskTextSpotterV3
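The abstract's core idea — anchor-free, arbitrary-shape proposals from a segmentation map, followed by masking of RoI features so neighboring instances don't leak into each other — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's SPN: it thresholds a text probability map, takes connected components as proposals, and zeroes feature locations outside each instance mask. All function names and the 4-connected flood fill are illustrative assumptions; see the official repository for the real implementation.

```python
import numpy as np

def connected_components(binary):
    # 4-connected component labeling via flood fill (simple stand-in for a
    # library routine such as scipy.ndimage.label).
    labels = np.zeros(binary.shape, dtype=int)
    h, w = binary.shape
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] and labels[y, x] == 0:
                count += 1
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and binary[cy, cx] and labels[cy, cx] == 0:
                        labels[cy, cx] = count
                        stack.extend([(cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)])
    return labels, count

def segmentation_proposals(text_prob, threshold=0.5):
    # Anchor-free proposals: each connected region of the thresholded text
    # probability map becomes one arbitrary-shape proposal (mask + tight box).
    labels, num = connected_components(text_prob >= threshold)
    proposals = []
    for i in range(1, num + 1):
        mask = labels == i
        ys, xs = np.nonzero(mask)
        box = (ys.min(), xs.min(), ys.max() + 1, xs.max() + 1)
        proposals.append((box, mask))
    return proposals

def masked_roi_features(features, box, mask):
    # Crop RoI features (C, H, W) to the box and zero everything outside the
    # instance mask, so recognition is decoupled from neighboring text.
    y0, x0, y1, x1 = box
    return features[:, y0:y1, x0:x1] * mask[y0:y1, x0:x1]
```

Two nearby but disjoint text regions yield two separate proposals, and each masked RoI carries features only from its own instance — the property the abstract credits for robustness to dense, oriented text.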

Results

Task | Dataset | Metric | Value | Model
Text Spotting | Total-Text | F-measure (%), Full Lexicon | 78.4 | MaskTextSpotter v3
Text Spotting | Total-Text | F-measure (%), No Lexicon | 71.2 | MaskTextSpotter v3
Text Spotting | ICDAR 2015 | F-measure (%), Generic Lexicon | 74.2 | MaskTextSpotter v3
Text Spotting | ICDAR 2015 | F-measure (%), Strong Lexicon | 83.3 | MaskTextSpotter v3
Text Spotting | ICDAR 2015 | F-measure (%), Weak Lexicon | 78.1 | MaskTextSpotter v3

Related Papers

Text-Aware Image Restoration with Diffusion Models (2025-06-11)
Bridging Annotation Gaps: Transferring Labels to Align Object Detection Datasets (2025-06-05)
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking (2025-05-28)
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting (2025-04-14)
TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification (2025-03-09)
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models (2025-02-22)
Generalization-Enhanced Few-Shot Object Detection in Remote Sensing (2025-01-05)
OralXrays-9: Towards Hospital-Scale Panoramic X-ray Anomaly Detection via Personalized Multi-Object Query-Aware Mining (2025-01-01)