TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/PGNet: Real-time Arbitrarily-Shaped Text Spotting with Poi...

PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network

Pengfei Wang, Chengquan Zhang, Fei Qi, Shanshan Liu, Xiaoqiang Zhang, Pengyuan Lyu, Junyu Han, Jingtuo Liu, Errui Ding, Guangming Shi

2021-04-12Scene Text DetectionText SpottingOptical Character Recognition (OCR)
PaperPDFCodeCode(official)

Abstract

The reading of arbitrarily-shaped text has received increasing research attention. However, existing text spotters are mostly built on two-stage frameworks or character-based methods, which suffer from either Non-Maximum Suppression (NMS), Region-of-Interest (RoI) operations, or character-level annotations. In this paper, to address the above problems, we propose a novel fully convolutional Point Gathering Network (PGNet) for reading arbitrarily-shaped text in real-time. The PGNet is a single-shot text spotter, where the pixel-level character classification map is learned with proposed PG-CTC loss avoiding the usage of character-level annotations. With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS and RoI operations involved, which guarantees high efficiency. Additionally, reasoning the relations between each character and its neighbors, a graph refinement module (GRM) is proposed to optimize the coarse recognition and improve the end-to-end performance. Experiments prove that the proposed method achieves competitive accuracy, meanwhile significantly improving the running speed. In particular, in Total-Text, it runs at 46.7 FPS, surpassing the previous spotters with a large margin.

Results

TaskDatasetMetricValueModel
Text SpottingICDAR 2015F-measure (%) - Generic Lexicon63.5PGNet
Text SpottingICDAR 2015F-measure (%) - Strong Lexicon83.3PGNet
Text SpottingICDAR 2015F-measure (%) - Weak Lexicon78.3PGNet
Scene Text DetectionICDAR 2015F-Measure53.6MCLAB_FCN
Scene Text DetectionICDAR 2015Precision70.8MCLAB_FCN
Scene Text DetectionICDAR 2015Recall43MCLAB_FCN
Scene Text DetectionICDAR 2015Accuracy62.3PGNet-A

Related Papers

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning2025-07-17DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment2025-07-17Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis2025-07-15A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends2025-07-14Design and Implementation of an OCR-Powered Pipeline for Table Extraction from Invoices2025-07-09Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning2025-07-09TextPixs: Glyph-Conditioned Diffusion with Character-Aware Attention and OCR-Guided Supervision2025-07-08PaddleOCR 3.0 Technical Report2025-07-08