Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting

Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen

2021-05-08 · Text Spotting
Paper · PDF · Code

Abstract

End-to-end text spotting, which aims to integrate detection and recognition in a unified framework, has attracted increasing attention because the two tasks are complementary. It remains an open problem, especially when processing arbitrarily-shaped text instances. Previous methods can be roughly categorized into character-based and segmentation-based approaches, which often require character-level annotations and/or complex post-processing due to their unstructured output. Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2). Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text with a parameterized Bezier curve, which, compared with segmentation-based methods, provides not only structured output but also a controllable representation. 2) We design a novel BezierAlign layer for extracting accurate convolutional features of a text instance of arbitrary shape, significantly improving recognition precision over previous methods. 3) Unlike previous methods, which often suffer from complex post-processing and sensitive hyper-parameters, ABCNet v2 maintains a simple pipeline whose only post-processing step is non-maximum suppression (NMS). 4) As text-recognition performance depends closely on feature alignment, ABCNet v2 further adopts a simple yet effective coordinate convolution to encode the position of the convolutional filters, which yields a considerable improvement with negligible computational overhead. Comprehensive experiments on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 achieves state-of-the-art performance while maintaining very high efficiency.
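The core idea of the Bezier parameterization can be illustrated with a small sketch: fit the control points of a cubic Bezier curve to sampled boundary points by least squares over the Bernstein basis. This is a hypothetical NumPy illustration of curve fitting in general, not the paper's implementation (ABCNet v2 regresses the control points with a network rather than fitting them analytically at inference time):

```python
# Least-squares fit of a cubic Bezier curve to ordered boundary points.
# Illustrative sketch only; function names and the chord-length
# parameterization are assumptions, not taken from the paper.
import numpy as np


def bernstein_basis(t):
    """Cubic Bernstein basis evaluated at parameters t, shape (n, 4)."""
    return np.stack([(1 - t) ** 3,
                     3 * t * (1 - t) ** 2,
                     3 * t ** 2 * (1 - t),
                     t ** 3], axis=1)


def fit_cubic_bezier(points):
    """Fit 4 control points (4, 2) to an (n, 2) array of ordered points."""
    # Chord-length parameterization of the samples in [0, 1].
    d = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(d)])
    t /= t[-1]
    # Solve B @ ctrl ≈ points in the least-squares sense.
    ctrl, *_ = np.linalg.lstsq(bernstein_basis(t), points, rcond=None)
    return ctrl


def eval_bezier(ctrl, t):
    """Evaluate the cubic Bezier with control points ctrl at parameters t."""
    return bernstein_basis(np.asarray(t)) @ ctrl
```

The payoff described in the abstract is that a curved text boundary is reduced to a fixed-size, structured output (eight control points for the two long sides), which a sampling layer such as BezierAlign can then use to rectify features, instead of an unstructured segmentation mask.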

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Text Spotting | Total-Text | F-measure (%) - Full Lexicon | 78.1 | ABCNet v2 |
| Text Spotting | Total-Text | F-measure (%) - No Lexicon | 70.4 | ABCNet v2 |
| Text Spotting | Inverse-Text | F-measure (%) - Full Lexicon | 47.4 | ABCNet v2 |
| Text Spotting | Inverse-Text | F-measure (%) - No Lexicon | 34.5 | ABCNet v2 |
| Text Spotting | SCUT-CTW1500 | F-measure (%) - Full Lexicon | 77.2 | ABCNet v2 |
| Text Spotting | SCUT-CTW1500 | F-measure (%) - No Lexicon | 57.5 | ABCNet v2 |
| Text Spotting | ICDAR 2015 | F-measure (%) - Generic Lexicon | 73 | ABCNet v2 |
| Text Spotting | ICDAR 2015 | F-measure (%) - Strong Lexicon | 82.7 | ABCNet v2 |
| Text Spotting | ICDAR 2015 | F-measure (%) - Weak Lexicon | 78.5 | ABCNet v2 |

Related Papers

Text-Aware Image Restoration with Diffusion Models (2025-06-11)
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking (2025-05-28)
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting (2025-04-14)
TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification (2025-03-09)
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models (2025-02-22)
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR (2025-01-01)
Hear the Scene: Audio-Enhanced Text Spotting (2024-12-27)
InstructOCR: Instruction Boosting Scene Text Spotting (2024-12-20)