
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A3S: Adversarial learning of semantic representations for Scene-Text Spotting

Masato Fujitake

2023-02-21 | Text Spotting

Abstract

Scene-text spotting is the task of locating text regions in natural scene images and recognizing their characters at the same time. It has attracted much attention in recent years because of its wide range of applications. Existing research has mainly focused on improving text region detection rather than text recognition, so although detection accuracy has improved, end-to-end accuracy remains insufficient. Text in natural scene images tends not to be a random string of characters but a meaningful one: a word. We therefore propose adversarial learning of semantic representations for scene text spotting (A3S) to improve end-to-end accuracy, including text recognition. Rather than performing text recognition from existing visual features alone, A3S simultaneously predicts semantic features in the detected text area. Experimental results on publicly available datasets show that the proposed method achieves better accuracy than other methods.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Spotting | Total-Text | F-measure (%) - Full Lexicon | 85.1 | A3S |
| Text Spotting | Total-Text | F-measure (%) - No Lexicon | 79.4 | A3S |
| Text Spotting | SCUT-CTW1500 | F-measure (%) - Full Lexicon | 82.3 | A3S |
| Text Spotting | SCUT-CTW1500 | F-measure (%) - No Lexicon | 64.4 | A3S |
| Text Spotting | ICDAR 2015 | F-measure (%) - Generic Lexicon | 79.6 | A3S |
| Text Spotting | ICDAR 2015 | F-measure (%) - Strong Lexicon | 84.8 | A3S |
| Text Spotting | ICDAR 2015 | F-measure (%) - Weak Lexicon | 83.7 | A3S |
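
The results above are reported as F-measure percentages. This page does not say how the paper computes them, so the sketch below only illustrates the standard definition: the harmonic mean of precision and recall. The function name `f_measure` and the sample counts are hypothetical and are not taken from the paper.

```python
def f_measure(true_positives: int, num_predictions: int, num_ground_truth: int) -> float:
    """Return the F-measure (harmonic mean of precision and recall) as a percentage.

    Assumption: a prediction counts as a true positive when it matches a
    ground-truth word; the exact matching rule depends on the benchmark
    and is not specified here.
    """
    if num_predictions == 0 or num_ground_truth == 0:
        return 0.0
    precision = true_positives / num_predictions
    recall = true_positives / num_ground_truth
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
score = f_measure(true_positives=80, num_predictions=100, num_ground_truth=100)
print(f"F-measure: {score:.1f}%")
```

Because the harmonic mean is dominated by the smaller of the two values, a method needs both high precision and high recall to score well.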

Related Papers

Text-Aware Image Restoration with Diffusion Models (2025-06-11)
GoMatching++: Parameter- and Data-Efficient Arbitrary-Shaped Video Text Spotting and Benchmarking (2025-05-28)
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting (2025-04-14)
TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification (2025-03-09)
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models (2025-02-22)
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR (2025-01-01)
Hear the Scene: Audio-Enhanced Text Spotting (2024-12-27)
InstructOCR: Instruction Boosting Scene Text Spotting (2024-12-20)