Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition

Tianlun Zheng, Zhineng Chen, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang

Date: 2023-05-09. Tasks: Scene Text Recognition; Optical Character Recognition (OCR)

Abstract

Text irregularities pose significant challenges to scene text recognizers. Thin-Plate Spline (TPS)-based rectification is widely regarded as an effective means of dealing with them. Currently, the calculation of TPS transformation parameters depends solely on the quality of the regressed text borders. This ignores the text content and often leads to unsatisfactory rectification of severely distorted text. In this work, we introduce TPS++, an attention-enhanced TPS transformation that, for the first time, incorporates the attention mechanism into text rectification. TPS++ formulates the parameter calculation as a joint process of foreground control point regression and content-based attention score estimation, the latter computed by a dedicated gated-attention block. TPS++ thus builds a more flexible, content-aware rectifier that produces natural text corrections which the subsequent recognizer can read more easily. Moreover, TPS++ partially shares the feature backbone with the recognizer and performs rectification at the feature level rather than the image level, incurring only a small overhead in parameters and inference time. Experiments on public benchmarks show that TPS++ consistently improves recognition and achieves state-of-the-art accuracy. It also generalizes well across different backbones and recognizers. Code is at https://github.com/simplify23/TPS_PP.
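The abstract's point that plain TPS parameters depend only on the regressed control points can be made concrete with a baseline rectifier. Below is a minimal, pure-Python sketch of a standard (non-attention) TPS warp, assuming K control points `src` given in output-grid pixel coordinates and their regressed positions `dst` in the input feature map. It solves the usual (K+3) x (K+3) linear system for the TPS parameters, maps each output cell back into the input, and bilinearly samples a C x H x W feature map, mirroring the feature-level (rather than image-level) rectification described above. All function names are hypothetical; this is the plain TPS that TPS++ extends, not the TPS++ gated-attention block, and a practical version would use batched tensor operations.

```python
import math


def tps_kernel(r2):
    """Radial basis U(r) = r^2 * log(r^2), defined as 0 at r = 0 (takes r^2 directly)."""
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)


def solve_linear(a, b):
    """Solve a @ x = b for square a and multi-column b via Gauss-Jordan with partial pivoting."""
    n, m = len(a), len(b[0])
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # copy into [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: control points duplicated or collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + m):
                    aug[r][c] -= f * aug[col][c]
    return [[aug[i][n + j] / aug[i][i] for j in range(m)] for i in range(n)]


def fit_tps(src, dst):
    """Return (K+3) x 2 TPS parameters mapping each src point exactly onto its dst point."""
    k = len(src)
    n = k + 3
    system = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            system[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        # Affine block [1, x, y] and its transpose.
        system[i][k], system[i][k + 1], system[i][k + 2] = 1.0, xi, yi
        system[k][i], system[k + 1][i], system[k + 2][i] = 1.0, xi, yi
    rhs = [list(p) for p in dst] + [[0.0, 0.0] for _ in range(3)]
    return solve_linear(system, rhs)


def tps_map(src, params, x, y):
    """Map an output-grid point (x, y) to input coordinates with the fitted TPS."""
    k = len(src)
    out = []
    for d in range(2):  # d = 0 gives x', d = 1 gives y'
        v = params[k][d] + params[k + 1][d] * x + params[k + 2][d] * y
        for i, (xi, yi) in enumerate(src):
            v += params[i][d] * tps_kernel((x - xi) ** 2 + (y - yi) ** 2)
        out.append(v)
    return tuple(out)


def bilinear_sample(fmap, x, y):
    """Sample a C x H x W feature map (nested lists) at continuous (x, y), clamped to the border."""
    h, w = len(fmap[0]), len(fmap[0][0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    return [
        (1 - dx) * (1 - dy) * ch[y0][x0] + dx * (1 - dy) * ch[y0][x1]
        + (1 - dx) * dy * ch[y1][x0] + dx * dy * ch[y1][x1]
        for ch in fmap
    ]


def rectify(fmap, src, params, out_h, out_w):
    """Backward warp: each output cell is mapped through the TPS and sampled from fmap."""
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(len(fmap))]
    for r in range(out_h):
        for col in range(out_w):
            x, y = tps_map(src, params, float(col), float(r))
            for ch, val in enumerate(bilinear_sample(fmap, x, y)):
                out[ch][r][col] = val
    return out
```

For example, fitting with `dst` equal to `src` yields the identity warp, so `rectify` returns the input feature map unchanged up to floating-point error. Because the parameters come from a linear solve on the control points alone, any error in the regressed `dst` propagates directly into the warp, which is the limitation the abstract attributes to border-only TPS.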

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Scene Text Recognition | SVT | Accuracy (%) | 94.6 | NRTR+TPS++ |
| Scene Text Recognition | IC13 | Accuracy (%) | 97.8 | ABINet-LV+TPS++ |
| Scene Text Recognition | CUTE80 | Accuracy (%) | 92.4 | NRTR+TPS++ |
| Scene Text Recognition | SVT-P | Accuracy (%) | 89.6 | ABINet-LV+TPS++ |
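The Accuracy column is conventionally word-level accuracy: the percentage of test images whose predicted string exactly matches the ground truth. A minimal sketch of that computation follows. The normalization step (lowercasing and discarding characters outside 0-9 and a-z) reflects a common scene text recognition evaluation protocol and is an assumption here; the page does not state which protocol produced the numbers above. The function names are hypothetical.

```python
import re


def normalize(text):
    """Assumed 36-class protocol: lowercase, then keep only 0-9 and a-z."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def word_accuracy(predictions, labels):
    """Percentage of samples whose normalized prediction equals the normalized label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)
```

Under this protocol, `word_accuracy(["Hello!", "w0rld", "TPS"], ["hello", "world", "tps"])` counts two of three words as correct, since punctuation and case are ignored but the `0`/`o` substitution is not.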

Related Papers

- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment (2025-07-17)
- Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis (2025-07-15)
- A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends (2025-07-14)
- Design and Implementation of an OCR-Powered Pipeline for Table Extraction from Invoices (2025-07-09)
- Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning (2025-07-09)
- TextPixs: Glyph-Conditioned Diffusion with Character-Aware Attention and OCR-Guided Supervision (2025-07-08)
- PaddleOCR 3.0 Technical Report (2025-07-08)