NCAP: Scene Text Image Super-Resolution with Non-CAtegorical Prior

Dongwoo Park, Suk Pil Ko

2025-04-01
Tasks: Super-Resolution, Image Super-Resolution

Abstract

Scene text image super-resolution (STISR) enhances the resolution and quality of low-resolution scene text images. Earlier studies treated scene text images as natural images; more recent methods instead use a text prior (TP) extracted from a pre-trained text recognizer and have shown strong performance. However, two major issues emerge. (1) Explicit categorical priors such as TP can harm STISR when they are incorrect. We show that these explicit priors are unstable and propose replacing them with a Non-CAtegorical Prior (NCAP) built from penultimate-layer representations. (2) The pre-trained recognizers used to generate TP struggle with low-resolution images. To address this, most studies jointly train the recognizer with the STISR network to bridge the domain gap between low- and high-resolution images, but this can cause overconfidence in the prior modality. We highlight this issue and propose mitigating it by mixing hard and soft labels. Experiments on the TextZoom dataset show an improvement of 3.5%, and our method improves generalization performance by 14.8% across four text recognition datasets. Our method generalizes to all TP-guided STISR networks.
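The abstract contrasts a categorical prior (per-character probabilities from a recognizer's classifier) with a non-categorical one (the penultimate-layer features), and describes mixing hard and soft labels against overconfidence. The sketch below illustrates both ideas in plain Python under stated assumptions; it is not the authors' implementation. The helper names (`categorical_prior`, `non_categorical_prior`, `mixed_target`), the toy dimensions, the mixing weight `alpha`, and the choice of a convex combination as the "mix" are all hypothetical, and the source of the soft label (for example, a recognizer's prediction) is not specified in the abstract.

```python
import math

# Hypothetical sketch: a recognizer whose penultimate layer emits one feature
# vector per time step, followed by a linear classifier over the character set.


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_logits(weights, bias, features):
    """Linear head: one logit per character class (one row of `weights` per class)."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def categorical_prior(weights, bias, penultimate):
    """TP-style prior: per-time-step probability distributions over characters."""
    return [softmax(classifier_logits(weights, bias, h)) for h in penultimate]


def non_categorical_prior(penultimate):
    """NCAP-style prior (assumed): the penultimate-layer features themselves,
    taken before the classifier commits to character categories."""
    return [list(h) for h in penultimate]


def mixed_target(hard_index, soft_probs, alpha):
    """Convex mix of a one-hot (hard) label and a soft label distribution.
    alpha = 1.0 gives the pure hard label; alpha = 0.0 the pure soft label."""
    one_hot = [1.0 if k == hard_index else 0.0 for k in range(len(soft_probs))]
    return [alpha * o + (1.0 - alpha) * s for o, s in zip(one_hot, soft_probs)]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy of predicted probabilities against a (possibly soft) target."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


# Toy setup (all values hypothetical): 2 time steps, 3-dim features, 4 classes.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
b = [0.0, 0.0, 0.0, 0.0]
feats = [[2.0, 0.1, 0.0], [0.0, 0.2, 1.5]]

tp = categorical_prior(W, b, feats)    # 2 x 4 probabilities
ncap = non_categorical_prior(feats)    # 2 x 3 features, unchanged
soft = [0.6, 0.1, 0.1, 0.2]            # assumed soft label for time step 0
target = mixed_target(0, soft, alpha=0.7)
loss = cross_entropy(target, tp[0])
print([round(p, 3) for p in tp[0]], [round(t, 3) for t in target], round(loss, 3))
```

The categorical prior has already passed through the softmax, so it carries the classifier's character decisions, including wrong ones; the feature-based prior leaves that decision to downstream layers. If `soft` is the uniform distribution, `mixed_target` reduces to ordinary label smoothing.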

Results

Task	Dataset	Metric	Value (%)	Model
Super-Resolution, Image Super-Resolution	TextZoom	ASTER Overall Accuracy	68.1	NCAP
Super-Resolution, Image Super-Resolution	TextZoom	Average Accuracy	63.7	NCAP
Super-Resolution, Image Super-Resolution	TextZoom	CRNN Overall Accuracy	58.3	NCAP
Super-Resolution, Image Super-Resolution	TextZoom	MORAN Overall Accuracy	64.6	NCAP
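The Average Accuracy value appears consistent with the unweighted mean of the three recognizer accuracies (ASTER, CRNN, MORAN). This is an inference from the reported numbers, not a definition stated on this page; the short check below assumes that interpretation, and `average_accuracy` is a hypothetical helper.

```python
# Per-recognizer TextZoom accuracies (%) for NCAP, copied from the results table.
accuracies = {"ASTER": 68.1, "CRNN": 58.3, "MORAN": 64.6}


def average_accuracy(per_recognizer):
    """Unweighted mean across recognizers, rounded to one decimal place (assumed)."""
    return round(sum(per_recognizer.values()) / len(per_recognizer), 1)


print(average_accuracy(accuracies))
```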
