Dongwoo Park, Suk Pil Ko
Scene text image super-resolution (STISR) enhances the resolution and quality of low-resolution scene text images. Unlike earlier studies that treated scene text images as natural images, recent methods that use a text prior (TP) extracted from a pre-trained text recognizer have shown strong performance. However, two major issues emerge. (1) Explicit categorical priors such as TP can harm STISR when they are incorrect. We show that these explicit priors are unstable and propose replacing them with a Non-CAtegorical Prior (NCAP) built from penultimate-layer representations. (2) The pre-trained recognizers used to generate TP struggle with low-resolution images. To address this, most studies jointly train the recognizer with the STISR network to bridge the domain gap between low- and high-resolution images, but this can cause overconfidence in the prior modality. We highlight this issue and propose mitigating it by mixing hard and soft labels. Experiments on the TextZoom dataset demonstrate an improvement of 3.5%, and our method significantly improves generalization performance, by 14.8%, across four text recognition datasets. Our method generalizes to all TP-guided STISR networks.
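The abstract does not give the exact formulation for mixing hard and soft labels, so the following is only a minimal sketch of one common way such a mix can be written: a convex combination of a one-hot target and a soft distribution, used as the target of a cross-entropy loss. The function names, the weight `alpha`, and the toy logits are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_labels(hard_index, soft_probs, alpha):
    """Blend a one-hot (hard) label with a soft distribution.

    alpha is the weight on the hard label (assumed hyperparameter):
    alpha = 1.0 gives the pure one-hot target, alpha = 0.0 the pure soft target.
    """
    return [
        alpha * (1.0 if i == hard_index else 0.0) + (1.0 - alpha) * p
        for i, p in enumerate(soft_probs)
    ]


def cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


# Toy example with three character classes (all values are illustrative).
soft = softmax([2.0, 1.0, 0.1])          # assumed soft label, e.g. from a recognizer
mixed = mix_labels(hard_index=0, soft_probs=soft, alpha=0.5)
loss = cross_entropy(mixed, logits=[3.0, 0.5, -1.0])
```

Because the mixed target keeps some probability mass on the non-ground-truth classes, a loss of this form stops rewarding ever-sharper predictions, which is the general intuition behind using softened targets against overconfidence.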
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Super-Resolution | TextZoom | ASTER Overall Accuracy | 68.1 | NCAP |
| Super-Resolution | TextZoom | Average Accuracy | 63.7 | NCAP |
| Super-Resolution | TextZoom | CRNN Overall Accuracy | 58.3 | NCAP |
| Super-Resolution | TextZoom | MORAN Overall Accuracy | 64.6 | NCAP |
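
The "Average Accuracy" row appears to be the unweighted mean of the three recognizer accuracies listed in the table (ASTER, CRNN, MORAN). That reading is an assumption based on the arithmetic, not something the table states; the short check below reproduces it.

```python
# Overall accuracies reported in the table for NCAP on TextZoom.
accuracies = {"ASTER": 68.1, "CRNN": 58.3, "MORAN": 64.6}

# Unweighted mean across the three recognizers (assumed definition of "Average Accuracy").
average = sum(accuracies.values()) / len(accuracies)
print(round(average, 1))
```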