Description
TrOCR is an end-to-end Transformer-based OCR model for text recognition that builds on pre-trained CV and NLP models. It leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The input text image is first resized to 384×384 and then split into a sequence of 16×16 patches, which serve as the input to the image Transformer. A standard Transformer architecture with the self-attention mechanism is used in both the encoder and the decoder, and wordpiece units are generated as the recognized text from the input image.
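As a concrete illustration of the resize-patchify-decode pipeline described above, the minimal sketch below uses the Hugging Face transformers implementation of TrOCR to transcribe a single text-line image. The checkpoint name microsoft/trocr-base-handwritten and the file line.png are assumptions for the example, not part of the method description.

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Processor handles resizing the image to the model's expected resolution
# and converting it into the patch-sequence input; the model pairs an image
# Transformer encoder with a text Transformer decoder.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# "line.png" is a placeholder path to a cropped text-line image.
image = Image.open("line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Autoregressively generate wordpiece tokens, then decode them to a string.
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```

Note that TrOCR operates on single text lines; full pages must be segmented into line images before recognition.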
Papers Using This Method
TRIDIS: A Comprehensive Medieval and Early Modern Corpus for HTR and NER (2025-03-25)
Early evidence of how LLMs outperform traditional systems on OCR/HTR tasks for historical records (2025-01-20)
Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway (2025-01-13)
Leveraging Deep Learning with Multi-Head Attention for Accurate Extraction of Medicine from Handwritten Prescriptions (2024-12-24)
Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation (2024-07-09)
OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst (2024-06-14)
Automatic Transcription of Handwritten Old Occitan Language (2023-12-06)
Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks (2023-11-28)
Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images (2022-12-11)
Transformer-based HTR for Historical Documents (2022-03-21)
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models (2021-09-21)