Description
TrOCR is an end-to-end Transformer-based OCR model for text recognition that builds on pre-trained CV and NLP models. It leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The input text image is first resized to 384×384 and then split into a sequence of 16×16 patches, which serve as the input to the image Transformer. A standard Transformer architecture with the self-attention mechanism is used in both the encoder and the decoder, and wordpiece units are generated as the recognized text from the input image.
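As a concrete illustration of the resize-patchify-decode pipeline described above, the minimal sketch below uses the Hugging Face transformers implementation of TrOCR to transcribe a single text-line image. The checkpoint name microsoft/trocr-base-handwritten and the file line.png are assumptions for the example, not part of the method description.

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Processor handles resizing the image to the model's expected resolution
# and converting it into the patch-sequence input; the model pairs an image
# Transformer encoder with a text Transformer decoder.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# "line.png" is a placeholder path to a cropped text-line image.
image = Image.open("line.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Autoregressively generate wordpiece tokens, then decode them to a string.
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```

Note that TrOCR operates on single text lines; full pages must be segmented into line images before recognition.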
Papers Using This Method
TRIDIS: A Comprehensive Medieval and Early Modern Corpus for HTR and NER (2025-03-25)
Early evidence of how LLMs outperform traditional systems on OCR/HTR tasks for historical records (2025-01-20)
Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway (2025-01-13)
Leveraging Deep Learning with Multi-Head Attention for Accurate Extraction of Medicine from Handwritten Prescriptions (2024-12-24)
Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation (2024-07-09)
OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst (2024-06-14)
Automatic Transcription of Handwritten Old Occitan Language (2023-12-06)
Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks (2023-11-28)
Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images (2022-12-11)
Transformer-based HTR for Historical Documents (2022-03-21)
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models (2021-09-21)