TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Unified Pretraining Framework for Document Understanding

Unified Pretraining Framework for Document Understanding

Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Nikolaos Barmpalios, Rajiv Jain, Ani Nenkova, Tong Sun

2022-04-22Document Layout Analysisdocument understandingSelf-Supervised Learning
PaperPDF

Abstract

Document intelligence automates the extraction of information from documents and supports many business applications. Recent self-supervised learning methods on large-scale unlabeled document datasets have opened up promising directions towards reducing annotation efforts by training models with self-supervised objectives. However, most of the existing document pretraining methods are still language-dominated. We present UDoc, a new unified pretraining framework for document understanding. UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input. Each input element is composed of words and visual features from a semantic region of the input document image. An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses, encouraging the representation to model sentences, learn similarities, and align modalities. Extensive empirical analysis demonstrates that the pretraining procedure learns better joint representations and leads to improvements in downstream tasks.

Results

TaskDatasetMetricValueModel
Document Layout AnalysisPubLayNet valFigure0.964UDoc
Document Layout AnalysisPubLayNet valList0.937UDoc
Document Layout AnalysisPubLayNet valOverall0.939UDoc
Document Layout AnalysisPubLayNet valTable0.973UDoc
Document Layout AnalysisPubLayNet valText0.939UDoc
Document Layout AnalysisPubLayNet valTitle0.885UDoc

Related Papers

A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys2025-07-17A Survey on MLLM-based Visually Rich Document Understanding: Methods, Challenges, and Emerging Trends2025-07-14Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder2025-07-14PaddleOCR 3.0 Technical Report2025-07-08Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis2025-07-08GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning2025-07-01World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model2025-07-01ShapeEmbed: a self-supervised learning framework for 2D contour quantification2025-07-01