Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

PubLayNet: largest dataset ever for document layout analysis

Xu Zhong, Jianbin Tang, Antonio Jimeno Yepes

Published 2019-08-16 · Tasks: Document Layout Analysis, Transfer Learning

Abstract

Recognizing the layout of unstructured digital documents is an important step when parsing documents into a structured, machine-readable format for downstream applications. Deep neural networks developed for computer vision have proven to be an effective method for analyzing the layout of document images. However, the document layout datasets that are currently publicly available are several orders of magnitude smaller than established computer vision datasets, so models have to be trained by transfer learning from a base model pre-trained on a traditional computer vision dataset. In this paper, we develop the PubLayNet dataset for document layout analysis by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. The size of the dataset is comparable to established computer vision datasets, containing over 360 thousand document images in which typical document layout elements are annotated. The experiments demonstrate that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles. The pre-trained models are also a more effective base model for transfer learning on a different document domain. We release the dataset (https://github.com/ibm-aur-nlp/PubLayNet) to support the development and evaluation of more advanced models for document layout analysis.
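The annotations produced by the matching pipeline above are released in COCO format, with five categories: text, title, list, table, and figure. A minimal sketch of that structure using only the standard library; the file name, bounding box, and category ids below are illustrative placeholders, not records taken from the dataset:

```python
import json

# Illustrative COCO-style annotation record, mirroring the layout
# categories PubLayNet uses (text, title, list, table, figure).
sample = {
    "images": [
        {"id": 1, "file_name": "PMC_example_page.jpg", "width": 612, "height": 792},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels, per the COCO convention.
        {"id": 1, "image_id": 1, "category_id": 4, "bbox": [72.0, 100.0, 468.0, 220.0]},
    ],
    "categories": [
        {"id": 1, "name": "text"},
        {"id": 2, "name": "title"},
        {"id": 3, "name": "list"},
        {"id": 4, "name": "table"},
        {"id": 5, "name": "figure"},
    ],
}

# Round-trip through JSON and resolve category names, as a loader would.
data = json.loads(json.dumps(sample))
cat_names = {c["id"]: c["name"] for c in data["categories"]}
for ann in data["annotations"]:
    print(cat_names[ann["category_id"]], ann["bbox"])  # e.g. table [72.0, 100.0, 468.0, 220.0]
```

Because the format is standard COCO, the dataset can also be consumed directly by common detection toolchains (e.g. anything that accepts a COCO annotation file).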

Results

Task                     | Dataset       | Metric  | Value | Model
-------------------------|---------------|---------|-------|------------
Document Layout Analysis | PubLayNet val | Figure  | 0.949 | Mask RCNN
Document Layout Analysis | PubLayNet val | List    | 0.886 | Mask RCNN
Document Layout Analysis | PubLayNet val | Overall | 0.910 | Mask RCNN
Document Layout Analysis | PubLayNet val | Table   | 0.960 | Mask RCNN
Document Layout Analysis | PubLayNet val | Text    | 0.916 | Mask RCNN
Document Layout Analysis | PubLayNet val | Title   | 0.840 | Mask RCNN
Document Layout Analysis | PubLayNet val | Figure  | 0.937 | Faster RCNN
Document Layout Analysis | PubLayNet val | List    | 0.883 | Faster RCNN
Document Layout Analysis | PubLayNet val | Overall | 0.902 | Faster RCNN
Document Layout Analysis | PubLayNet val | Table   | 0.954 | Faster RCNN
Document Layout Analysis | PubLayNet val | Text    | 0.910 | Faster RCNN
Document Layout Analysis | PubLayNet val | Title   | 0.826 | Faster RCNN

Related Papers

RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)
Robust-Multi-Task Gradient Boosting (2025-07-15)
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift (2025-07-12)
The Bayesian Approach to Continual Learning: An Overview (2025-07-11)
Contrastive and Transfer Learning for Effective Audio Fingerprinting through a Real-World Evaluation Protocol (2025-07-08)
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving (2025-07-08)