Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

VSR: A Unified Framework for Document Layout Analysis combining Vision, Semantics and Relations

Peng Zhang, Can Li, Liang Qiao, Zhanzhan Cheng, ShiLiang Pu, Yi Niu, Fei Wu

2021-05-13 · Document Layout Analysis

Abstract

Document layout analysis is crucial for understanding document structures. On this task, the vision and semantics of documents, and the relations between layout components, all contribute to the understanding process. Though many works have been proposed to exploit the above information, they show unsatisfactory results. NLP-based methods model layout analysis as a sequence labeling task and show insufficient capabilities in layout modeling. CV-based methods model layout analysis as a detection or segmentation task, but suffer from inefficient modality fusion and a lack of relation modeling between layout components. To address the above limitations, we propose a unified framework, VSR, for document layout analysis, combining vision, semantics and relations. VSR supports both NLP-based and CV-based methods. Specifically, we first introduce vision through the document image and semantics through text embedding maps. Then, modality-specific visual and semantic features are extracted using a two-stream network and adaptively fused to make full use of complementary information. Finally, given component candidates, a relation module based on a graph neural network is incorporated to model relations between components and output the final results. On three popular benchmarks, VSR outperforms previous models by large margins. Code will be released soon.
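
The abstract says the visual and semantic features are "adaptively fused" but does not spell out the mechanism. As an illustration only, the sketch below shows one common way to realize such a fusion: a learned gate that mixes the two modalities at each spatial position. The function name `gated_fusion`, the gate weights `w_v` and `w_s`, and the scalar-gate form are all assumptions made for this example, not details taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, semantic, w_v, w_s, bias=0.0):
    """Fuse per-position visual and semantic feature vectors with a scalar gate.

    visual, semantic: lists of feature vectors, one per spatial position,
        with matching lengths and channel counts.
    w_v, w_s: hypothetical gate weights (one per channel); in a real model
        these would be learned, here they are plain lists of floats.
    """
    fused = []
    for v, s in zip(visual, semantic):
        # The gate depends on both modalities at this position.
        logit = bias
        logit += sum(w * x for w, x in zip(w_v, v))
        logit += sum(w * x for w, x in zip(w_s, s))
        g = sigmoid(logit)
        # Convex combination: g weights vision, (1 - g) weights semantics.
        fused.append([g * a + (1.0 - g) * b for a, b in zip(v, s)])
    return fused


# With all-zero gate weights the gate is 0.5, so fusion is a plain average.
print(gated_fusion([[1.0, 2.0]], [[3.0, 4.0]], [0.0, 0.0], [0.0, 0.0]))
```

A gate of this kind lets the model lean on vision where text is sparse (for example, inside figures) and on semantics where the text is informative, which is one plausible reading of "make full use of complementary information".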

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Document Layout Analysis | PubLayNet val | Figure | 0.964 | VSR |
| Document Layout Analysis | PubLayNet val | List | 0.947 | VSR |
| Document Layout Analysis | PubLayNet val | Overall | 0.957 | VSR |
| Document Layout Analysis | PubLayNet val | Table | 0.974 | VSR |
| Document Layout Analysis | PubLayNet val | Text | 0.967 | VSR |
| Document Layout Analysis | PubLayNet val | Title | 0.931 | VSR |
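
The page does not say how the Overall value relates to the per-category values. The short check below assumes Overall is the unweighted mean over the five categories, a convention that is common for detection benchmarks but is not stated here, and shows that the listed numbers are consistent with that assumption. The helper name `macro_average` is chosen for this example.

```python
# Per-category values copied from the results table above.
per_category = {
    "Figure": 0.964,
    "List": 0.947,
    "Table": 0.974,
    "Text": 0.967,
    "Title": 0.931,
}


def macro_average(scores):
    """Unweighted mean over categories (assumed definition of Overall)."""
    return sum(scores.values()) / len(scores)


overall = macro_average(per_category)
print(round(overall, 3))  # 0.957, matching the Overall row
```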

Related Papers

Class-Agnostic Region-of-Interest Matching in Document Images (2025-06-26)
From Codicology to Code: A Comparative Study of Transformer and YOLO-based Detectors for Layout Analysis in Historical Documents (2025-06-25)
SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation (2025-05-20)
A document processing pipeline for the construction of a dataset for topic modeling based on the judgments of the Italian Supreme Court (2025-05-13)
Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs (2025-05-12)
AnnoPage Dataset: Dataset of Non-Textual Elements in Documents with Fine-Grained Categorization (2025-03-28)
SFDLA: Source-Free Document Layout Analysis (2025-03-24)
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction (2025-03-21)