TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/UniHDSA: A Unified Relation Prediction Approach for Hierar...

UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis

Jiawei Wang, Kai Hu, Qiang Huo

2025-03-20Document Layout AnalysisTable DetectionDocument SummarizationPredictionInformation RetrievalRelation Prediction
PaperPDFCode(official)

Abstract

Document structure analysis, aka document layout analysis, is crucial for understanding both the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc. Hierarchical Document Structure Analysis (HDSA) specifically aims to restore the hierarchical structure of documents created using authoring software with hierarchical schemas. Previous research has primarily followed two approaches: one focuses on tackling specific subtasks of HDSA in isolation, such as table detection or reading order prediction, while the other adopts a unified framework that uses multiple branches or modules, each designed to address a distinct task. In this work, we propose a unified relation prediction approach for HDSA, called UniHDSA, which treats various HDSA sub-tasks as relation prediction problems and consolidates relation prediction labels into a unified label space. This allows a single relation prediction module to handle multiple tasks simultaneously, whether at a page-level or document-level structure analysis. To validate the effectiveness of UniHDSA, we develop a multimodal end-to-end system based on Transformer architectures. Extensive experimental results demonstrate that our approach achieves state-of-the-art performance on a hierarchical document structure analysis benchmark, Comp-HRDoc, and competitive results on a large-scale document layout analysis dataset, DocLayNet, effectively illustrating the superiority of our method across all sub-tasks. The Comp-HRDoc benchmark and UniHDSA's configurations are publicly available at https://github.com/microsoft/CompHRDoc.

Related Papers

Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction2025-07-21Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management2025-07-17Generative Click-through Rate Prediction with Applications to Search Advertising2025-07-15From Chaos to Automation: Enabling the Use of Unstructured Data for Robotic Process Automation2025-07-15Conformation-Aware Structure Prediction of Antigen-Recognizing Immune Proteins2025-07-11Foundation models for time series forecasting: Application in conformal prediction2025-07-09Temporal Information Retrieval via Time-Specifier Model Merging2025-07-09Predicting Graph Structure via Adapted Flux Balance Analysis2025-07-08