Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Chuwei Luo, Changxu Cheng, Qi Zheng, Cong Yao

2023-04-21 · CVPR 2023

Tasks: Relation Extraction, Semantic Entity Labeling, Entity Linking, Document AI, Key Information Extraction, Key-value Pair Extraction

Paper · PDF · Code (official)

Abstract

Visual information extraction (VIE) plays an important role in Document Intelligence. Generally, it is divided into two tasks: semantic entity recognition (SER) and relation extraction (RE). Recently, pre-trained models for documents have achieved substantial progress in VIE, particularly in SER. However, most of the existing models learn the geometric representation in an implicit way, which has been found insufficient for the RE task, since geometric information is especially crucial for RE. Moreover, we reveal that another factor limiting the performance of RE is the objective gap between the pre-training phase and the fine-tuning phase for RE. To tackle these issues, we propose in this paper a multi-modal framework, named GeoLayoutLM, for VIE. GeoLayoutLM explicitly models the geometric relations in pre-training, which we call geometric pre-training. Geometric pre-training is achieved by three specially designed geometry-related pre-training tasks. Additionally, novel relation heads, which are pre-trained by the geometric pre-training tasks and fine-tuned for RE, are elaborately designed to enrich and enhance the feature representation. According to extensive experiments on standard VIE benchmarks, GeoLayoutLM achieves highly competitive scores in the SER task and significantly outperforms the previous state of the art for RE (e.g., the F1 score of RE on FUNSD is boosted from 80.35% to 89.45%). The code and models are publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/GeoLayoutLM
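The abstract does not spell out the relation heads' architecture. As an illustrative sketch only (not GeoLayoutLM's actual heads), relation extraction between document entities is commonly framed as scoring every ordered (head, tail) entity pair, for instance with a bilinear form over entity features; all names and values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def relation_scores(entity_feats: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Score every ordered (head, tail) entity pair with a bilinear form:
    score[i, j] = f_i @ W @ f_j. A higher score suggests a link (e.g. a
    key-value relation) from entity i to entity j."""
    projected = entity_feats @ W       # (N, D): project head features
    return projected @ entity_feats.T  # (N, N): pairwise link scores

# Toy setup: 4 entities with 8-dimensional features (random stand-ins for
# the encoder outputs a pre-trained document model would produce).
feats = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 8))
scores = relation_scores(feats, W)
print(scores.shape)  # (4, 4)
```

In a trained system, `W` would be learned and the scores passed through a sigmoid and thresholded to predict links; the paper's contribution is pre-training such heads with geometry-related tasks rather than initializing them from scratch.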

Results

Task                       | Dataset  | Metric            | Value | Model
Relation Extraction        | FUNSD    | F1                | 89.45 | GeoLayoutLM
Relation Extraction        | FUNSD    | F1                | 80.35 | LayoutLMv3 large
Entity Linking             | FUNSD    | F1                | 89.45 | GeoLayoutLM
Semantic entity labeling   | FUNSD    | F1                | 92.86 | GeoLayoutLM
Key Information Extraction | CORD     | F1                | 97.97 | GeoLayoutLM
Key Information Extraction | RFUND-EN | key-value pair F1 | 69.03 | GeoLayoutLM

Related Papers

- DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations (2025-07-08)
- PaddleOCR 3.0 Technical Report (2025-07-08)
- Class-Agnostic Region-of-Interest Matching in Document Images (2025-06-26)
- Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers (2025-06-25)
- Chaining Event Spans for Temporal Relation Grounding (2025-06-17)
- Summarization for Generative Relation Extraction in the Microbiome Domain (2025-06-10)
- Conservative Bias in Large Language Models: Measuring Relation Predictions (2025-06-09)
- Comparative Analysis of AI Agent Architectures for Entity Relationship Classification (2025-06-03)