Papers With Code


Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction

Chong Zhang, Ya Guo, Yi Tu, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang, Tao Gui

2023-10-17

Tasks: Relation Extraction, Token Classification, Semantic Entity Labeling, Named Entity Recognition (NER), Entity Linking, Sentence Ordering, Reading Order Detection, Key Information Extraction, Key-value Pair Extraction

Links: Paper | PDF | Code (official)

Abstract

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO entity tags for tokens, following the typical setting of NLP. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems. This reading-order issue hinders the accurate marking of entities by the BIO-tagging scheme, making it impossible for sequence-labeling methods to predict correct named entities. To address the reading-order issue, we introduce Token Path Prediction (TPP), a simple prediction head that predicts entity mentions as token sequences within documents. As an alternative to token classification, TPP models the document layout as a complete directed graph of tokens, and predicts token paths within the graph as entities. For better evaluation of VrD-NER systems, we also propose two revised benchmark datasets of NER on scanned documents which reflect real-world scenarios. Experimental results demonstrate the effectiveness of our method, and suggest its potential to be a universal solution to various information extraction tasks on documents.
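The core idea of the abstract — treating the document as a complete directed graph over tokens and reading entities off as token paths — can be illustrated with a hypothetical decoding sketch. The link-score matrix, the 0.5 threshold, and the greedy decoding below are illustrative assumptions for exposition, not the paper's actual prediction head or decoding algorithm:

```python
import numpy as np

def decode_token_paths(link_scores, threshold=0.5):
    """Greedily decode entity token paths from pairwise link scores.

    link_scores[i][j] is assumed to be the predicted probability that
    token j directly follows token i inside the same entity mention
    (a hypothetical output of a TPP-style head over the complete
    directed token graph).
    """
    n = link_scores.shape[0]
    # Keep at most one outgoing link per token, and only if it clears
    # the threshold and is not a self-loop.
    next_tok = {}
    for i in range(n):
        j = int(np.argmax(link_scores[i]))
        if link_scores[i, j] >= threshold and j != i:
            next_tok[i] = j
    # Path starts are tokens that are never the target of a kept link.
    targets = set(next_tok.values())
    paths = []
    for start in range(n):
        if start in targets:
            continue
        path, cur, seen = [start], start, {start}
        while cur in next_tok and next_tok[cur] not in seen:
            cur = next_tok[cur]
            path.append(cur)
            seen.add(cur)
        if len(path) > 1:  # single isolated tokens are not entities here
            paths.append(path)
    return paths

# Toy example: tokens 0 -> 1 -> 2 form one entity; token 3 is isolated.
scores = np.zeros((4, 4))
scores[0, 1] = 0.9
scores[1, 2] = 0.8
print(decode_token_paths(scores))  # [[0, 1, 2]]
```

Because the graph is complete, the decoded path can visit tokens in any spatial order, which is how a path-based head sidesteps the OCR reading-order problem that breaks BIO tagging.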

Results

Task | Dataset | Metric | Value | Model
Relation Extraction | FUNSD | F1 | 79.2 | TPP (LayoutMask)
Entity Linking | FUNSD | F1 | 79.2 | TPP (LayoutMask)
Named Entity Recognition (NER) | FUNSD-r | F1 | 80.4 | TPP (LayoutLMv3)
Named Entity Recognition (NER) | FUNSD-r | F1 | 78.19 | TPP (LayoutMask)
Named Entity Recognition (NER) | CORD-r | F1 | 91.85 | TPP (LayoutLMv3)
Named Entity Recognition (NER) | CORD-r | F1 | 89.34 | TPP (LayoutMask)
Semantic Entity Labeling | FUNSD | F1 | 85.16 | TPP (LayoutMask)
Key Information Extraction | CORD | F1 | 96.92 | TPP (LayoutMask)
Key Information Extraction | RFUND-EN | Key-value pair F1 | 50.27 | TPP (LayoutLMv3_base)
Reading Order Detection | ROOR | Segment-level F1 | 42.96 | TPP (LayoutLMv3-base)
Reading Order Detection | ReadingBank | Average Page-level BLEU | 98.16 | TPP (LayoutMask)
Reading Order Detection | ReadingBank | Average Relative Distance (ARD) | 0.37 | TPP (LayoutMask)

Related Papers

- DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations (2025-07-08)
- Flippi: End To End GenAI Assistant for E-Commerce (2025-07-08)
- PaddleOCR 3.0 Technical Report (2025-07-08)
- Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models (2025-06-28)
- Class-Agnostic Region-of-Interest Matching in Document Images (2025-06-26)
- Multiple Streams of Relation Extraction: Enriching and Recalling in Transformers (2025-06-25)
- Chaining Event Spans for Temporal Relation Grounding (2025-06-17)
- Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences? (2025-06-12)