Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction

Chong Zhang, Ya Guo, Yi Tu, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang, Tao Gui

2023-10-17

Relation Extraction, Token Classification, Semantic entity labeling, named-entity-recognition, Entity Linking, Named Entity Recognition, Sentence Ordering, NER, Reading Order Detection, Named Entity Recognition (NER), Key Information Extraction, Key-value Pair Extraction


Abstract

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO entity tags for tokens, following the typical setting of NLP. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems. This reading order issue hinders the accurate marking of entities by the BIO-tagging scheme, making it impossible for sequence-labeling methods to predict correct named entities. To address the reading order issue, we introduce Token Path Prediction (TPP), a simple prediction head that predicts entity mentions as token sequences within documents. As an alternative to token classification, TPP models the document layout as a complete directed graph of tokens and predicts token paths within the graph as entities. For better evaluation of VrD-NER systems, we also propose two revised benchmark datasets of NER on scanned documents that reflect real-world scenarios. Experimental results demonstrate the effectiveness of our method and suggest its potential to be a universal solution to various information extraction tasks on documents.
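To make the reading order issue concrete, here is a minimal sketch of how entities could be read off as token paths once directed links between tokens are available. It is an illustration only: the abstract does not describe the prediction head or the decoding procedure, so the function `decode_token_paths`, the toy token list, and the hand-written link list are all assumptions, and the links stand in for what a trained model would predict. The sketch also assumes each token has at most one successor and ignores single-token entities.

```python
from typing import Dict, List, Set, Tuple

def decode_token_paths(links: List[Tuple[int, int]]) -> List[List[int]]:
    """Follow directed links (src -> dst) to recover token paths.

    Hypothetical helper: assumes every token has at most one successor.
    A path starts at a token that has a successor but no predecessor.
    """
    successor: Dict[int, int] = {}
    has_predecessor: Set[int] = set()
    for src, dst in links:
        successor[src] = dst
        has_predecessor.add(dst)

    paths: List[List[int]] = []
    for start in sorted(successor):
        if start in has_predecessor:
            continue  # not the head of a path
        path = [start]
        # Walk the chain; the membership check guards against cycles.
        while path[-1] in successor and successor[path[-1]] not in path:
            path.append(successor[path[-1]])
        paths.append(path)
    return paths

# Toy OCR output for a two-column form, read row by row, so the tokens
# of each entity are interleaved rather than contiguous.
tokens = ["Name:", "Date:", "John", "17", "Smith", "Oct", "2023"]

# Hand-written links standing in for model predictions (assumption).
links = [(2, 4), (3, 5), (5, 6)]

paths = decode_token_paths(links)
entities = [" ".join(tokens[i] for i in path) for path in paths]
print(paths)
print(entities)
```

In this toy input, "John" and "Smith" are separated by "17" in the OCR order, so a BIO tag sequence, which can only mark contiguous spans, cannot label "John Smith" as one entity. A path over token indices has no such constraint.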

Results

Task	Dataset	Metric	Value	Model
Relation Extraction	FUNSD	F1	79.2	TPP (LayoutMask)
Entity Linking	FUNSD	F1	79.2	TPP (LayoutMask)
Named Entity Recognition (NER)	FUNSD-r	F1	80.4	TPP (LayoutLMv3)
Named Entity Recognition (NER)	FUNSD-r	F1	78.19	TPP (LayoutMask)
Named Entity Recognition (NER)	CORD-r	F1	91.85	TPP (LayoutLMv3)
Named Entity Recognition (NER)	CORD-r	F1	89.34	TPP (LayoutMask)
Semantic entity labeling	FUNSD	F1	85.16	TPP (LayoutMask)
Key Information Extraction	CORD	F1	96.92	TPP (LayoutMask)
Key Information Extraction	RFUND-EN	key-value pair F1	50.27	TPP (LayoutLMv3_base)
Reading Order Detection	ROOR	Segment-level F1	42.96	TPP (LayoutLMv3-base)
Reading Order Detection	ReadingBank	Average Page-level BLEU	98.16	TPP (LayoutMask)
Reading Order Detection	ReadingBank	Average Relative Distance (ARD)	0.37	TPP (LayoutMask)
