Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Efficient Document Image Classification Using Region-Based Graph Neural Network

Jaya Krishna Mandivarapu, Eric Bunch, Qian You, Glenn Fung

Date: 2021-06-25
Tasks: Image Classification, Document Image Classification, Document Classification, Classification

Abstract

Document image classification remains a popular research area because it can be commercialized in many enterprise applications across different industries. Recent advancements in large pre-trained computer vision and language models and in graph neural networks have given document image classification many new tools. However, using large pre-trained models usually requires substantial computing resources, which can defeat the cost-saving advantages of automatic document image classification. In this paper, we propose an efficient document image classification framework that uses graph convolutional neural networks and incorporates the textual, visual, and layout information of the document. We have rigorously benchmarked our proposed algorithm against several state-of-the-art vision and language models on both a publicly available dataset and a real-life insurance document classification dataset. Empirical results on both publicly available and real-world data show that our methods achieve near-SOTA performance yet require far fewer computing resources and much less time for model training and inference. This results in solutions that offer better cost advantages, especially in scalable deployment for enterprise applications. We also provide comprehensive comparisons of computing resources, model sizes, and training and inference time between our proposed methods and the baselines. In addition, we delineate the cost per image for our method and the baselines.
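
As a rough illustration of the idea in the abstract, the sketch below treats each document region as a graph node whose feature vector concatenates text, visual, and layout features. It then applies one graph convolution step (mean aggregation over the node and its neighbours, a linear map, and ReLU) and mean-pools the nodes into a single document vector. Everything here is an assumption made for illustration: the toy feature values, the adjacency, the weights, and the function names `gcn_layer` and `mean_pool` are hypothetical, and the paper's actual architecture is not described on this page.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def gcn_layer(features: Matrix, adjacency: List[List[int]], weights: Matrix) -> Matrix:
    """One graph convolution step (illustrative, not the paper's layer).

    adjacency[i] lists the neighbour indices of node i. Each node is averaged
    with its neighbours (self-loop included), projected by `weights`
    (shape: in_dim x out_dim), and passed through ReLU.
    """
    out_dim = len(weights[0])
    result: Matrix = []
    for i, own in enumerate(features):
        group = [own] + [features[j] for j in adjacency[i]]
        mean = [sum(col) / len(group) for col in zip(*group)]
        projected = [
            sum(mean[k] * weights[k][o] for k in range(len(mean)))
            for o in range(out_dim)
        ]
        result.append([max(0.0, v) for v in projected])
    return result


def mean_pool(nodes: Matrix) -> Vector:
    """Average all node vectors into one document-level vector."""
    return [sum(col) / len(nodes) for col in zip(*nodes)]


# Hypothetical toy document with three regions (values are made up).
text = [[1.0], [0.0], [1.0]]      # stand-in for word embeddings
visual = [[0.0], [2.0], [0.0]]    # stand-in for image embeddings
layout = [[0.5], [0.5], [0.0]]    # stand-in for bounding-box features
features = [t + v + l for t, v, l in zip(text, visual, layout)]

adjacency = [[1], [0, 2], [1]]    # region 1 is adjacent to regions 0 and 2
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # 3 inputs, 2 outputs

hidden = gcn_layer(features, adjacency, weights)
doc_vector = mean_pool(hidden)
```

In a real classifier the document vector would feed a trained classification head, and the weights would be learned rather than fixed by hand.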

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Document Image Classification | Tobacco-3482 | Accuracy | 91.95 | DocBert [DOCBERT] |
| Document Image Classification | Tobacco-3482 | Accuracy | 91 | Eff-GNN + Word2Vec [word2vec] |
| Document Image Classification | Tobacco-3482 | Accuracy | 82.3 | DocBERT [DOCBERT] |
| Document Image Classification | Tobacco-3482 | Accuracy | 79 | BERT [BERT] |
| Document Image Classification | Tobacco-3482 | Accuracy | 77.5 | Eff-GNN + Word2Vec [word2vec] + Image Embedding |
| Document Image Classification | Tobacco-3482 | Accuracy | 73.5 | Eff-GNN+ Word2Vec [word2vec] |
| Document Image Classification | Tobacco-3482 | Memory | 7.08 | VGG |
| Image Classification | Tobacco-3482 | Accuracy | 91.95 | DocBert [DOCBERT] |
| Image Classification | Tobacco-3482 | Accuracy | 91 | Eff-GNN + Word2Vec [word2vec] |
| Image Classification | Tobacco-3482 | Accuracy | 82.3 | DocBERT [DOCBERT] |
| Image Classification | Tobacco-3482 | Accuracy | 79 | BERT [BERT] |
| Image Classification | Tobacco-3482 | Accuracy | 77.5 | Eff-GNN + Word2Vec [word2vec] + Image Embedding |
| Image Classification | Tobacco-3482 | Accuracy | 73.5 | Eff-GNN+ Word2Vec [word2vec] |
| Image Classification | Tobacco-3482 | Memory | 7.08 | VGG |

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking (2025-07-15)