Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images

Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker, Marcus Liwicki, Muhammad Zeshan Afzal

2023-06-23

Document Layout Analysis, Object Detection

Abstract

This paper takes an important step toward bridging the performance gap between DETR and R-CNN for graphical object detection. Existing graphical object detection approaches have benefited from recent advances in CNN-based object detection methods and have made remarkable progress. Recently, Transformer-based detectors have considerably boosted generic object detection performance; by using object queries, they eliminate the need for hand-crafted features and post-processing steps such as Non-Maximum Suppression (NMS). However, the effectiveness of these transformer-based detection algorithms has yet to be verified for graphical object detection. Inspired by the latest advancements in DETR, we employ the existing detection transformer with a few modifications for graphical object detection. We modify object queries in different ways: using points, using anchor boxes, and adding positive and negative noise to the anchors to boost performance. These modifications allow for better handling of objects with varying sizes and aspect ratios, more robustness to small variations in object positions and sizes, and improved discrimination between objects and non-objects. We evaluate our approach on four graphical datasets: PubTables, TableBank, NTable and PubLayNet. Upon integrating the query modifications into DETR, we outperform prior works and achieve new state-of-the-art results, with mAP of 96.9%, 95.7% and 99.3% on TableBank, PubLayNet and PubTables, respectively. The results from extensive ablations show that transformer-based methods are more effective for document analysis, as in other applications. We hope this study draws more attention to research on detection transformers in document image analysis.
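To make the idea of "positive and negative noise" on anchor queries concrete, here is a minimal sketch. It assumes a scheme similar to contrastive denoising, where a lightly perturbed copy of a ground-truth box is treated as a positive query and a more heavily perturbed copy as a negative one. The abstract does not specify the exact procedure, so the function names, the `(cx, cy, w, h)` box format, and the `pos_scale` / `neg_scale` thresholds are all illustrative assumptions, not the authors' implementation.

```python
import random


def noised_anchor(box, low, high, rng):
    """Perturb a normalized (cx, cy, w, h) box by a relative noise in [low, high].

    Hypothetical helper: the centre is shifted by a fraction of the half
    width/height, and the size is rescaled by a fraction of the width/height.
    """
    cx, cy, w, h = box

    def offset(extent):
        magnitude = rng.uniform(low, high) * extent
        return magnitude if rng.random() < 0.5 else -magnitude

    noised = (cx + offset(w / 2), cy + offset(h / 2), w + offset(w), h + offset(h))
    # Keep coordinates inside the normalized image range [0, 1].
    return tuple(min(max(v, 0.0), 1.0) for v in noised)


def make_denoising_queries(gt_boxes, pos_scale=0.4, neg_scale=0.8, seed=0):
    """Return (anchor, is_positive) pairs: one positive and one negative per box.

    Assumed convention: noise below pos_scale gives a positive query (the model
    should recover the box); noise between pos_scale and neg_scale gives a
    negative query (the model should predict "no object").
    """
    rng = random.Random(seed)
    queries = []
    for box in gt_boxes:
        queries.append((noised_anchor(box, 0.0, pos_scale, rng), True))
        queries.append((noised_anchor(box, pos_scale, neg_scale, rng), False))
    return queries


if __name__ == "__main__":
    table_box = (0.5, 0.5, 0.2, 0.1)  # illustrative ground-truth table region
    for anchor, is_positive in make_denoising_queries([table_box]):
        print("positive" if is_positive else "negative", anchor)
```

Under these assumptions, the positive queries encourage robustness to small shifts in position and size, while the negative queries give the detector explicit examples of near-miss boxes to reject.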

Results

Task	Dataset	Metric	Value	Model
Document Layout Analysis	PubLayNet val	Figure	0.975	DETR
Document Layout Analysis	PubLayNet val	List	0.964	DETR
Document Layout Analysis	PubLayNet val	Overall	0.957	DETR
Document Layout Analysis	PubLayNet val	Table	0.981	DETR
Document Layout Analysis	PubLayNet val	Text	0.947	DETR
Document Layout Analysis	PubLayNet val	Title	0.918	DETR
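As a quick consistency check on the table above, the Overall value agrees with the unweighted mean of the five per-class values. The sketch below only reproduces that arithmetic; the variable names are illustrative, and whether Overall was actually computed this way is an assumption.

```python
# Per-class values copied from the PubLayNet val rows of the results table.
per_class = {
    "Figure": 0.975,
    "List": 0.964,
    "Table": 0.981,
    "Text": 0.947,
    "Title": 0.918,
}

# Unweighted mean over the five classes.
overall = sum(per_class.values()) / len(per_class)
print(round(overall, 3))  # 0.957, matching the Overall row
```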
