Eran Goldman, Roei Herzig, Aviv Eisenschtat, Oria Ratzon, Itsik Levi, Jacob Goldberger, Tal Hassner
Man-made scenes can be densely packed, containing numerous objects, often identical, positioned in close proximity. We show that precise object detection in such scenes remains a challenging frontier even for state-of-the-art object detectors. We propose a novel, deep-learning-based method for precise object detection, designed for such challenging settings. Our contributions include: (1) a layer for estimating the Jaccard index as a detection quality score; (2) a novel EM merging unit, which uses our quality scores to resolve detection overlap ambiguities; and (3) an extensive, annotated dataset, SKU-110K, representing packed retail environments, released for training and testing under such extreme settings. Detection tests on SKU-110K and counting tests on the CARPK and PUCPR+ benchmarks show that our method outperforms the existing state of the art by substantial margins. The code and data will be made available at www.github.com/eg4000/SKU110K_CVPR19.
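The Jaccard index that the quality-score layer learns to estimate is, for two axis-aligned boxes, the area of their intersection divided by the area of their union (intersection over union, IoU). The sketch below computes that target quantity. It assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates, and the function name `jaccard` is hypothetical. It illustrates the definition only and is not the authors' layer, which predicts this value from image features.

```python
def jaccard(box_a, box_b):
    """Jaccard index (IoU) of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    # Union = sum of both areas minus the doubly counted intersection.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# A predicted box shifted by half its width relative to its ground truth:
print(jaccard((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150 = 0.333...
```

A perfectly aligned box scores 1.0 and a disjoint one scores 0.0. This is why the value can serve as a per-detection quality score that is independent of the class confidence.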
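The abstract does not describe the internals of the EM merging unit. To give a rough feel for the general idea of resolving overlapping detections with EM and quality scores, here is a hypothetical, heavily simplified stand-in. It is not the authors' EM-Merger. The assumptions are: each detection is reduced to its box center, the number of objects `k` is given, components are isotropic 2D Gaussians seeded at the `k` highest-scoring detections, and the per-detection quality scores (for example, predicted Jaccard indices) act as sample weights in the M-step. The names `em_merge`, `iters`, and `min_var` are illustrative only.

```python
import math


def em_merge(boxes, scores, k, iters=20, min_var=1e-3):
    """Hypothetical sketch: merge detections into <= k boxes via score-weighted EM.

    boxes: list of (x1, y1, x2, y2); scores: positive quality scores per box.
    """
    n = len(boxes)
    centers = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]
    # Seed the components at the k highest-scoring detections.
    seeds = sorted(range(n), key=lambda i: -scores[i])[:k]
    means = [centers[i] for i in seeds]
    m = len(means)
    # Initial variance from the average box size (a scale-aware heuristic).
    init_var = sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) / 8 for x1, y1, x2, y2 in boxes) / n
    variances = [max(init_var, min_var)] * m
    mix = [1.0 / m] * m

    def sq_dist(c, mu):
        return (c[0] - mu[0]) ** 2 + (c[1] - mu[1]) ** 2

    for _ in range(iters):
        # E-step: responsibilities, computed in log space to avoid underflow.
        resp = []
        for c in centers:
            logs = [math.log(max(w, 1e-12)) - math.log(2 * math.pi * v) - sq_dist(c, mu) / (2 * v)
                    for mu, v, w in zip(means, variances, mix)]
            top = max(logs)
            expd = [math.exp(lg - top) for lg in logs]
            total = sum(expd)
            resp.append([e / total for e in expd])
        # M-step: quality-score-weighted updates of means, variances, mixing weights.
        mass = [sum(scores[i] * resp[i][j] for i in range(n)) for j in range(m)]
        for j, nj in enumerate(mass):
            if nj <= 1e-12:
                continue  # empty component: keep its previous parameters
            mx = sum(scores[i] * resp[i][j] * centers[i][0] for i in range(n)) / nj
            my = sum(scores[i] * resp[i][j] * centers[i][1] for i in range(n)) / nj
            means[j] = (mx, my)
            spread = sum(scores[i] * resp[i][j] * sq_dist(centers[i], means[j]) for i in range(n))
            variances[j] = max(spread / (2 * nj), min_var)  # per-axis variance in 2D
        mix = [nj / sum(mass) for nj in mass]

    # Hard-assign each detection using responsibilities from the final E-step,
    # then output one score-weighted average box per non-empty component.
    labels = [max(range(m), key=lambda j: r[j]) for r in resp]
    merged = []
    for j in range(m):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            total = sum(scores[i] for i in members)
            merged.append(tuple(sum(scores[i] * boxes[i][d] for i in members) / total
                                for d in range(4)))
    return merged


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (0, 1, 10, 11), (50, 50, 60, 60), (51, 51, 61, 61)]
scores = [0.9, 0.6, 0.5, 0.8, 0.7]
print(em_merge(boxes, scores, k=2))
```

In this toy input, five overlapping detections collapse into two merged boxes. Weighting by quality score means a tightly localized detection pulls the merged box toward itself more than a sloppy one does. Unlike greedy non-maximum suppression, no single detection simply "wins".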
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Counting | CARPK | MAE | 6.77 | Soft-IoU + EM-Merger unit |
| Object Counting | CARPK | RMSE | 8.52 | Soft-IoU + EM-Merger unit |
| Object Detection | SKU-110K | AP | 49.2 | Soft-IoU + EM-Merger unit |
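The counting rows report MAE and RMSE over per-image object counts, and lower is better for both. The sketch below shows the usual definitions, which are assumed here because the table does not spell them out. The function names are hypothetical, and the example counts are made-up values for illustration, not results from the paper.

```python
import math


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth per-image counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


def rmse(pred_counts, true_counts):
    """Root mean squared error; penalizes large per-image miscounts more than MAE."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / len(true_counts))


# Illustrative counts for three images (not from the paper):
print(mae([10, 12, 7], [11, 12, 9]))   # (1 + 0 + 2) / 3 = 1.0
print(rmse([10, 12, 7], [11, 12, 9]))  # sqrt((1 + 0 + 4) / 3) ≈ 1.291
```

For a detector, the predicted count for an image is simply the number of detections kept after merging. Counting accuracy therefore directly reflects how well overlapping detections are resolved.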