Zhaohui Yang, Miaojing Shi, Chao Xu, Vittorio Ferrari, Yannis Avrithis
Weakly-supervised object detection attempts to limit the amount of supervision by dispensing with the need for bounding boxes, but it still assumes image-level labels on the entire training set. In this work, we study the problem of training an object detector from one or a few images with image-level labels and a larger set of completely unlabeled images. This is an extreme case of semi-supervised learning where the labeled data are not enough to bootstrap the learning of a detector. Our solution is to train a weakly-supervised student detector model from image-level pseudo-labels generated on the unlabeled set by a teacher classifier model, which is bootstrapped by region-level similarities to the labeled images. Building upon PCL, a recent representative weakly-supervised pipeline, our method can use more unlabeled images to achieve performance competitive with or superior to many recent weakly-supervised detection solutions.
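The bootstrapping step, in which region-level similarities to the few labeled images yield initial image-level labels for unlabeled images, can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or decision rule, so everything here is an assumption: region features are plain vectors, similarity is cosine similarity, an image-to-image score is the maximum over all region pairs, and a class is assigned when that score reaches a hypothetical `threshold`. The function names are illustrative only and are not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Set

# Hypothetical types: an image is a list of region feature vectors.
Vector = Sequence[float]
Image = List[Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def image_similarity(regions_u: Image, regions_l: Image) -> float:
    """Assumed aggregation: best-matching pair of regions across the two images."""
    return max(cosine(r, s) for r in regions_u for s in regions_l)


def bootstrap_pseudo_labels(
    unlabeled: Dict[str, Image],
    labeled: Dict[str, List[Image]],
    threshold: float,
) -> Dict[str, Set[str]]:
    """Assign each unlabeled image every class whose closest labeled
    exemplar image scores at or above the (hypothetical) threshold."""
    pseudo: Dict[str, Set[str]] = {}
    for image_id, regions in unlabeled.items():
        classes: Set[str] = set()
        for class_name, exemplars in labeled.items():
            score = max(image_similarity(regions, ex) for ex in exemplars)
            if score >= threshold:
                classes.add(class_name)
        pseudo[image_id] = classes
    return pseudo
```

In this sketch, the resulting multi-label sets would serve as the initial targets for training the teacher classifier, whose own predictions would then become the image-level pseudo-labels for the weakly-supervised student detector. Taking the maximum over region pairs lets a single matching object in a cluttered unlabeled image trigger a label, which suits multi-object scenes but may also admit false positives if the threshold is set too low.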
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Object Detection | PASCAL VOC 2007 | mAP | 38 | NSOD |
| Object Detection | PASCAL VOC 2012 test | mAP | 36.6 | NSOD |