Nicolas Gonthier, Saïd Ladjal, Yann Gousseau
Weakly supervised object detection (WSOD) using only image-level annotations has attracted growing attention over the past few years. Whereas this task is typically addressed with domain-specific solutions focused on natural images, we show that a simple multiple instance approach applied to pre-trained deep features yields excellent performance on non-photographic datasets, possibly including new classes. The approach does not include any fine-tuning or cross-domain learning and is therefore efficient and possibly applicable to arbitrary datasets and classes. We investigate several flavors of the proposed approach, some including multi-layer perceptrons and polyhedral classifiers. Despite its simplicity, our method shows competitive results on a range of publicly available datasets, including paintings (People-Art, IconArt), watercolors, cliparts and comics, and allows unseen visual categories to be learned quickly.
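To make the multiple instance idea concrete, here is a minimal NumPy sketch. It assumes each image is a "bag" of region features already extracted by a pre-trained network, and that an image is scored by its best-scoring region under a linear classifier. The abstract does not state the objective, so the tanh-squashed, class-balanced loss, the mean-difference initialization, and the names `bag_scores` and `train_mi_max` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def bag_scores(bags, w, b):
    """Score each image (bag) by its best-scoring region (instance).

    bags: array of shape (n_images, n_regions, n_features).
    Returns (scores, top), where top[i] is the index of the best region.
    """
    inst = bags @ w + b                      # (n_images, n_regions)
    top = inst.argmax(axis=1)
    return inst[np.arange(len(bags)), top], top


def train_mi_max(bags, labels, lr=0.1, epochs=200, reg=1e-3):
    """Gradient descent on an assumed max-instance loss:

        loss = -sum_i c_i * y_i * tanh(max_k (w . x_ik + b)) + reg * ||w||^2

    labels: image-level labels in {+1, -1}; c_i balances the two classes.
    """
    pos, neg = labels > 0, labels < 0
    n = len(bags)
    # Assumed initialization: difference of mean region features per class.
    w = bags[pos].mean(axis=(0, 1)) - bags[neg].mean(axis=(0, 1))
    b = 0.0
    c = np.where(pos, 1.0 / max(pos.sum(), 1), 1.0 / max(neg.sum(), 1))
    for _ in range(epochs):
        s, top = bag_scores(bags, w, b)
        x_top = bags[np.arange(n), top]      # features of each bag's best region
        g = -c * labels * (1.0 - np.tanh(s) ** 2)   # d(loss)/d(score)
        w = w - lr * (g @ x_top + 2.0 * reg * w)
        b = b - lr * g.sum()
    return w, b
```

At test time, the same linear scorer can be applied to every region of a new image, so the top-scoring regions act as detections even though only image-level labels were used for training.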
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | Comic2k | mAP | 27 | MI-max |
| Object Detection | CASPAPaintings | Mean mAP | 16.2 | MI-max |
| Object Detection | IconArt | mAP | 15.1 | MI_Net [wang_revisiting_2018] |
| Object Detection | Watercolor2k | mAP | 49.5 | MI-max |
| Object Detection | Clipart1k | mAP | 38.4 | MI-max |
| Object Detection | PeopleArt | mAP | 58.3 | Polyhedral MI-max |