Xudong Wang, Rohit Girdhar, Stella X. Yu, Ishan Misra
We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models. We leverage the property of self-supervised models to 'discover' objects without supervision and amplify it to train a state-of-the-art localization model without any human labels. CutLER first uses our proposed MaskCut approach to generate coarse masks for multiple objects in an image and then learns a detector on these masks using our robust loss function. We further improve performance by self-training the model on its own predictions. Compared to prior work, CutLER is simpler, compatible with different detection architectures, and detects multiple objects. CutLER is also a zero-shot unsupervised detector and improves detection AP50 by more than 2.7 times on 11 benchmarks spanning domains such as video frames, paintings, and sketches. With finetuning, CutLER serves as a low-shot detector, surpassing MoCo-v2 by 7.3% box AP and 6.6% mask AP on COCO when trained with 5% of the labels.
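The abstract does not spell out how MaskCut obtains several masks per image. MaskCut builds on Normalized Cuts over a patch-affinity graph of self-supervised features; the sketch below applies such a cut repeatedly, excluding the patches claimed by each cut before the next one. It is a minimal sketch under stated assumptions, not the authors' implementation: `features` is assumed to be an (N, C) NumPy array of ViT patch features (extracting them from a model such as DINO is not shown), and the threshold `tau=0.15`, the default of three masks, splitting the eigenvector at its mean, the largest-entry foreground heuristic, and removing (rather than masking) already-claimed patches are illustrative choices. The helpers `normalized_cut` and `maskcut` are hypothetical names.

```python
import numpy as np
from scipy.linalg import eigh


def normalized_cut(features, active, tau=0.15):
    """Bipartition the still-active patches with a Normalized Cut; return a foreground mask."""
    idx = np.flatnonzero(active)
    # Cosine similarity between L2-normalised features of the active patches.
    f = features[idx] / np.linalg.norm(features[idx], axis=1, keepdims=True)
    sim = f @ f.T
    # Binarised affinity (tau is an assumed value): strong edges -> 1, weak -> tiny epsilon.
    W = np.where(sim > tau, 1.0, 1e-5)
    D = np.diag(W.sum(axis=1))
    # Generalised eigenproblem (D - W) x = lambda * D x; eigenvalues come back ascending.
    _, vecs = eigh(D - W, D)
    fiedler = vecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    part = fiedler > fiedler.mean()
    # Assumed heuristic: the side holding the largest |entry| is taken as foreground.
    if not part[np.argmax(np.abs(fiedler))]:
        part = ~part
    mask = np.zeros(len(features), dtype=bool)
    mask[idx[part]] = True
    return mask


def maskcut(features, num_masks=3, tau=0.15):
    """Return up to `num_masks` disjoint boolean patch masks, one object per cut."""
    active = np.ones(len(features), dtype=bool)
    masks = []
    for _ in range(num_masks):
        if active.sum() < 2:  # a bipartition needs at least two nodes
            break
        fg = normalized_cut(features, active, tau)
        masks.append(fg)
        active &= ~fg  # simplification: drop claimed patches from the graph
    return masks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obj = np.zeros(64, dtype=bool)
    obj[[18, 19, 20, 26, 27, 28, 34, 35, 36]] = True  # a 3x3 block on an 8x8 patch grid
    feats = rng.normal(scale=0.3, size=(64, 16))
    feats[obj, 0] += 1.0   # toy "object" patches share one feature direction
    feats[~obj, 1] += 1.0  # toy "background" patches share another
    for k, m in enumerate(maskcut(feats)):
        print(f"mask {k}: {m.sum()} patches")
```

Using a tiny epsilon instead of zero for weak edges keeps every node degree positive, so `D` is positive definite and `scipy.linalg.eigh` can solve the generalised problem directly. In the pipeline the abstract describes, coarse masks like these serve as the training targets for the detector; upsampling them to pixel resolution is not shown.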
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Unsupervised Instance Segmentation | COCO val2017 | AP | 9.2 | CutLER (Cascade+DINO) |
| Unsupervised Instance Segmentation | COCO val2017 | AP50 | 18.9 | CutLER (Cascade+DINO) |
| Unsupervised Instance Segmentation | COCO val2017 | AP75 | 9.7 | CutLER (Cascade+DINO) |
| Unsupervised Instance Segmentation | UVO | AP | 10.1 | CutLER (Cascade+DINO) |
| Unsupervised Instance Segmentation | UVO | AP50 | 22.8 | CutLER (Cascade+DINO) |
| Unsupervised Instance Segmentation | UVO | AP75 | 8 | CutLER (Cascade+DINO) |
| Unsupervised Instance Segmentation | COCO val2017 | AP | 5.3 | CutLER |
| Unsupervised Instance Segmentation | COCO val2017 | AP50 | 8.6 | CutLER |
| Unsupervised Instance Segmentation | COCO val2017 | AP75 | 5.5 | CutLER |
| Unsupervised Instance Segmentation | COCO val2017 | AR100 | 9.3 | CutLER |
| Unsupervised Panoptic Segmentation / 2D Panoptic Segmentation | COCO val2017 | PQ | 12.4 | CutLER+STEGO |
| Unsupervised Panoptic Segmentation / 2D Panoptic Segmentation | COCO val2017 | RQ | 15.2 | CutLER+STEGO |
| Unsupervised Panoptic Segmentation / 2D Panoptic Segmentation | COCO val2017 | SQ | 36.1 | CutLER+STEGO |
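The AP, AP50 and AP75 entries follow the COCO convention. A predicted mask counts as a true positive only if its IoU with a still-unmatched ground-truth mask reaches the threshold (0.5 for AP50, 0.75 for AP75), and plain AP averages over the thresholds 0.50:0.05:0.95. The sketch below is a simplified illustration of that rule for a single image and a single class; it ignores crowd regions, area ranges and the per-image detection cap, and it is not a substitute for the official `pycocotools` evaluator. The function names are hypothetical.

```python
import numpy as np


def mask_iou(a, b):
    """IoU between two boolean masks of the same shape."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def average_precision(preds, gts, iou_thr):
    """Single-image, single-class AP at one IoU threshold (COCO-style 101-point interpolation).

    preds: list of (score, mask); gts: list of boolean masks.
    """
    preds = sorted(preds, key=lambda p: -p[0])  # highest confidence first
    matched, tp = set(), []
    for _, pm in preds:
        # Greedily match to the best still-unmatched ground-truth mask.
        best_iou, best_j = 0.0, -1
        for j, gm in enumerate(gts):
            if j not in matched:
                iou = mask_iou(pm, gm)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            tp.append(1.0)
        else:
            tp.append(0.0)
    cum_tp = np.cumsum(tp)
    recall = cum_tp / max(len(gts), 1)
    precision = cum_tp / np.arange(1, len(tp) + 1)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Sample precision at the 101 recall levels 0.00, 0.01, ..., 1.00.
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 101):
        k = np.searchsorted(recall, r, side="left")
        ap += precision[k] if k < len(precision) else 0.0
    return ap / 101


def coco_ap(preds, gts):
    """Primary COCO AP: mean of AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    return float(np.mean([average_precision(preds, gts, t) for t in np.linspace(0.5, 0.95, 10)]))


if __name__ == "__main__":
    gt = np.zeros((20, 20), dtype=bool)
    gt[5:15, 5:15] = True    # 100-pixel object
    pred = np.zeros_like(gt)
    pred[5:11, 5:15] = True  # covers 60 of those pixels
    preds, gts = [(0.9, pred)], [gt]
    print(mask_iou(pred, gt))                   # 0.6
    print(average_precision(preds, gts, 0.50))  # 1.0 (a hit under AP50)
    print(average_precision(preds, gts, 0.75))  # 0.0 (a miss under AP75)
```

The same prediction therefore contributes fully to AP50 and not at all to AP75, which is why AP50 is typically much higher than AP75 for the coarse masks produced by unsupervised methods.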