Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

2022-12-07 · Data Augmentation · Segmentation · Semantic Segmentation · Open Vocabulary Object Detection · Instance Segmentation · Zero-Shot Learning · Object Detection

Paper · PDF · Code (official) · Code

Abstract

Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts segmentation performance, especially for rare object categories. Although more diverse, higher-quality object instances yield larger gains from Copy-Paste, previous works obtain object instances either from human-annotated instance segmentation datasets or by rendering 3D object models, and both approaches are too expensive to scale up to good diversity. In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion). We demonstrate for the first time that using a text2image model to generate images, or a zero-shot recognition model to filter noisily crawled images, for different object categories is a feasible way to make Copy-Paste truly scalable. To make this possible, we design a data acquisition and processing framework, dubbed "X-Paste", upon which a systematic study is conducted. On the LVIS dataset, X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it achieves +2.6 box AP and +2.1 mask AP gains on all classes, and even more significant gains of +6.8 box AP and +6.5 mask AP on long-tail classes. Our code and models are available at https://github.com/yoctta/XPaste.
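The Copy-Paste idea the abstract describes — compositing an object instance, via its segmentation mask, at a random location on a background image — can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation; the function name and array conventions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def copy_paste(background, instance, mask):
    """Paste one object instance onto a background image.

    background: (H, W, 3) uint8 image.
    instance:   (h, w, 3) uint8 crop of the object.
    mask:       (h, w) binary mask of the object within the crop.
    Returns the augmented image and the pasted mask in background coordinates.
    """
    H, W, _ = background.shape
    h, w, _ = instance.shape
    # Random top-left corner such that the instance fits entirely.
    y = rng.integers(0, H - h + 1)
    x = rng.integers(0, W - w + 1)
    out = background.copy()
    region = out[y:y + h, x:x + w]
    # Keep background pixels where mask is 0, object pixels where mask is 1.
    region[mask.astype(bool)] = instance[mask.astype(bool)]
    # Record the pasted instance's mask in full-image coordinates,
    # which becomes the free segmentation label for training.
    full_mask = np.zeros((H, W), dtype=np.uint8)
    full_mask[y:y + h, x:x + w] = mask
    return out, full_mask

# Toy example: a 4x4 red square pasted onto a 16x16 black background.
bg = np.zeros((16, 16, 3), dtype=np.uint8)
obj = np.full((4, 4, 3), (255, 0, 0), dtype=np.uint8)
m = np.ones((4, 4), dtype=np.uint8)
aug, aug_mask = copy_paste(bg, obj, m)
print(aug_mask.sum())  # 16
```

X-Paste's contribution is upstream of this step: it scales the *supply* of `instance`/`mask` pairs by generating object images with a text2image model (or filtering web-crawled ones with CLIP) instead of relying on human annotation or 3D rendering.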

Results

Task | Dataset | Metric | Value | Model
Object Detection | LVIS v1.0 val | box AP | 50.9 | CenterNet2 (Swin-L w/ X-Paste + Copy-Paste)
Object Detection | LVIS v1.0 val | box APr | 48.7 | CenterNet2 (Swin-L w/ X-Paste + Copy-Paste)
Instance Segmentation | COCO minival | mask AP | 48.8 | CenterNet2 (Swin-L w/ X-Paste + Copy-Paste)
Instance Segmentation | LVIS v1.0 val | mask AP | 45.4 | CenterNet2 (Swin-L w/ X-Paste + Copy-Paste)
Instance Segmentation | LVIS v1.0 val | mask APr | 43.8 | CenterNet2 (Swin-L w/ X-Paste + Copy-Paste)
Open Vocabulary Object Detection | LVIS v1.0 | AP novel (LVIS base training) | 21.4 | X-Paste
Open Vocabulary Object Detection | LVIS v1.0 | AP novel (unrestricted open-vocabulary training) | 22.8 | X-Paste

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
Deep Learning-Based Fetal Lung Segmentation from Diffusion-weighted MRI Images and Lung Maturity Evaluation for Fetal Growth Restriction (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
From Variability To Accuracy: Conditional Bernoulli Diffusion Models with Consensus-Driven Correction for Thin Structure Segmentation (2025-07-17)
Unleashing Vision Foundation Models for Coronary Artery Segmentation: Parallel ViT-CNN Encoding and Variational Fusion (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)