Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders

Carlos Hinojosa, Shuming Liu, Bernard Ghanem

2024-07-17 · Image Classification · Semantic Segmentation · Instance Segmentation · Object Detection
Paper · PDF · Code (official)

Abstract

Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework, offering remarkable performance across a wide range of downstream tasks. To increase the difficulty of the pretext task and learn richer visual representations, existing works have focused on replacing standard random masking with more sophisticated strategies, such as adversarial-guided and teacher-guided masking. However, these strategies depend on the input data, which commonly increases model complexity and requires additional computation to generate the mask patterns. This raises the question: Can we enhance MAE performance beyond random masking without relying on input data or incurring additional computational costs? In this work, we introduce a simple yet effective data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise. Drawing inspiration from color noise in image processing, we explore four types of filters to yield mask patterns with different spatial and semantic priors. ColorMAE requires no additional learnable parameters or computational overhead in the network, yet it significantly enhances the learned representations. We provide a comprehensive empirical evaluation, demonstrating our strategy's superiority in downstream tasks compared to random masking. Notably, we report an improvement of 2.72 mIoU in semantic segmentation relative to baseline MAE implementations.
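The core idea — data-independent binary masks obtained by filtering random noise — can be sketched as follows. This is a minimal illustration, not the authors' implementation: it band-pass-filters white noise in the frequency domain (a rough stand-in for the paper's "green" noise, which concentrates mid frequencies) and thresholds the result to a binary patch mask at the desired masking ratio. The grid size and band edges `low`/`high` are illustrative assumptions.

```python
import numpy as np

def color_noise_mask(h=14, w=14, mask_ratio=0.75, low=2.0, high=6.0, seed=0):
    """Sketch of a ColorMAE-style data-independent mask.

    Band-pass-filters white noise in the frequency domain (mid frequencies
    only, loosely mimicking "green" noise), then keeps the top values so
    that exactly `mask_ratio` of the patches are masked. The band edges
    `low`/`high` are illustrative, not the paper's values.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((h, w))            # one noise value per patch
    spectrum = np.fft.fftshift(np.fft.fft2(noise)) # centered 2D spectrum
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)           # radial frequency
    band = (r >= low) & (r <= high)                # keep mid frequencies only
    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * band)))
    k = int(round(mask_ratio * h * w))             # number of patches to mask
    thresh = np.sort(filtered.ravel())[::-1][k - 1]
    return filtered >= thresh                      # True = masked patch

mask = color_noise_mask()
print(mask.mean())  # fraction of masked patches, matches mask_ratio
```

Because the mask depends only on the noise seed and the filter, it can be precomputed offline and costs nothing at training time, which is the paper's key contrast with adversarial- or teacher-guided masking.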

Results

Task                  | Dataset | Metric          | Value | Model
Semantic Segmentation | ADE20K  | Validation mIoU | 49.3  | ColorMAE-Green-ViTB-1600
Object Detection      | COCO    | box AP          | 50.1  | ColorMAE-Green-ViTB-1600
Object Detection      | COCO    | box AP50        | 70.7  | ColorMAE-Green-ViTB-1600
Object Detection      | COCO    | box AP75        | 54.7  | ColorMAE-Green-ViTB-1600
Instance Segmentation | COCO    | mask AP         | 44.4  | ColorMAE-Green-ViTB-1600
Instance Segmentation | COCO    | mask AP50       | 67.8  | ColorMAE-Green-ViTB-1600
Instance Segmentation | COCO    | mask AP75       | 48.0  | ColorMAE-Green-ViTB-1600

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)