Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


PolyMaX: General Dense Prediction with Mask Transformer

Xuan Yang, Liangzhe Yuan, Kimberly Wilber, Astuti Sharma, Xiuye Gu, Siyuan Qiao, Stephanie Debats, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Liang-Chieh Chen

Published: 2023-11-09 | Tasks: Surface Normal Estimation · Semantic Segmentation · Depth Estimation · Monocular Depth Estimation
Paper · PDF · Code (official)

Abstract

Dense prediction tasks, such as semantic segmentation, depth estimation, and surface normal prediction, can be easily formulated as per-pixel classification (discrete outputs) or regression (continuous outputs). This per-pixel prediction paradigm has remained popular due to the prevalence of fully convolutional networks. However, on the recent frontier of the segmentation task, the community has been witnessing a paradigm shift from per-pixel prediction to cluster prediction with the emergence of transformer architectures, particularly mask transformers, which directly predict a label for a mask instead of a pixel. Despite this shift, methods based on the per-pixel prediction paradigm still dominate the benchmarks for other dense prediction tasks that require continuous outputs, such as depth estimation and surface normal prediction. Motivated by the success of DORN and AdaBins in depth estimation, achieved by discretizing the continuous output space, we propose to generalize the cluster-prediction approach to general dense prediction tasks. This allows us to unify dense prediction tasks within the mask transformer framework. Remarkably, the resulting model, PolyMaX, demonstrates state-of-the-art performance on three benchmarks of the NYUD-v2 dataset. We hope our simple yet effective design can inspire more research on exploiting mask transformers for dense prediction tasks. Code and model will be made available.
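The cluster-prediction idea described in the abstract can be sketched in a few lines: rather than regressing one continuous value per pixel, a model predicts K mask logits per pixel plus one continuous value (e.g. a depth bin) per mask, and the dense output is the probability-weighted combination. This is a minimal illustrative sketch under assumed shapes and names (`cluster_prediction`, `mask_logits`, `cluster_values` are hypothetical), not the authors' implementation.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cluster_prediction(mask_logits, cluster_values):
    """mask_logits: (K, H, W) per-pixel cluster (mask) logits.
    cluster_values: (K,) one continuous value per cluster, e.g. a depth bin.
    Returns an (H, W) dense prediction as the soft-assignment-weighted sum."""
    probs = softmax(mask_logits, axis=0)           # per-pixel cluster probabilities
    return np.tensordot(cluster_values, probs, 1)  # weighted sum over the K clusters

# Toy example: 8 clusters with depth bins spanning 0.5 m to 10 m.
rng = np.random.default_rng(0)
K, H, W = 8, 4, 4
pred = cluster_prediction(rng.normal(size=(K, H, W)),
                          np.linspace(0.5, 10.0, K))
```

Because the output is a convex combination of the per-cluster values, every pixel's prediction stays within the range spanned by the cluster values, which is one appeal of discretizing the continuous output space.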

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25 | 0.969 | PolyMaX (ConvNeXt-L) |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^2 | 0.9958 | PolyMaX (ConvNeXt-L) |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^3 | 0.999 | PolyMaX (ConvNeXt-L) |
| Depth Estimation | NYU-Depth V2 | RMSE | 0.25 | PolyMaX (ConvNeXt-L) |
| Depth Estimation | NYU-Depth V2 | Absolute relative error | 0.067 | PolyMaX (ConvNeXt-L) |
| Depth Estimation | NYU-Depth V2 | log10 | 0.029 | PolyMaX (ConvNeXt-L) |
| 3D | NYU-Depth V2 | Delta < 1.25 | 0.969 | PolyMaX (ConvNeXt-L) |
| 3D | NYU-Depth V2 | Delta < 1.25^2 | 0.9958 | PolyMaX (ConvNeXt-L) |
| 3D | NYU-Depth V2 | Delta < 1.25^3 | 0.999 | PolyMaX (ConvNeXt-L) |
| 3D | NYU-Depth V2 | RMSE | 0.25 | PolyMaX (ConvNeXt-L) |
| 3D | NYU-Depth V2 | Absolute relative error | 0.067 | PolyMaX (ConvNeXt-L) |
| 3D | NYU-Depth V2 | log10 | 0.029 | PolyMaX (ConvNeXt-L) |
| Surface Normal Estimation | NYU Depth v2 | % < 11.25° | 65.66 | PolyMaX (ConvNeXt-L) |
| Surface Normal Estimation | NYU Depth v2 | % < 22.5° | 82.28 | PolyMaX (ConvNeXt-L) |
| Surface Normal Estimation | NYU Depth v2 | % < 30° | 87.83 | PolyMaX (ConvNeXt-L) |
| Surface Normal Estimation | NYU Depth v2 | Mean angle error | 13.09 | PolyMaX (ConvNeXt-L) |
| Surface Normal Estimation | NYU Depth v2 | RMSE | 20.4 | PolyMaX (ConvNeXt-L) |
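The depth metrics reported above follow standard definitions: the delta accuracies count pixels whose prediction-to-ground-truth ratio (in either direction) falls under a threshold, alongside RMSE, absolute relative error, and mean absolute log10 error. The sketch below is a generic implementation of those standard formulas, not the authors' evaluation code; `depth_metrics` and its inputs are illustrative names.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular-depth metrics over flat arrays of positive depths."""
    ratio = np.maximum(pred / gt, gt / pred)  # symmetric ratio per pixel
    return {
        "delta<1.25":   np.mean(ratio < 1.25),
        "delta<1.25^2": np.mean(ratio < 1.25 ** 2),
        "delta<1.25^3": np.mean(ratio < 1.25 ** 3),
        "rmse":         np.sqrt(np.mean((pred - gt) ** 2)),
        "abs_rel":      np.mean(np.abs(pred - gt) / gt),
        "log10":        np.mean(np.abs(np.log10(pred) - np.log10(gt))),
    }

# Tiny worked example (depths in meters).
gt = np.array([1.0, 2.0, 4.0])
pred = np.array([1.1, 2.0, 5.5])
m = depth_metrics(pred, gt)
```

In this toy case the ratios are [1.1, 1.0, 1.375], so two of three pixels pass the 1.25 threshold while all three pass 1.25^2, mirroring how the higher-order delta metrics in the table approach 1.0.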

Related Papers

- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
- Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
- DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
- SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
- Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
- A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
- $S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
- $π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)