Prompt Guided Transformer for Multi-Task Dense Prediction

Yuxiang Lu, Shalayiding Sirejiding, Yue Ding, Chunlin Wang, Hongtao Lu

2023-07-28

Surface Normal Estimation, Semantic Segmentation, Prediction, Multi-Task Learning, Boundary Detection, Monocular Depth Estimation

Abstract

Task-conditional architectures offer an advantage in parameter efficiency but fall short in performance compared to state-of-the-art multi-decoder methods. How to trade off performance against model parameters is an important and difficult problem. In this paper, we introduce a simple and lightweight task-conditional model called Prompt Guided Transformer (PGT) to address this challenge. Our approach designs a Prompt-conditioned Transformer block, which incorporates task-specific prompts in the self-attention mechanism to achieve global dependency modeling and parameter-efficient feature adaptation across multiple tasks. This block is integrated into both the shared encoder and decoder, enhancing the capture of intra- and inter-task features. Moreover, we design a lightweight decoder, which accounts for only 2.7% of the total model parameters, to further reduce parameter usage. Extensive experiments on two multi-task dense prediction benchmarks, PASCAL-Context and NYUD-v2, demonstrate that our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, and maintains a favorable balance between performance and parameter size.
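To make the idea of task-specific prompts inside self-attention more concrete, here is a minimal sketch in NumPy. It is not the authors' implementation. The abstract does not say how the prompts enter the attention computation, so the sketch assumes one common design: a small set of learnable prompt tokens per task is prepended to the keys and values, while the projection weights are shared by all tasks. The class name `PromptConditionedAttention`, the single-head layout, and all sizes are hypothetical, and the window-based attention of the Swin backbones is left out.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class PromptConditionedAttention:
    """Single-head self-attention with per-task prompt tokens.

    Illustrative only: prepending prompts to keys/values is an assumed
    design, not a detail confirmed by the abstract.
    """

    def __init__(self, dim, num_tasks, prompt_len, seed=0):
        rng = np.random.default_rng(seed)
        scale = dim ** -0.5
        self.dim = dim
        # Projections shared by every task (the parameter-efficient part).
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        # The only task-specific parameters: one prompt matrix per task.
        self.prompts = rng.normal(0.0, 0.5, (num_tasks, prompt_len, dim))

    def __call__(self, x, task_id):
        """x has shape (n_tokens, dim); returns (output, attention weights)."""
        prompt = self.prompts[task_id]                    # (prompt_len, dim)
        kv_input = np.concatenate([prompt, x], axis=0)    # prompts + tokens
        q = x @ self.w_q                                  # queries from tokens only
        k = kv_input @ self.w_k
        v = kv_input @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.dim))       # (n_tokens, prompt_len + n_tokens)
        return attn @ v, attn


layer = PromptConditionedAttention(dim=8, num_tasks=3, prompt_len=2)
tokens = np.random.default_rng(1).normal(size=(5, 8))
out_task0, attn_task0 = layer(tokens, task_id=0)
out_task1, _ = layer(tokens, task_id=1)
```

Under this assumption, switching tasks changes only which prompt matrix is read, so the same input tokens produce different task-adapted features at a cost of `prompt_len * dim` extra parameters per task.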

Results

Task	Dataset	Metric	Value	Model
Depth Estimation	NYU-Depth V2	RMSE	0.5468	PGT (Swin-S)
Depth Estimation	NYU-Depth V2	RMSE	0.59	PGT (Swin-T)
Boundary Detection	NYU-Depth V2	odsF	78.04	PGT (Swin-S)
Boundary Detection	NYU-Depth V2	odsF	77.05	PGT (Swin-T)
Semantic Segmentation	NYU-Depth V2	Mean IoU	46.43	PGT (Swin-S)
Semantic Segmentation	NYU-Depth V2	Mean IoU	41.61	PGT (Swin-T)
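
For readers unfamiliar with two of the metrics in the table, the sketch below computes RMSE (lower is better) and mean IoU (higher is better) from their standard definitions. It is only an illustration: the function names and the `valid` mask argument are hypothetical, and the benchmark's own evaluation scripts may differ in details such as masking of invalid depth pixels or how scores are aggregated over images. The function here returns mean IoU as a fraction in [0, 1], whereas the table values appear to be on a percentage scale.

```python
import numpy as np


def rmse(pred, target, valid=None):
    """Root mean squared error over valid pixels (e.g. for depth maps)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if valid is None:
        valid = np.ones_like(target, dtype=bool)  # assume every pixel is valid
    diff = pred[valid] - target[valid]
    return float(np.sqrt(np.mean(diff ** 2)))


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in pred or target."""
    pred = np.asarray(pred)
    target = np.asarray(target)
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


depth_error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
seg_score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```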
