Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Prompt Guided Transformer for Multi-Task Dense Prediction

Yuxiang Lu, Shalayiding Sirejiding, Yue Ding, Chunlin Wang, Hongtao Lu

2023-07-28 · Surface Normal Estimation · Semantic Segmentation · Prediction · Multi-Task Learning · Boundary Detection · Monocular Depth Estimation

Paper · PDF · Code (official)

Abstract

Task-conditional architectures offer an advantage in parameter efficiency but fall short in performance compared to state-of-the-art multi-decoder methods. How to trade off performance against model parameters is an important and difficult problem. In this paper, we introduce a simple and lightweight task-conditional model called Prompt Guided Transformer (PGT) to address this challenge. Our approach designs a Prompt-conditioned Transformer block, which incorporates task-specific prompts into the self-attention mechanism to achieve global dependency modeling and parameter-efficient feature adaptation across multiple tasks. This block is integrated into both the shared encoder and decoder, enhancing the capture of intra- and inter-task features. Moreover, we design a lightweight decoder that further reduces parameter usage, accounting for only 2.7% of the total model parameters. Extensive experiments on two multi-task dense prediction benchmarks, PASCAL-Context and NYUD-v2, demonstrate that our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, and strikes a favorable balance between performance and parameter size.
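To make the prompt-conditioning idea concrete, the sketch below shows one common way task-specific prompts can enter self-attention: a small set of learnable prompt tokens per task is prepended to the patch tokens before attention, so a single set of shared weights is steered by whichever task's prompts are active. This is a minimal PyTorch illustration of the general mechanism, not the authors' implementation; the module name, the prompt-prepending scheme, and hyperparameters such as `prompt_len` are assumptions.

```python
# Minimal sketch of prompt-conditioned self-attention (assumed scheme,
# not the paper's exact block). Only self.prompts is task-specific;
# attention and MLP weights are shared across all tasks.
import torch
import torch.nn as nn

class PromptConditionedBlock(nn.Module):
    def __init__(self, dim=96, num_heads=3, num_tasks=4, prompt_len=8):
        super().__init__()
        # Learnable prompt tokens: (num_tasks, prompt_len, dim).
        self.prompts = nn.Parameter(torch.randn(num_tasks, prompt_len, dim) * 0.02)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x, task_id):
        # x: (B, N, dim) patch tokens from the shared backbone.
        b = x.size(0)
        p = self.prompts[task_id].unsqueeze(0).expand(b, -1, -1)
        # Prepend task prompts so attention mixes prompt and patch tokens.
        tokens = torch.cat([p, x], dim=1)
        h = self.norm1(tokens)
        tokens = tokens + self.attn(h, h, h, need_weights=False)[0]
        x = tokens[:, p.size(1):]          # drop prompt tokens afterwards
        return x + self.mlp(self.norm2(x))

# Usage: identical weights, a different task prompt selects the behavior.
block = PromptConditionedBlock()
feats = torch.randn(2, 196, 96)
seg_feats = block(feats, task_id=0)      # e.g. semantic segmentation
depth_feats = block(feats, task_id=1)    # e.g. depth estimation
```

The design point this illustrates is why such blocks are parameter-efficient: switching tasks adds only `prompt_len × dim` parameters per task, while encoder and decoder weights stay shared.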

Results

Task                    Dataset        Metric     Value    Model
Depth Estimation        NYU-Depth V2   RMSE       0.5468   PGT (Swin-S)
Depth Estimation        NYU-Depth V2   RMSE       0.59     PGT (Swin-T)
Boundary Detection      NYU-Depth V2   odsF       78.04    PGT (Swin-S)
Boundary Detection      NYU-Depth V2   odsF       77.05    PGT (Swin-T)
Semantic Segmentation   NYU-Depth V2   Mean IoU   46.43    PGT (Swin-S)
Semantic Segmentation   NYU-Depth V2   Mean IoU   41.61    PGT (Swin-T)

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Multi-Strategy Improved Snake Optimizer Accelerated CNN-LSTM-Attention-Adaboost for Trajectory Prediction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)