Prompt Guided Transformer for Multi-Task Dense Prediction

28 Jul 2023  ·  Yuxiang Lu, Shalayiding Sirejiding, Yue Ding, Chunlin Wang, Hongtao Lu ·

Task-conditional architectures offer an advantage in parameter efficiency but fall short in performance compared to state-of-the-art multi-decoder methods. How to trade off performance against model parameters is therefore an important and difficult problem. In this paper, we introduce a simple and lightweight task-conditional model called the Prompt Guided Transformer (PGT) to address this trade-off. Our approach designs a Prompt-conditioned Transformer block that incorporates task-specific prompts into the self-attention mechanism, achieving global dependency modeling and parameter-efficient feature adaptation across multiple tasks. This block is integrated into both the shared encoder and the decoder, enhancing the capture of intra- and inter-task features. Moreover, we design a lightweight decoder that further reduces parameter usage, accounting for only 2.7% of the total model parameters. Extensive experiments on two multi-task dense prediction benchmarks, PASCAL-Context and NYUD-v2, demonstrate that our approach achieves state-of-the-art results among task-conditional methods while using fewer parameters, striking a favorable balance between performance and parameter size.
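The core mechanism described above — steering a shared transformer toward one task by injecting task-specific prompt tokens into self-attention — can be sketched as follows. This is a minimal single-head illustration in NumPy, not the paper's implementation: the function name `prompt_conditioned_attention`, the prompt shapes, and the toy dimensions are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prompt_conditioned_attention(tokens, prompt, Wq, Wk, Wv):
    """Single-head self-attention where task-specific prompt tokens are
    concatenated to the key/value sequence (a common prompt-conditioning
    scheme, assumed here for illustration).  The projection weights
    Wq/Wk/Wv are shared across tasks; only the small prompt differs per
    task, which is what makes the adaptation parameter-efficient."""
    kv_in = np.concatenate([prompt, tokens], axis=0)  # (P+N, d)
    Q = tokens @ Wq          # queries come from the image tokens only
    K = kv_in @ Wk           # keys/values see prompt + image tokens
    V = kv_in @ Wv
    d = Q.shape[-1]
    attn = softmax(Q @ K.T / np.sqrt(d))  # (N, P+N) attention weights
    return attn @ V          # (N, d): task-adapted token features

# Toy usage: 4 image tokens, a 2-token task prompt, embedding dim 8.
rng = np.random.default_rng(0)
d = 8
tokens = rng.standard_normal((4, d))
seg_prompt = rng.standard_normal((2, d))   # learnable per task in practice
Wq, Wk, Wv = [rng.standard_normal((d, d)) for _ in range(3)]
out = prompt_conditioned_attention(tokens, seg_prompt, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because the image tokens and weights are shared, swapping in a different task's prompt changes the attention pattern, and hence the output features, without any task-specific projection matrices.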


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Semantic Segmentation | NYU Depth v2 | PGT (Swin-S) | Mean IoU | 46.43 | # 78 |
| Semantic Segmentation | NYU Depth v2 | PGT (Swin-T) | Mean IoU | 41.61 | # 94 |
| Surface Normal Estimation | NYU-Depth V2 | PGT (Swin-S) | Mean Angle Error | 19.24 | # 3 |
| Surface Normal Estimation | NYU-Depth V2 | PGT (Swin-T) | Mean Angle Error | 20.06 | # 4 |
| Boundary Detection | NYU-Depth V2 | PGT (Swin-S) | odsF | 78.04 | # 2 |
| Boundary Detection | NYU-Depth V2 | PGT (Swin-T) | odsF | 77.05 | # 3 |
| Monocular Depth Estimation | NYU-Depth V2 | PGT (Swin-S) | RMSE | 0.5468 | # 64 |
| Monocular Depth Estimation | NYU-Depth V2 | PGT (Swin-T) | RMSE | 0.59 | # 69 |
