Visual Saliency Transformer

Nian Liu, Ni Zhang, Kaiyuan Wan, Ling Shao, Junwei Han

2021-04-25 | ICCV 2021 10
Tasks: Thermal Image Segmentation, Boundary Detection, Salient Object Detection, RGB-D Salient Object Detection, Object Detection, Saliency Detection

Abstract

Existing state-of-the-art saliency detection methods heavily rely on CNN-based architectures. Alternatively, we rethink this task from a convolution-free sequence-to-sequence perspective and predict saliency by modeling long-range dependencies, which cannot be achieved by convolution. Specifically, we develop a novel unified model based on a pure transformer, namely, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD). It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches. Unlike conventional architectures used in Vision Transformer (ViT), we leverage multi-level token fusion and propose a new token upsampling method under the transformer framework to get high-resolution detection results. We also develop a token-based multi-task decoder to simultaneously perform saliency and boundary detection by introducing task-related tokens and a novel patch-task-attention mechanism. Experimental results show that our model outperforms existing methods on both RGB and RGB-D SOD benchmark datasets. Most importantly, our whole framework not only provides a new perspective for the SOD field but also shows a new paradigm for transformer-based dense prediction models. Code is available at https://github.com/nnizhang/VST.
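
To make the "image patches as inputs" and "propagate global contexts among image patches" ideas concrete, here is a minimal sketch in plain Python. It is not the VST implementation: the non-overlapping `patchify` split, the identity query/key/value projections, and the function names are all assumptions made for illustration, and the paper's actual tokenization, token fusion, and decoder are not reproduced here. The point is only that one attention step lets every patch draw on every other patch, regardless of distance.

```python
import math

def patchify(image, patch):
    """Split an H x W grid (list of rows) into flattened patch tokens, row-major.

    Assumption: non-overlapping square patches; the paper may tokenize differently.
    """
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections.

    Hypothetical simplification: a real transformer layer uses learned
    query/key/value weights, multiple heads, residuals, and an MLP.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Each token scores itself against every token, near or far.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

# Toy 4 x 4 single-channel "image" split into four 2 x 2 patches.
image = [[float(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)
mixed = self_attention(tokens)
```

Each row of `mixed` is a weighted average of all four patch tokens, so information from the bottom-right patch reaches the top-left token in a single step, which a small convolution kernel cannot do.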

Results

Dataset                   Metric         Value  Model
RGB-T-Glass-Segmentation  MAE            0.044  VST
SIP                       Average MAE    0.04   VST
SIP                       S-Measure      90.4   VST
SIP                       max E-Measure  94.4   VST
SIP                       max F-Measure  91.5   VST
NJUD                      S-Measure      0.922  VST
NLPR                      S-Measure      0.932  VST

The SIP, NJUD, and NLPR rows are listed identically under the task labels Object Detection, 3D, 2D Classification, 2D Object Detection, and 16k. The RGB-T-Glass-Segmentation row is listed identically under Semantic Segmentation, Scene Segmentation, 2D Object Detection, and 10-shot image generation.
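
For readers unfamiliar with the MAE rows above, the following is a minimal sketch of how mean absolute error is commonly computed for a saliency map: the average per-pixel absolute difference between the predicted map and the ground-truth mask, both scaled to [0, 1], with lower values being better. The function name and the toy data are illustrative assumptions, and this is not the evaluation code used to produce the listed numbers.

```python
def mean_absolute_error(pred, gt):
    """Mean absolute per-pixel difference between two same-sized maps in [0, 1]."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count

# Toy 2 x 2 example: predicted saliency vs. a binary ground-truth mask.
pred = [[0.9, 0.1], [0.8, 0.0]]
gt = [[1.0, 0.0], [1.0, 0.0]]
score = mean_absolute_error(pred, gt)  # approximately 0.1
```

On a benchmark, this per-image score would typically be averaged over all images in the dataset.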
