Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MDS-ViTNet: Improving saliency prediction for Eye-Tracking with Vision Transformer

Polezhaev Ignat, Goncharenko Igor, Iurina Natalya

2024-05-29 · Marketing · Saliency Prediction · Transfer Learning
Paper · PDF · Code (official)

Abstract

In this paper, we present a novel methodology we call MDS-ViTNet (Multi Decoder Saliency by Vision Transformer Network) for enhancing visual saliency prediction for eye-tracking. This approach holds significant potential for diverse fields, including marketing, medicine, robotics, and retail. We propose a network architecture that leverages the Vision Transformer, moving beyond the conventional ImageNet backbone. The framework adopts an encoder-decoder structure, with the encoder utilizing a Swin transformer to efficiently embed the most important features. This process involves a Transfer Learning method, wherein layers from the Vision Transformer are converted by the Encoder Transformer and seamlessly integrated into a CNN Decoder. This methodology ensures minimal information loss from the original input image. The decoder employs a multi-decoding technique, utilizing dual decoders to generate two distinct attention maps. These maps are subsequently combined into a singular output via an additional CNN model. Our trained model, MDS-ViTNet, achieves state-of-the-art results across several benchmarks. Committed to fostering further collaboration, we intend to make our code, models, and datasets accessible to the public.
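The multi-decoder idea described in the abstract can be sketched in PyTorch. This is a minimal illustrative stand-in, not the paper's implementation: the encoder below is a few plain convolutions in place of the Swin transformer, the two decoders are small transposed-conv stacks producing separate attention maps, and a final CNN fuses them into one saliency map. All module names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class MultiDecoderSaliency(nn.Module):
    """Illustrative sketch of the dual-decoder-plus-fusion structure
    described in the MDS-ViTNet abstract (not the paper's architecture)."""

    def __init__(self, channels=16):
        super().__init__()
        # Stand-in encoder: strided convs in place of the Swin transformer.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )

        def make_decoder():
            # Upsamples features back to input resolution, yielding one map.
            return nn.Sequential(
                nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1),
                nn.ReLU(),
                nn.ConvTranspose2d(channels, 1, 4, stride=2, padding=1),
                nn.Sigmoid(),
            )

        self.decoder_a = make_decoder()  # first attention map
        self.decoder_b = make_decoder()  # second attention map
        # Small CNN that fuses the two maps into a single saliency output.
        self.fuse = nn.Sequential(nn.Conv2d(2, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        feats = self.encoder(x)
        map_a = self.decoder_a(feats)
        map_b = self.decoder_b(feats)
        return self.fuse(torch.cat([map_a, map_b], dim=1))

model = MultiDecoderSaliency()
out = model(torch.randn(1, 3, 64, 64))
print(tuple(out.shape))  # single-channel map at the input resolution
```

The key design point mirrored here is that both decoders share one encoder's features, so the fusion CNN reconciles two independent decodings of the same representation rather than two separate forward passes.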

Results

Task | Dataset | Metric | Value | Model
Saliency Detection | SALICON | AUC | 0.8684 | MDS-ViTNet
Saliency Detection | SALICON | CC | 0.898 | MDS-ViTNet
Saliency Detection | SALICON | KLD | 0.2127 | MDS-ViTNet
Saliency Detection | SALICON | SIM | 0.7887 | MDS-ViTNet
Saliency Prediction | SALICON | AUC | 0.8684 | MDS-ViTNet
Saliency Prediction | SALICON | CC | 0.898 | MDS-ViTNet
Saliency Prediction | SALICON | KLD | 0.2127 | MDS-ViTNet
Saliency Prediction | SALICON | SIM | 0.7887 | MDS-ViTNet
Few-Shot Transfer Learning for Saliency Prediction | SALICON | AUC | 0.8684 | MDS-ViTNet
Few-Shot Transfer Learning for Saliency Prediction | SALICON | CC | 0.898 | MDS-ViTNet
Few-Shot Transfer Learning for Saliency Prediction | SALICON | KLD | 0.2127 | MDS-ViTNet
Few-Shot Transfer Learning for Saliency Prediction | SALICON | SIM | 0.7887 | MDS-ViTNet
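Three of the metrics in the table (CC, KLD, SIM) can be sketched with NumPy, following their common definitions in the saliency literature: Pearson correlation between maps, KL divergence between the maps treated as distributions, and histogram intersection. This is a hedged sketch; the exact SALICON benchmark implementations may differ in details such as blurring and normalization.

```python
import numpy as np

def cc(pred, gt):
    """Pearson correlation coefficient between two saliency maps."""
    p = (pred - pred.mean()) / pred.std()
    g = (gt - gt.mean()) / gt.std()
    return float((p * g).mean())

def kld(pred, gt, eps=1e-12):
    """KL divergence, treating each map as a probability distribution."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float((g * np.log(g / (p + eps) + eps)).sum())

def sim(pred, gt, eps=1e-12):
    """Similarity (histogram intersection) of the normalized maps."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float(np.minimum(p, g).sum())

# Sanity check on an arbitrary map compared with itself:
rng = np.random.default_rng(0)
m = rng.random((32, 32))
print(cc(m, m), kld(m, m), sim(m, m))
```

For identical maps, CC and SIM approach 1 and KLD approaches 0; lower is better for KLD, higher is better for the other metrics in the table.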

Related Papers

RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)
COLIBRI Fuzzy Model: Color Linguistic-Based Representation and Interpretation (2025-07-15)
Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility Analysis (2025-07-15)
Robust-Multi-Task Gradient Boosting (2025-07-15)
Calibrated and Robust Foundation Models for Vision-Language and Medical Image Tasks Under Distribution Shift (2025-07-12)
The Bayesian Approach to Continual Learning: An Overview (2025-07-11)