Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Zhuo Chen, Yufeng Huang, Jiaoyan Chen, Yuxia Geng, Wen Zhang, Yin Fang, Jeff Z. Pan, Huajun Chen

2022-07-04 · Image Classification · Attribute · Zero-Shot Image Classification · Multi-Task Learning · Contrastive Learning · Zero-Shot Learning

Paper · PDF · Code (official) · Code

Abstract

Zero-shot learning (ZSL) aims to predict unseen classes whose samples never appear during training. One of the most effective and widely used forms of semantic information for zero-shot image classification is attributes, which are annotations of class-level visual characteristics. However, current methods often fail to discriminate subtle visual distinctions between images, due not only to the shortage of fine-grained annotations but also to attribute imbalance and co-occurrence. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; and (3) propose a multi-task learning policy for considering multi-modal objectives. We find that DUET achieves state-of-the-art performance on three standard ZSL benchmarks and a knowledge-graph-equipped ZSL benchmark, that its components are effective, and that its predictions are interpretable.
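The attribute-level contrastive strategy in point (2) can be sketched with a standard InfoNCE-style loss over attribute representations. This is a minimal illustration assuming dot-product similarity and a single positive per anchor, not the paper's exact formulation; the function name and temperature value are hypothetical:

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss over plain Python vectors.

    Pulls the anchor toward its positive (e.g. a matching attribute
    embedding) and pushes it away from the negatives (e.g. embeddings
    of co-occurring but distinct attributes).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Similarity logits: positive first, then all negatives.
    logits = [dot(anchor, positive)] + [dot(anchor, n) for n in negatives]
    logits = [l / temperature for l in logits]

    # Numerically stable softmax cross-entropy with the positive as target.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))
```

A matched anchor/positive pair yields a loss near zero, while a mismatched pair (the positive orthogonal to the anchor) yields a large loss, which is the gradient signal that sharpens fine-grained attribute discrimination.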

Results

Task               | Dataset       | Metric                            | Value | Model
Zero-Shot Learning | CUB-200-2011  | Accuracy (Seen)                   | 72.8  | DUET
Zero-Shot Learning | CUB-200-2011  | Accuracy (Unseen)                 | 62.9  | DUET
Zero-Shot Learning | CUB-200-2011  | H (harmonic mean)                 | 67.5  | DUET
Zero-Shot Learning | CUB-200-2011  | Average top-1 classification acc. | 72.3  | DUET
Zero-Shot Learning | AwA2          | Accuracy (Seen)                   | 84.7  | DUET
Zero-Shot Learning | AwA2          | Accuracy (Unseen)                 | 63.7  | DUET
Zero-Shot Learning | AwA2          | H (harmonic mean)                 | 72.7  | DUET
Zero-Shot Learning | AwA2          | Average top-1 classification acc. | 69.9  | DUET
Zero-Shot Learning | SUN Attribute | Accuracy (Seen)                   | 45.8  | DUET
Zero-Shot Learning | SUN Attribute | Accuracy (Unseen)                 | 45.7  | DUET
Zero-Shot Learning | SUN Attribute | H (harmonic mean)                 | 45.8  | DUET
Zero-Shot Learning | SUN Attribute | Average top-1 classification acc. | 64.4  | DUET
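The H metric is the standard generalized ZSL harmonic mean of seen and unseen accuracy, H = 2·S·U / (S + U). A quick check against the table values (to one decimal place):

```python
def harmonic_mean(seen: float, unseen: float) -> float:
    """Generalized ZSL harmonic mean: H = 2*S*U / (S + U)."""
    return 2 * seen * unseen / (seen + unseen)

# CUB-200-2011 row: harmonic_mean(72.8, 62.9) -> ~67.5
# AwA2 row:         harmonic_mean(84.7, 63.7) -> ~72.7
```

H penalizes an imbalance between seen and unseen accuracy, so a model cannot score well by overfitting to seen classes alone.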

Related Papers

- Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
- Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
- Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
- Federated Learning for Commercial Image Sources (2025-07-17)
- MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
- SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
- SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
- HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)