Data sourced from the PWC Archive (CC-BY-SA 4.0).

Multi-modal Cycle-consistent Generalized Zero-Shot Learning

Rafael Felix, B. G. Vijay Kumar, Ian Reid, Gustavo Carneiro

2018-08-01 · ECCV 2018
Tasks: Generalized Zero-Shot Learning · General Classification · Zero-Shot Learning
Paper · PDF · Code (official)

Abstract

In generalized zero-shot learning (GZSL), the set of classes is split into seen and unseen classes, where training relies on the semantic features of both the seen and unseen classes but on the visual representations of only the seen classes, while testing uses the visual representations of both the seen and unseen classes. Current methods address GZSL by learning a transformation from the visual to the semantic space, based on the assumption that the distributions of classes in the semantic and visual spaces are relatively similar. Such methods tend to map the visual representations of unseen test samples onto the semantic features of one of the seen classes rather than those of the correct unseen class, resulting in low-accuracy GZSL classification. Recently, generative adversarial networks (GANs) have been explored to synthesize visual representations of the unseen classes from their semantic features; the synthesized representations of the seen and unseen classes are then used to train the GZSL classifier. This approach has been shown to boost GZSL classification accuracy; however, there is no guarantee that the synthetic visual representations can generate back their semantic features in a multi-modal cycle-consistent manner. The absence of this constraint can result in synthetic visual representations that do not represent their semantic features well. In this paper, we propose to enforce such a constraint through a new regularization of the GAN training that forces the generated visual features to reconstruct their original semantic features. Once our model is trained with this multi-modal cycle-consistent semantic compatibility, we can synthesize more representative visual representations for the seen and, more importantly, for the unseen classes. Our proposed approach achieves the best GZSL classification results in the field on several publicly available datasets.
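The cycle-consistency idea in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: plain linear maps stand in for the conditional WGAN generator G (semantic attributes plus noise, mapped to a visual feature) and the regressor R (visual feature back to semantics), and only the forward computation of the reconstruction term ||R(G(a, z)) − a||² is shown; all dimensions and weights here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for semantic attributes, noise, and visual features.
SEM_DIM, NOISE_DIM, VIS_DIM = 8, 4, 16

# Linear stand-in for the generator G: (semantics, noise) -> visual feature.
W_g = rng.normal(scale=0.1, size=(SEM_DIM + NOISE_DIM, VIS_DIM))

# Linear stand-in for the regressor R: visual feature -> reconstructed semantics.
W_r = rng.normal(scale=0.1, size=(VIS_DIM, SEM_DIM))

def generate(semantics, noise):
    """Synthesize a visual feature from class semantics plus noise."""
    return np.concatenate([semantics, noise], axis=-1) @ W_g

def cycle_loss(semantics, noise):
    """Cycle-consistency term: mean squared error || R(G(a, z)) - a ||^2.

    Added to the GAN objective, this penalizes synthetic visual features
    whose reconstructed semantics drift from the original attributes.
    """
    visual = generate(semantics, noise)
    reconstructed = visual @ W_r
    return float(np.mean(np.sum((reconstructed - semantics) ** 2, axis=-1)))

batch_sem = rng.normal(size=(32, SEM_DIM))
batch_noise = rng.normal(size=(32, NOISE_DIM))
reg = cycle_loss(batch_sem, batch_noise)
```

In training, this scalar would be weighted and added to the adversarial loss, so minimizing the total objective pushes generated features toward semantic compatibility.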

Results

Task | Dataset | Metric | Value | Model
Zero-Shot Learning | CUB-200-2011 | Average top-1 classification accuracy | 58.6 | Cycle-WGAN
Zero-Shot Learning | SUN Attribute | Average top-1 classification accuracy | 59.9 | Cycle-WGAN
Zero-Shot Learning | SUN Attribute | Harmonic mean | 39.4 | Cycle-WGAN
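The "Harmonic mean" metric in the table is the standard GZSL summary score: the harmonic mean of the per-class accuracies on seen and unseen classes, which is only high when both are high. A minimal sketch (the function name is our own):

```python
def harmonic_mean(acc_seen, acc_unseen):
    """GZSL harmonic mean H = 2 * s * u / (s + u).

    Heavily penalizes models that score well on seen classes but
    poorly on unseen ones; defined as 0 when both accuracies are 0.
    """
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

print(harmonic_mean(0.5, 0.3))  # 0.375
```

Note that a model with 0.5 seen accuracy and 0.3 unseen accuracy scores 0.375, below the arithmetic mean of 0.4, reflecting the metric's bias toward balanced performance.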

Related Papers

GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation (2025-07-14)
EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning (2025-06-26)
Zero-Shot Learning for Obsolescence Risk Forecasting (2025-06-26)
SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network (2025-06-25)
A Multi-Scale Spatial Attention-Based Zero-Shot Learning Framework for Low-Light Image Enhancement (2025-06-23)
Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation (2025-06-20)
AnyTraverse: An off-road traversability framework with VLM and human operator in the loop (2025-06-20)