Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning

Zhaoheng Zheng, Haidong Zhu, Ram Nevatia

2023-05-26 | Attribute | Zero-Shot Learning | Compositional Zero-Shot Learning

Abstract

In this paper, we study the problem of Compositional Zero-Shot Learning (CZSL), which is to recognize novel attribute-object combinations composed of pre-existing concepts. Recent work focuses on applying large-scale Vision-Language Pre-trained (VLP) models like CLIP, which have strong generalization ability. However, these methods treat the pre-trained model as a black box and focus on pre- and post-CLIP operations, which do not inherently mine the semantic concepts between the layers inside CLIP. We propose to dive deep into the architecture and insert adapters, a parameter-efficient technique proven effective for large language models, into each CLIP encoder layer. We further equip the adapters with concept awareness so that concept-specific features of "object", "attribute", and "composition" can be extracted. We assess our method on four popular CZSL datasets, MIT-States, C-GQA, UT-Zappos, and VAW-CZSL, and it achieves state-of-the-art performance compared to existing methods on all of them.
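
The idea of placing one adapter per concept inside every encoder layer can be illustrated with a minimal sketch. This is not the authors' implementation: the names `BottleneckAdapter`, `ConceptAwareLayer`, and `encode` are hypothetical, the residual bottleneck design is a generic adapter form assumed here, the adapter is applied to each layer's output as a simplification (the abstract does not say where inside the layer it sits), and an identity function stands in for a frozen CLIP layer.

```python
import random

CONCEPTS = ("attribute", "object", "composition")


def matvec(weights, x):
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class BottleneckAdapter:
    """Generic residual bottleneck adapter: x + W_up * relu(W_down * x)."""

    def __init__(self, dim, bottleneck, rng):
        # Down-projection starts with small random weights.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Up-projection starts at zero, so the adapter is initially an identity map.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        hidden = [max(h, 0.0) for h in matvec(self.w_down, x)]
        delta = matvec(self.w_up, hidden)
        return [xi + di for xi, di in zip(x, delta)]


class ConceptAwareLayer:
    """Wraps one frozen layer (a stand-in here) with one adapter per concept."""

    def __init__(self, base_layer, dim, bottleneck, rng):
        self.base_layer = base_layer  # assumed frozen; only adapters would be trained
        self.adapters = {c: BottleneckAdapter(dim, bottleneck, rng) for c in CONCEPTS}

    def __call__(self, x, concept):
        # Assumption: the adapter post-processes the layer output.
        return self.adapters[concept](self.base_layer(x))


def encode(layers, x, concept):
    """Run the input through every layer using the adapters of one concept."""
    for layer in layers:
        x = layer(x, concept)
    return x


def identity_layer(x):
    """Stand-in for a pre-trained encoder layer."""
    return list(x)


rng = random.Random(0)
layers = [ConceptAwareLayer(identity_layer, dim=4, bottleneck=2, rng=rng)
          for _ in range(3)]
features = {c: encode(layers, [1.0, -2.0, 0.5, 3.0], c) for c in CONCEPTS}
```

Because the up-projection is zero at initialization, all three concept-specific feature vectors equal the input until the adapters are trained. The sketch only shows the routing: the same frozen layers are shared, and the choice of concept selects which set of adapters is active.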

Results

Task | Dataset | Metric | Value | Model
Zero-Shot Learning | MIT-States, generalized split | H-Mean | 39.9 | CAILA
Zero-Shot Learning | MIT-States, generalized split | Seen accuracy | 51 | CAILA
Zero-Shot Learning | MIT-States, generalized split | Test AUC top 1 | 23.4 | CAILA
Zero-Shot Learning | MIT-States, generalized split | Unseen accuracy | 53.9 | CAILA
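
The page does not define its metrics. Assuming H-Mean denotes the harmonic mean of seen and unseen accuracy, as is usual in generalized zero-shot evaluation, the short sketch below computes that quantity from the seen and unseen values listed above. The helper name `harmonic_mean` is introduced here for illustration only.

```python
def harmonic_mean(seen_acc, unseen_acc):
    """Harmonic mean of two accuracies given in percent."""
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2.0 * seen_acc * unseen_acc / (seen_acc + unseen_acc)


seen, unseen, reported_h_mean = 51.0, 53.9, 39.9
same_point_h_mean = harmonic_mean(seen, unseen)
print(round(same_point_h_mean, 1))  # 52.4
```

The result, about 52.4, does not match the reported 39.9, which suggests the listed numbers are not all measured at the same operating point. One plausible explanation, not confirmed by this page, is that each metric is the best value found over a sweep of a calibration bias between seen and unseen compositions, so the best seen accuracy, best unseen accuracy, and best H-Mean come from different bias settings.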

Related Papers

GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)
Non-Adaptive Adversarial Face Generation (2025-07-16)
Attributes Shape the Embedding Space of Face Recognition Models (2025-07-15)
COLIBRI Fuzzy Model: Color Linguistic-Based Representation and Interpretation (2025-07-15)
DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-supervised Panoptic Segmentation (2025-07-14)
Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models (2025-07-13)
Model Parallelism With Subnetwork Data Parallelism (2025-07-11)