CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning

Zhaoheng Zheng, Haidong Zhu, Ram Nevatia

2023-05-26

Attribute, Zero-Shot Learning, Compositional Zero-Shot Learning

Abstract

In this paper, we study Compositional Zero-Shot Learning (CZSL), the problem of recognizing novel attribute-object combinations built from pre-existing concepts. Recent work focuses on applying large-scale Vision-Language Pre-trained (VLP) models such as CLIP, which have strong generalization ability. However, these methods treat the pre-trained model as a black box and focus on pre- and post-CLIP operations, so they do not inherently mine the semantic concepts encoded across the layers inside CLIP. We propose to dive deep into the architecture and insert adapters, a parameter-efficient technique proven effective for large language models, into each CLIP encoder layer. We further equip the adapters with concept awareness so that concept-specific features of "object", "attribute", and "composition" can be extracted. We evaluate our method on four popular CZSL datasets, MIT-States, C-GQA, UT-Zappos, and VAW-CZSL, and it achieves state-of-the-art performance compared to existing methods on all of them.
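The adapter idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `BottleneckAdapter` and `ConceptAwareLayer`, the bottleneck size, the zero initialization of the up-projection, and the identity stand-in for a frozen CLIP layer are all assumptions made for illustration. For brevity the adapter is applied to the output of a frozen block, whereas the abstract places adapters inside each encoder layer without specifying the exact position here. Plain Python lists are used to keep the example dependency-free; a real model would use a tensor library.

```python
import random

CONCEPTS = ("object", "attribute", "composition")


def matvec(weights, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


class BottleneckAdapter:
    """Down-project, apply ReLU, up-project, then add the input back."""

    def __init__(self, dim, bottleneck, rng):
        # Small random down-projection. The up-projection starts at zero, so the
        # adapter is an identity map before training (an assumed initialization).
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(bottleneck)]
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, hidden):
        low = [max(0.0, v) for v in matvec(self.down, hidden)]
        delta = matvec(self.up, low)
        return [h + d for h, d in zip(hidden, delta)]


class ConceptAwareLayer:
    """Hypothetical wrapper: one trainable adapter per concept around a frozen layer."""

    def __init__(self, frozen_layer, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.frozen_layer = frozen_layer
        self.adapters = {c: BottleneckAdapter(dim, bottleneck, rng) for c in CONCEPTS}

    def __call__(self, hidden, concept):
        # Only the adapter selected by `concept` modifies the frozen features.
        return self.adapters[concept](self.frozen_layer(hidden))


# Stand-in for a pre-trained encoder layer; here it simply copies its input.
layer = ConceptAwareLayer(frozen_layer=lambda h: list(h), dim=4, bottleneck=2)
token = [1.0, -2.0, 0.5, 3.0]
object_features = layer(token, "object")
```

In a setup like this, only the small `down` and `up` matrices would be trained while the wrapped layer stays frozen, which is what makes adapters parameter-efficient.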

Results

Task	Dataset	Metric	Value	Model
Zero-Shot Learning	MIT-States, generalized split	H-Mean	39.9	CAILA
Zero-Shot Learning	MIT-States, generalized split	Seen accuracy	51	CAILA
Zero-Shot Learning	MIT-States, generalized split	Test AUC top 1	23.4	CAILA
Zero-Shot Learning	MIT-States, generalized split	Unseen accuracy	53.9	CAILA
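Note that the H-Mean above (39.9) is not the harmonic mean of the listed seen and unseen accuracies, which would be roughly 52.4. Assuming the table follows the usual generalized CZSL evaluation protocol, which this page does not spell out, a calibration bias is swept to trade seen accuracy against unseen accuracy; the seen and unseen values are the best ones found anywhere along the sweep, H-Mean is the best harmonic mean at any single bias, and AUC is the area under the resulting curve. The sketch below illustrates this with a made-up curve; the function names and all numbers are hypothetical and are not the paper's data.

```python
def harmonic_mean(seen, unseen):
    """Harmonic mean of two accuracies; defined as 0 when both are 0."""
    total = seen + unseen
    return 2.0 * seen * unseen / total if total > 0 else 0.0


def summarize(curve):
    """Summarize a list of (seen_acc, unseen_acc) pairs, one per bias value."""
    best_seen = max(s for s, _ in curve)
    best_unseen = max(u for _, u in curve)
    # Best harmonic mean achieved at a single operating point.
    best_hm = max(harmonic_mean(s, u) for s, u in curve)
    # Trapezoidal area under the unseen-vs-seen curve (accuracies in [0, 1]).
    points = sorted(curve)
    auc = sum((x1 - x0) * (y0 + y1) / 2.0
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return best_seen, best_unseen, best_hm, auc


# Hypothetical toy curve (NOT the paper's data): each pair is one bias setting.
toy_curve = [(0.0, 0.5), (0.2, 0.4), (0.4, 0.2), (0.5, 0.0)]
best_seen, best_unseen, best_hm, auc = summarize(toy_curve)
```

On this toy curve the best seen and best unseen accuracies come from opposite ends of the sweep, so the best harmonic mean is well below the harmonic mean of those two extremes, mirroring the pattern in the table.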
