Transductive Zero-Shot and Few-Shot CLIP

Ségolène Martin, Yunshi Huang, Fereshteh Shakeri, Jean-Christophe Pesquet, Ismail Ben Ayed

2024-01-01CVPR 2024 1Image Classification Few-Shot Image Classification Classification

Abstract

Transductive inference has been widely investigated in few-shot image classification but completely overlooked in the recent fast growing literature on adapting vision-langage models like CLIP. This paper addresses the transductive zero-shot and few-shot CLIP classification challenge in which inference is performed jointly across a mini-batch of unlabeled query samples rather than treating each instance independently. This paper addresses the transductive zero-shot and few-shot CLIP classification challenge in which inference is performed jointly across a mini-batch of unlabeled query samples rather than treating each instance independently. We initially construct informative vision-text probability features leading to a classification problem on the unit simplex set. Inspired by Expectation-Maximization (EM) our optimization-based classifying objective models the data probability distribution for each class using a Dirichlet law. The minimization problem is then tackled with a novel block Majorization-Minimization algorithm which simultaneously estimates the distribution parameters and class assignments. Extensivenumerical experiments on 11 datasets underscore the benefits and efficacy of our batch inference approach. On zero-shot tasks with test batches of 75 samples our approach yields near 20% improvement in ImageNet accuracy over CLIP's zero-shot performance. Additionally we outperform state-of-the-art methods in the few-shot setting. Code is available at https://github.com/SegoleneMartin/transductive-CLIP.

Related Papers

Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations2025-07-18 Adversarial attacks to image classification systems using evolutionary algorithms2025-07-17 Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy2025-07-17 Federated Learning for Commercial Image Sources2025-07-17 MUPAX: Multidimensional Problem Agnostic eXplainable AI2025-07-17 Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation2025-07-16 Safeguarding Federated Learning-based Road Condition Classification2025-07-16 Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking2025-07-15