TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Beyond Sole Strength: Customized Ensembles for Generalized...

Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models

Zhihe Lu, Jiawang Bai, Xin Li, Zeyu Xiao, Xinchao Wang

2023-11-28Prompt Engineering
PaperPDFCode(official)

Abstract

Fine-tuning pre-trained vision-language models (VLMs), e.g., CLIP, for the open-world generalization has gained increasing popularity due to its practical value. However, performance advancements are limited when relying solely on intricate algorithmic designs for a single model, even one exhibiting strong performance, e.g., CLIP-ViT-B/16. This paper, for the first time, explores the collaborative potential of leveraging much weaker VLMs to enhance the generalization of a robust single model. The affirmative findings motivate us to address the generalization problem from a novel perspective, i.e., ensemble of pre-trained VLMs. We introduce three customized ensemble strategies, each tailored to one specific scenario. Firstly, we introduce the zero-shot ensemble, automatically adjusting the logits of different models based on their confidence when only pre-trained VLMs are available. Furthermore, for scenarios with extra few-shot samples, we propose the training-free and tuning ensemble, offering flexibility based on the availability of computing resources. The proposed ensemble strategies are evaluated on zero-shot, base-to-new, and cross-dataset generalization, achieving new state-of-the-art performance. Notably, this work represents an initial stride toward enhancing the generalization performance of VLMs via ensemble. The code is available at https://github.com/zhiheLu/Ensemble_VLM.git.

Results

TaskDatasetMetricValueModel
Prompt EngineeringImageNetHarmonic mean75.49Customized Ensemble

Related Papers

Leveraging Language Prior for Infrared Small Target Detection2025-07-17Emotional Support with LLM-based Empathetic Dialogue Generation2025-07-17Prompt Engineering in Segment Anything Model: Methodologies, Applications, and Emerging Challenges2025-07-13AdaptaGen: Domain-Specific Image Generation through Hierarchical Semantic Optimization Framework2025-07-08Helping CLIP See Both the Forest and the Trees: A Decomposition and Description Approach2025-07-04State and Memory is All You Need for Robust and Reliable AI Agents2025-06-30Prompt Mechanisms in Medical Imaging: A Comprehensive Survey2025-06-28Fine-Tuning and Prompt Engineering of LLMs, for the Creation of Multi-Agent AI for Addressing Sustainable Protein Production Challenges2025-06-25