Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Quan Sun, Jinsheng Wang, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Xinlong Wang

2024-02-06 | Image Classification | Zero-Shot Transfer Image Classification

Abstract

Scaling up contrastive language-image pretraining (CLIP) is critical for empowering both vision and multimodal models. We present EVA-CLIP-18B, the largest and most powerful open-source CLIP model to date, with 18-billion parameters. With only 6-billion training samples seen, EVA-CLIP-18B achieves an exceptional 80.7% zero-shot top-1 accuracy averaged across 27 widely recognized image classification benchmarks, outperforming its forerunner EVA-CLIP (5-billion parameters) and other open-source CLIP models by a large margin. Remarkably, we observe a consistent performance improvement with the model size scaling of EVA-CLIP, despite maintaining a constant training dataset of 2-billion image-text pairs from LAION-2B and COYO-700M. This dataset is openly available and much smaller than the in-house datasets (e.g., DFN-5B, WebLI-10B) employed in other state-of-the-art CLIP models. EVA-CLIP-18B demonstrates the potential of EVA-style weak-to-strong visual model scaling. With our model weights made publicly available, we hope to facilitate future research in vision and multimodal foundation models.
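As a rough illustration of the headline metric, the sketch below shows how zero-shot top-1 accuracy is commonly computed for a CLIP-style model and then averaged over benchmarks. It assumes that image and class-prompt embeddings have already been produced by some encoder. The function names and the toy 2-dimensional embeddings are hypothetical and are not taken from the EVA-CLIP code base.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_predict(image_emb, class_text_embs):
    """Return the index of the class whose text embedding is most similar to the image."""
    img = l2_normalize(image_emb)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(text_emb)))
        for text_emb in class_text_embs
    ]
    return max(range(len(sims)), key=sims.__getitem__)


def top1_accuracy(image_embs, labels, class_text_embs):
    """Percentage of images whose most similar class matches the ground-truth label."""
    correct = sum(
        zero_shot_predict(emb, class_text_embs) == label
        for emb, label in zip(image_embs, labels)
    )
    return 100.0 * correct / len(labels)


def mean_over_benchmarks(per_benchmark_acc):
    """Unweighted mean of per-benchmark accuracies (each benchmark counts equally)."""
    return sum(per_benchmark_acc.values()) / len(per_benchmark_acc)


# Toy data (made up for illustration): two classes, three images.
class_text_embs = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]

toy_accuracy = top1_accuracy(image_embs, labels, class_text_embs)
```

In this toy setup the third image is closer to class 0 than to its labelled class 1, so two of the three predictions are correct. An average over several benchmarks would then be `mean_over_benchmarks({"benchmark_a": ..., "benchmark_b": ...})`, with every benchmark weighted equally regardless of its size.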

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Zero-Shot Transfer Image Classification | ImageNet V2 | Accuracy (Private) | 77.9 | EVA-CLIP-18B |
| Zero-Shot Transfer Image Classification | ImageNet-A | Accuracy (Private) | 87.3 | EVA-CLIP-18B |
| Zero-Shot Transfer Image Classification | ImageNet | Accuracy (Private) | 83.8 | EVA-CLIP-18B |
| Zero-Shot Transfer Image Classification | ImageNet-R | Accuracy | 95.7 | EVA-CLIP-18B |
| Zero-Shot Transfer Image Classification | SUN | Accuracy | 77.7 | EVA-CLIP-18B |
| Zero-Shot Transfer Image Classification | Food-101 | Top 1 Accuracy | 95.8 | EVA-CLIP-18B |
| Zero-Shot Transfer Image Classification | ObjectNet | Accuracy (Private) | 82.2 | EVA-CLIP-18B |
| Zero-Shot Transfer Image Classification | ImageNet-Sketch | Accuracy (Private) | 74.7 | EVA-CLIP-18B |
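
For readers who want to work with these numbers programmatically, here is a minimal sketch that stores the eight results listed above and takes their unweighted mean. The variable names are illustrative. The mean covers only the eight datasets on this page, so it is not comparable to the 27-benchmark average of 80.7% quoted in the abstract.

```python
# Zero-shot accuracies (%) for EVA-CLIP-18B, copied from the results listed above.
listed_results = {
    "ImageNet V2": 77.9,
    "ImageNet-A": 87.3,
    "ImageNet": 83.8,
    "ImageNet-R": 95.7,
    "SUN": 77.7,
    "Food-101": 95.8,
    "ObjectNet": 82.2,
    "ImageNet-Sketch": 74.7,
}

# Unweighted mean over the eight listed datasets only.
mean_listed = sum(listed_results.values()) / len(listed_results)

# Datasets with the highest and lowest listed accuracy.
best_dataset = max(listed_results, key=listed_results.get)
worst_dataset = min(listed_results, key=listed_results.get)

print(f"Mean over {len(listed_results)} listed datasets: {mean_listed:.1f}")
print(f"Highest: {best_dataset}, lowest: {worst_dataset}")
```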
