EVA-CLIP: Improved Training Techniques for CLIP at Scale

Quan Sun, Yuxin Fang, Ledell Wu, Xinlong Wang, Yue Cao

Date: 2023-03-27
Tasks: Image Classification, Representation Learning, Zero-Shot Action Recognition, Zero-Shot Transfer Image Classification

Abstract

Contrastive language-image pre-training, CLIP for short, has gained increasing attention for its potential in various scenarios. In this paper, we propose EVA-CLIP, a series of models that significantly improve the efficiency and effectiveness of CLIP training. Our approach incorporates new techniques for representation learning, optimization, and augmentation, enabling EVA-CLIP to achieve superior performance compared to previous CLIP models with the same number of parameters but significantly smaller training costs. Notably, our largest 5.0B-parameter EVA-02-CLIP-E/14+ with only 9 billion seen samples achieves 82.0 zero-shot top-1 accuracy on ImageNet-1K val. A smaller EVA-02-CLIP-L/14+ with only 430 million parameters and 6 billion seen samples achieves 80.4 zero-shot top-1 accuracy on ImageNet-1K val. To facilitate open access and open research, we release the complete suite of EVA-CLIP to the community at https://github.com/baaivision/EVA/tree/master/EVA-CLIP.
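
The zero-shot top-1 accuracy figures quoted above follow the usual CLIP-style evaluation protocol: each image embedding is compared with one text embedding per class, and the most similar class is taken as the prediction. The sketch below illustrates that scoring step only. It is not the EVA-CLIP code; the function names and the toy 2-dimensional embeddings are hypothetical, and a real evaluation would obtain embeddings from the model's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_predict(image_emb, class_text_embs):
    """Return the index of the class whose text embedding has the
    highest cosine similarity with the image embedding."""
    img = l2_normalize(image_emb)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(txt)))
        for txt in class_text_embs
    ]
    return max(range(len(sims)), key=sims.__getitem__)


def zero_shot_top1_accuracy(image_embs, labels, class_text_embs):
    """Percentage of images whose top-scoring class matches the label."""
    correct = sum(
        zero_shot_predict(emb, class_text_embs) == label
        for emb, label in zip(image_embs, labels)
    )
    return 100.0 * correct / len(labels)


# Toy data (hypothetical): three classes, four images, 2-D embeddings.
class_text_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
image_embs = [[0.9, 0.1], [0.2, 0.8], [-0.7, 0.3], [0.6, 0.5]]
labels = [0, 1, 2, 1]

accuracy = zero_shot_top1_accuracy(image_embs, labels, class_text_embs)
print(accuracy)  # 75.0 (the last image is closest to class 0, not its label 1)
```

In practice the class text embeddings are typically built from prompt templates filled with each class name, and the comparison is done in batches on an accelerator; the pure-Python loops here are only for readability.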

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Classification | ObjectNet | Top-1 Accuracy | 79.6 | EVA-02-CLIP-E/14+ |
| Zero-Shot Transfer Image Classification | ImageNet V2 | Accuracy (Private) | 75.7 | EVA-CLIP-E/14+ |
| Zero-Shot Transfer Image Classification | ImageNet-A | Accuracy (Private) | 82.1 | EVA-CLIP-E/14+ |
| Zero-Shot Transfer Image Classification | ImageNet | Accuracy (Private) | 82 | EVA-CLIP-E/14+ |
| Zero-Shot Transfer Image Classification | ImageNet-R | Accuracy | 94.5 | EVA-CLIP-E/14+ |
| Zero-Shot Transfer Image Classification | Food-101 | Top 1 Accuracy | 94.9 | EVA-CLIP-E/14+ |
| Zero-Shot Transfer Image Classification | ObjectNet | Accuracy (Private) | 79.6 | EVA-CLIP-E/14+ |
| Zero-Shot Transfer Image Classification | ImageNet-Sketch | Accuracy (Private) | 71.6 | EVA-CLIP-E/14+ |
| Zero-Shot Action Recognition | UCF101 | Top-1 Accuracy | 83.1 | EVA-CLIP-E/14+ |

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
Automatic Classification and Segmentation of Tunnel Cracks Based on Deep Learning and Visual Explanations (2025-07-18)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy (2025-07-17)
Federated Learning for Commercial Image Sources (2025-07-17)
MUPAX: Multidimensional Problem Agnostic eXplainable AI (2025-07-17)
Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)