Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision

Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski

2022-02-16

Fairness, Self-Supervised Image Classification, Image Classification, Action Classification, Self-Supervised Learning, Traffic Sign Recognition, Domain Generalization, Copy Detection, Word Embeddings, Action Recognition, Meme Classification, Out-of-Distribution Generalization, Fine-Grained Image Classification, Semi-Supervised Image Classification

Abstract

Discriminative self-supervised learning allows training models on any random group of internet images, and possibly recovering salient information that helps differentiate between the images. Applied to ImageNet, this leads to object-centric features that perform on par with supervised features on most object-centric downstream tasks. In this work, we ask whether this ability lets us learn any salient and more representative information present in a diverse, unbounded set of images from across the globe. To do so, we train models on billions of random images without any data pre-processing or prior assumptions about what we want the model to learn. We scale our model to a dense 10 billion parameters to avoid underfitting on a large data size. We extensively study and validate our model's performance on over 50 benchmarks, including fairness, robustness to distribution shift, geographical diversity, fine-grained recognition, image copy detection, and many image classification datasets. The resulting model not only captures semantic information well, it also captures information about artistic style and learns salient information such as geolocations and multilingual word embeddings based on visual content only. More importantly, we discover that such a model is more robust, more fair, less harmful, and less biased than supervised models or models trained on object-centric datasets such as ImageNet.

Results

Task | Dataset | Metric | Value | Model
Domain Adaptation | ImageNet-R | Top-1 Error Rate | 43.9 | SEER (RegNet10B)
Domain Adaptation | ImageNet-A | Top-1 accuracy % | 52.7 | SEER (RegNet10B)
Domain Adaptation | ImageNet-Sketch | Top-1 accuracy | 45.6 | SEER (RegNet10B)
Video | Kinetics-700 | Top-1 Accuracy | 51.9 | SEER (RegNet10B)
Image Classification | KITTI-Dist | Top 1 Accuracy | 78.34 | SEER (RegNet10B)
Image Classification | Places205 | Top 1 Accuracy | 69 | SEER (RegNet10B - finetuned - 384px)
Image Classification | ImageNet V2 | Top 1 Accuracy | 76.2 | SEER (RegNet10B)
Image Classification | DTD | Accuracy | 80.5 | SEER (RegNet10B - linear eval)
Image Classification | CLEVR/Count | Top 1 Accuracy | 89.28 | SEER (RegNet10B)
Image Classification | CLEVR/Count | Top 1 Accuracy | 87.98 | SEER (RegNetY-128GF)
Image Classification | ObjectNet | Top-1 Accuracy | 60.2 | SEER (RegNet10B)
Image Classification | RESISC45 | Top 1 Accuracy | 95.61 | SEER (RegNet10B)
Image Classification | RESISC45 | Top 1 Accuracy | 94.73 | SwAV (ResNet50-w5)
Image Classification | RESISC45 | Top 1 Accuracy | 93.97 | DINO (DeiT-B/16)
Image Classification | RESISC45 | Top 1 Accuracy | 93.35 | MoCo-v3 (ViT-B/16)
Image Classification | RESISC45 | Top 1 Accuracy | 92.7 | CLIP (ViT-B/16)
Image Classification | RESISC45 | Top 1 Accuracy | 92.48 | DeiT-B/16
Image Classification | RESISC45 | Top 1 Accuracy | 89.77 | SimCLR-v2 (ResNet152-w3 + SK)
Image Classification | RESISC45 | Top 1 Accuracy | 88.56 | ResNet50 (ImageNet-supervised)
Image Classification | RESISC45 | Top 1 Accuracy | 85.4 | MoCo-v2 (ResNet50)
Image Classification | CIFAR-10 | Percentage correct | 90 | SEER (RegNet10B)
Image Classification | Flowers-102 | Accuracy | 96.3 | SEER (RegNet10B)
Image Classification | CIFAR-100 | Percentage correct | 81.53 | SEER (RegNet10B)
Image Classification | MNIST | Accuracy | 99.42 | SEER (RegNet10B)
Image Classification | MNIST | Percentage error | 0.58 | SEER (RegNet10B)
Image Classification | CLEVR/Dist | Top 1 Accuracy | 74.98 | SEER (RegNet10B)
Image Classification | CLEVR/Dist | Top 1 Accuracy | 72.67 | SEER (RegNetY-128GF)
Image Classification | STL-10 | Percentage correct | 97.3 | SEER (RegNet10B)
Image Classification | Food-101 | Accuracy (%) | 90.3 | SEER (RegNet10B - linear eval)
Image Classification | EuroSAT | Accuracy (%) | 97.5 | SEER (RegNet10B - linear eval)
Image Classification | SVHN | Percentage error | 13.6 | SEER (RegNet10B)
Image Classification | Caltech-101 | Accuracy | 91 | SEER (RegNet10B - linear eval)
Image Classification | SUN397 | Accuracy | 80 | SEER (RegNet10B - linear eval)
Fine-Grained Image Classification | Caltech-101 | Accuracy | 91 | SEER (RegNet10B - linear eval)
Fine-Grained Image Classification | SUN397 | Accuracy | 80 | SEER (RegNet10B - linear eval)
Meme Classification | Hateful Memes | ROC-AUC | 0.734 | SEER (RegNet10B)
Domain Generalization | ImageNet-R | Top-1 Error Rate | 43.9 | SEER (RegNet10B)
Domain Generalization | ImageNet-A | Top-1 accuracy % | 52.7 | SEER (RegNet10B)
Domain Generalization | ImageNet-Sketch | Top-1 accuracy | 45.6 | SEER (RegNet10B)
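
The results above mix accuracy-type and error-type metrics; MNIST, for instance, is listed both as Accuracy 99.42 and as Percentage error 0.58. The following is a minimal sketch of how such rows could be put on one scale. It assumes every value is a percentage between 0 and 100 and that error = 100 - accuracy, which holds for the two MNIST rows but is not stated on the page for the other datasets. The helper name `to_accuracy` and the choice of rows are illustrative and not part of the source; the Hateful Memes ROC-AUC row is left out because it is not a percentage.

```python
# Hypothetical helper (not from the source page): converts error-type
# metrics to the accuracy scale, assuming percentages in [0, 100].

def to_accuracy(metric: str, value: float) -> float:
    """Return accuracy in percent for an accuracy- or error-type metric."""
    if "error" in metric.lower():
        # Assumption: error and accuracy sum to 100 percent.
        return round(100.0 - value, 2)
    return value

# A few (dataset, metric, value) rows copied from the results table.
rows = [
    ("ImageNet-R", "Top-1 Error Rate", 43.9),
    ("MNIST", "Accuracy", 99.42),
    ("MNIST", "Percentage error", 0.58),
    ("SVHN", "Percentage error", 13.6),
]

accuracies = [(dataset, to_accuracy(metric, value)) for dataset, metric, value in rows]

for dataset, acc in accuracies:
    print(f"{dataset}: {acc:.2f}% accuracy")
```

Under these assumptions the two MNIST rows agree with each other, and the ImageNet-R and SVHN error rates become directly comparable with the accuracy figures in the rest of the table.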
