Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Let Go of Your Labels with Unsupervised Transfer

Artyom Gadetsky, Yulun Jiang, Maria Brbic

Published 2024-06-11 · International Conference on Machine Learning (ICML) 2024 · Tasks: Image Clustering, Unsupervised Image Classification

Paper · PDF · Code (official)

Abstract

Foundation vision-language models have enabled remarkable zero-shot transferability of the pre-trained representations to a wide range of downstream tasks. However, to solve a new task, zero-shot transfer still necessitates human guidance to define visual categories that appear in the data. Here, we show that fully unsupervised transfer emerges when searching for the labeling of a dataset that induces maximal margin classifiers in representation spaces of different foundation models. We present TURTLE, a fully unsupervised method that effectively employs this guiding principle to uncover the underlying labeling of a downstream dataset without any supervision and task-specific representation learning. We evaluate TURTLE on a diverse benchmark suite of 26 datasets and show that it achieves new state-of-the-art unsupervised performance. Furthermore, TURTLE, although being fully unsupervised, outperforms zero-shot transfer baselines on a wide range of datasets. In particular, TURTLE matches the average performance of CLIP zero-shot on 26 datasets by employing the same representation space, spanning a wide range of architectures and model sizes. By guiding the search for the underlying labeling using the representation spaces of two foundation models, TURTLE surpasses zero-shot transfer and unsupervised prompt tuning baselines, demonstrating the surprising power and effectiveness of unsupervised transfer.
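The guiding principle described in the abstract — search for the labeling of a dataset that induces good linear classifiers in the frozen representation spaces of two foundation models — can be caricatured in a few lines. The sketch below is our own toy illustration, not the authors' algorithm (the paper optimizes a bilevel maximal-margin objective with gradient-based training over a learned task encoder): it simply alternates between fitting one least-squares linear head per representation space on the current pseudo-labels and relabeling each sample by the summed head scores. All function and variable names here are invented for illustration.

```python
import numpy as np

def search_labeling(feats_a, feats_b, k, iters=20, seed=0):
    """Toy label search over two frozen representation spaces.

    Alternates between (1) fitting a linear classifier per space on
    the current pseudo-labels via least squares and (2) reassigning
    each point to the class with the highest summed score. This is a
    simplified stand-in for margin-guided labeling search, not the
    paper's actual bilevel optimization.
    """
    rng = np.random.default_rng(seed)
    n = feats_a.shape[0]
    labels = rng.integers(0, k, size=n)          # random initial labeling
    for _ in range(iters):
        onehot = np.eye(k)[labels]               # (n, k) targets
        scores = np.zeros((n, k))
        for feats in (feats_a, feats_b):         # one linear head per space
            coef, *_ = np.linalg.lstsq(feats, onehot, rcond=None)
            scores += feats @ coef               # accumulate class scores
        labels = scores.argmax(axis=1)           # relabel by best class
    return labels
```

On well-separated synthetic data the alternation converges to a labeling that is constant within each true cluster, which is the behavior the abstract's principle predicts when both representation spaces separate the underlying classes.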

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Image Clustering | Stanford Cars | Accuracy | 0.646 | TURTLE (CLIP + DINOv2) |
| Image Clustering | Kinetics-700 | Accuracy | 43 | TURTLE (CLIP + DINOv2) |
| Image Clustering | PCam | Accuracy | 52 | TURTLE (CLIP + DINOv2) |
| Image Clustering | DTD | Accuracy | 57.3 | TURTLE (CLIP + DINOv2) |
| Image Clustering | GTSRB | Accuracy | 48.4 | TURTLE (CLIP + DINOv2) |
| Image Clustering | SUN397 | Accuracy | 67.9 | TURTLE (CLIP + DINOv2) |
| Image Clustering | EuroSAT | Accuracy | 96.6 | TURTLE (CLIP + DINOv2) |
| Image Clustering | CIFAR-10 | ARI | 0.989 | TURTLE (CLIP + DINOv2) |
| Image Clustering | CIFAR-10 | Accuracy | 0.995 | TURTLE (CLIP + DINOv2) |
| Image Clustering | CIFAR-10 | NMI | 0.985 | TURTLE (CLIP + DINOv2) |
| Image Clustering | Caltech-101 | Accuracy | 89.8 | TURTLE (CLIP + DINOv2) |
| Image Clustering | CLEVR Counts | Accuracy | 24 | TURTLE (CLIP + DINOv2) |
| Image Clustering | Hateful Memes | Accuracy | 54.2 | TURTLE (CLIP + DINOv2) |
| Image Clustering | KITTI | Accuracy | 39.4 | TURTLE (CLIP + DINOv2) |
| Image Clustering | CIFAR-100 | ARI | 0.834 | TURTLE (CLIP + DINOv2) |
| Image Clustering | CIFAR-100 | Accuracy | 0.898 | TURTLE (CLIP + DINOv2) |
| Image Clustering | CIFAR-100 | NMI | 0.915 | TURTLE (CLIP + DINOv2) |
| Image Clustering | UCF101 | Accuracy | 82.3 | TURTLE (CLIP + DINOv2) |
| Image Clustering | FGVC Aircraft | Accuracy | 36.5 | TURTLE (CLIP + DINOv2) |
| Image Clustering | MNIST | Accuracy | 97.8 | TURTLE (CLIP + DINOv2) |
| Image Clustering | Flowers-102 | Accuracy | 99.6 | TURTLE (CLIP + DINOv2) |
| Image Clustering | Birdsnap | Accuracy | 68.1 | TURTLE (CLIP + DINOv2) |
| Image Clustering | STL-10 | ARI | 0.994 | TURTLE (CLIP + DINOv2) |
| Image Clustering | STL-10 | Accuracy | 0.997 | TURTLE (CLIP + DINOv2) |
| Image Clustering | STL-10 | NMI | 0.993 | TURTLE (CLIP + DINOv2) |
| Image Clustering | Oxford-IIIT Pets | Accuracy | 92.3 | TURTLE (CLIP + DINOv2) |
| Image Clustering | ImageNet | ARI | 62.5 | TURTLE (CLIP + DINOv2) |
| Image Clustering | ImageNet | Accuracy | 72.9 | TURTLE (CLIP + DINOv2) |
| Image Clustering | ImageNet | NMI | 88.2 | TURTLE (CLIP + DINOv2) |
| Image Clustering | Country211 | Accuracy | 11.1 | TURTLE (CLIP + DINOv2) |
| Image Clustering | Rendered SST2 | Accuracy | 51.6 | TURTLE (CLIP + DINOv2) |
| Image Clustering | Food-101 | Accuracy | 92.2 | TURTLE (CLIP + DINOv2) |
| Image Clustering | FER2013 | Accuracy | 36.2 | TURTLE (CLIP + DINOv2) |
| Image Clustering | RESISC45 | Accuracy | 89.6 | TURTLE (CLIP + DINOv2) |
| Image Classification | STL-10 | Accuracy | 99.7 | TURTLE (CLIP + DINOv2) |
| Image Classification | CIFAR-10 | Accuracy | 99.5 | TURTLE (CLIP + DINOv2) |
| Image Classification | MNIST | Accuracy | 97.8 | TURTLE (CLIP + DINOv2) |
| Image Classification | ImageNet | ARI | 62.5 | TURTLE (CLIP + DINOv2) |
| Image Classification | ImageNet | Accuracy (%) | 72.9 | TURTLE (CLIP + DINOv2) |

Related Papers

- Unsupervised Deep Clustering of MNIST with Triplet-Enhanced Convolutional Autoencoders (2025-06-11)
- Structural-Spectral Graph Convolution with Evidential Edge Learning for Hyperspectral Image Clustering (2025-06-11)
- Advanced Clustering Framework for Semiconductor Image Analytics Integrating Deep TDA with Self-Supervised and Transfer Learning Techniques (2025-05-05)
- Utilization of Neighbor Information for Image Classification with Different Levels of Supervision (2025-03-18)
- Online Meta-learning for AutoML in Real-time (OnMAR) (2025-02-27)
- Keep It Light! Simplifying Image Clustering Via Text-Free Adapters (2025-02-06)
- Deep Clustering via Probabilistic Ratio-Cut Optimization (2025-02-01)
- Graph Cut-guided Maximal Coding Rate Reduction for Learning Image Embedding and Clustering (2024-12-25)