


Generalized Category Discovery

Sagar Vaze, Kai Han, Andrea Vedaldi, Andrew Zisserman

2022-01-07 | CVPR 2022 | Representation Learning | Fine-Grained Visual Recognition | Open-World Semi-Supervised Learning

Abstract

In this paper, we consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set. Here, the unlabelled images may come from labelled classes or from novel ones. Existing recognition methods are not able to deal with this setting, because they make several restrictive assumptions, such as the unlabelled instances only coming from known - or unknown - classes, and the number of unknown classes being known a-priori. We address the more unconstrained setting, naming it 'Generalized Category Discovery', and challenge all these assumptions. We first establish strong baselines by taking state-of-the-art algorithms from novel category discovery and adapting them for this task. Next, we propose the use of vision transformers with contrastive representation learning for this open-world setting. We then introduce a simple yet effective semi-supervised $k$-means method to cluster the unlabelled data into seen and unseen classes automatically, substantially outperforming the baselines. Finally, we also propose a new approach to estimate the number of classes in the unlabelled data. We thoroughly evaluate our approach on public datasets for generic object classification and on fine-grained datasets, leveraging the recent Semantic Shift Benchmark suite. Project page at https://www.robots.ox.ac.uk/~vgg/research/gcd
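
The semi-supervised k-means idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything here is an assumption made for illustration: labelled points are pinned to the cluster of their class, unlabelled points are free to join any cluster, and the extra (novel) centroids are seeded by a simple farthest-point rule. The function name and parameters are hypothetical and are not taken from the official code.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def semi_supervised_kmeans(labelled, labels, unlabelled, k, iters=50):
    """Illustrative sketch only, not the authors' implementation.

    Assumes labels are integers 0..n_seen-1, every seen class has at
    least one labelled example, and k >= n_seen.
    Returns (cluster index per unlabelled point, list of centroids).
    """
    n_seen = max(labels) + 1
    # Seen-class centroids start at the mean of their labelled examples.
    centroids = [
        mean([x for x, y in zip(labelled, labels) if y == c])
        for c in range(n_seen)
    ]
    # Assumption: seed each remaining centroid at the unlabelled point
    # farthest from all centroids chosen so far.
    while len(centroids) < k:
        far = max(unlabelled,
                  key=lambda x: min(sq_dist(x, c) for c in centroids))
        centroids.append(list(far))

    assign = None
    for _ in range(iters):
        # Unlabelled points go to their nearest centroid.
        new_assign = [
            min(range(k), key=lambda j: sq_dist(x, centroids[j]))
            for x in unlabelled
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for j in range(k):
            members = [x for x, a in zip(unlabelled, assign) if a == j]
            if j < n_seen:
                # Labelled points always stay in their own class cluster.
                members += [x for x, y in zip(labelled, labels) if y == j]
            if members:
                centroids[j] = mean(members)
    return assign, centroids
```

Clusters with index below the number of seen classes correspond to labelled classes; the remaining indices stand for discovered categories. In practice the feature vectors would come from a learned representation rather than raw pixels.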

Results

Task | Dataset | Metric | Value | Model
Image Classification | ImageNet-100 (TEMI Split) | All accuracy (50% Labeled) | 74.1 | GCD (ViT-B-16)
Image Classification | ImageNet-100 (TEMI Split) | Novel accuracy (50% Labeled) | 66.3 | GCD (ViT-B-16)
Image Classification | ImageNet-100 (TEMI Split) | Seen accuracy (50% Labeled) | 89.8 | GCD (ViT-B-16)
Image Classification | CIFAR-10 | All accuracy (50% Labeled) | 91.5 | GCD (ViT-B-16)
Image Classification | CIFAR-10 | Novel accuracy (50% Labeled) | 88.2 | GCD (ViT-B-16)
Image Classification | CIFAR-10 | Seen accuracy (50% Labeled) | 97.9 | GCD (ViT-B-16)
Semi-Supervised Image Classification | ImageNet-100 (TEMI Split) | All accuracy (50% Labeled) | 74.1 | GCD (ViT-B-16)
Semi-Supervised Image Classification | ImageNet-100 (TEMI Split) | Novel accuracy (50% Labeled) | 66.3 | GCD (ViT-B-16)
Semi-Supervised Image Classification | ImageNet-100 (TEMI Split) | Seen accuracy (50% Labeled) | 89.8 | GCD (ViT-B-16)
Semi-Supervised Image Classification | CIFAR-10 | All accuracy (50% Labeled) | 91.5 | GCD (ViT-B-16)
Semi-Supervised Image Classification | CIFAR-10 | Novel accuracy (50% Labeled) | 88.2 | GCD (ViT-B-16)
Semi-Supervised Image Classification | CIFAR-10 | Seen accuracy (50% Labeled) | 97.9 | GCD (ViT-B-16)
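
This page does not state how the All, Seen and Novel accuracies are computed. A common convention for clustering-style predictions, assumed here, is to find the one-to-one mapping from cluster ids to class ids that maximises the number of correct samples, then report accuracy over all samples, over samples from seen classes, and over samples from novel classes. The sketch below follows that assumption with a brute-force search over mappings, which is only practical for a handful of classes; the function name is hypothetical.

```python
from itertools import permutations


def pct(flags):
    """Percentage of True values; NaN for an empty subset."""
    return 100.0 * sum(flags) / len(flags) if flags else float("nan")


def split_accuracies(y_true, y_pred, seen_classes):
    """Illustrative sketch: accuracy under the best one-to-one mapping.

    y_true: ground-truth class ids; y_pred: predicted cluster ids;
    seen_classes: set of class ids that had labelled examples.
    """
    ids = sorted(set(y_true) | set(y_pred))
    best_map, best_hits = None, -1
    # Brute force over all one-to-one mappings (factorial cost).
    for perm in permutations(ids):
        mapping = dict(zip(ids, perm))  # cluster id -> class id
        hits = sum(mapping[p] == t for p, t in zip(y_pred, y_true))
        if hits > best_hits:
            best_map, best_hits = mapping, hits

    correct = [best_map[p] == t for p, t in zip(y_pred, y_true)]
    seen = [c for c, t in zip(correct, y_true) if t in seen_classes]
    novel = [c for c, t in zip(correct, y_true) if t not in seen_classes]
    return {"all": pct(correct), "seen": pct(seen), "novel": pct(novel)}
```

For realistic class counts the brute-force loop would normally be replaced by a polynomial-time assignment solver; the split into seen and novel subsets stays the same.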
