Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Dataset Summarization by K Principal Concepts

Niv Cohen, Yedid Hoshen

Date: 2021-04-08
Tasks: Image Clustering, Clustering

Abstract

We propose the new task of K principal concept identification for dataset summarization. The objective is to find a set of K concepts that best explain the variation within the dataset. Concepts are high-level, human-interpretable terms such as "tiger", "kayaking" or "happy". The K concepts are selected from a (potentially long) input list of candidates, which we call the concept-bank. The concept-bank may be taken from a generic dictionary or constructed from task-specific prior knowledge. An image-language embedding method (e.g. CLIP) is used to map the images and the concept-bank into a shared feature space. To select the K concepts that best explain the data, we formulate our problem as a K-uncapacitated facility location problem. An efficient optimization technique is used to scale the local search algorithm to very large concept-banks. The output of our method is a set of K principal concepts that summarize the dataset. Our approach provides a more explicit summary than selecting K representative images, which are often ambiguous. As a further application of our method, the K principal concepts can be used to classify the dataset into K groups. Extensive experiments demonstrate the efficacy of our approach.
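
To make the selection step concrete, here is a minimal sketch of a swap-based local search for a K-facility-location style objective: choose K concepts so that the summed distance from each image to its nearest chosen concept is as small as possible. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not name the distance, so cosine distance is assumed. The 2-D vectors are hand-made stand-ins for CLIP embeddings. The function names (`select_k_concepts`, `assign_to_concepts`) are hypothetical, and the efficient optimization the abstract mentions for very large concept-banks is not reproduced.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity (assumed cost; the abstract does not specify one)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def total_cost(dist, selected):
    """Sum over images of the distance to the nearest selected concept."""
    return sum(min(row[j] for j in selected) for row in dist)

def select_k_concepts(image_vecs, concept_vecs, k):
    """Naive swap-based local search: replace one selected concept at a time
    whenever the swap strictly lowers the total cost."""
    dist = [[cosine_distance(x, c) for c in concept_vecs] for x in image_vecs]
    selected = list(range(k))  # arbitrary starting solution
    best = total_cost(dist, selected)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(concept_vecs)):
                if cand in selected:
                    continue
                trial = selected[:pos] + [cand] + selected[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:
                    selected, best, improved = trial, cost, True
    return sorted(selected), best

def assign_to_concepts(image_vecs, concept_vecs, selected):
    """Classify each image by its nearest selected concept (index into the bank)."""
    return [
        min(selected, key=lambda j: cosine_distance(x, concept_vecs[j]))
        for x in image_vecs
    ]

# Toy stand-ins for embeddings in a shared image-text space.
concept_bank = ["tiger", "happy", "kayaking"]
concept_vecs = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image_vecs = [(0.9, 0.1), (1.0, 0.05), (0.1, 0.95), (0.05, 1.0)]

selected, cost = select_k_concepts(image_vecs, concept_vecs, k=2)
labels = assign_to_concepts(image_vecs, concept_vecs, selected)
print([concept_bank[j] for j in selected], labels)
```

On this toy data the search discards the middle-of-the-road concept "happy" in favor of the two concepts that sit next to the two image groups, and the same nearest-concept rule then splits the images into K groups. Each swap here rescans every image, which is fine for a handful of concepts but would be too slow for a large concept-bank.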

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Clustering | ImageNet-100 (TEMI Split) | Accuracy | 0.731 | Single-Noun Prior |
| Image Clustering | ImageNet-100 (TEMI Split) | ARI | 0.628 | Single-Noun Prior |
| Image Clustering | ImageNet-100 (TEMI Split) | NMI | 0.805 | Single-Noun Prior |
| Image Clustering | CIFAR-10 | ARI | 0.702 | Single-Noun Prior |
| Image Clustering | CIFAR-10 | Accuracy | 0.853 | Single-Noun Prior |
| Image Clustering | CIFAR-10 | NMI | 0.731 | Single-Noun Prior |
| Image Clustering | ImageNet-200 | Accuracy | 0.598 | Single-Noun Prior |
| Image Clustering | ImageNet-200 | ARI | 0.486 | Single-Noun Prior |
| Image Clustering | ImageNet-200 | NMI | 0.749 | Single-Noun Prior |
| Image Clustering | ImageNet-50 (TEMI Split) | Accuracy | 0.827 | Single-Noun Prior |
| Image Clustering | ImageNet-50 (TEMI Split) | ARI | 0.744 | Single-Noun Prior |
| Image Clustering | ImageNet-50 (TEMI Split) | NMI | 0.847 | Single-Noun Prior |

Related Papers

Tri-Learn Graph Fusion Network for Attributed Graph Clustering (2025-07-18)
Ranking Vectors Clustering: Theory and Applications (2025-07-16)
Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework (2025-07-11)
GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning (2025-07-09)
Consistency and Inconsistency in $K$-Means Clustering (2025-07-08)
MC-INR: Efficient Encoding of Multivariate Scientific Simulation Data using Meta-Learning and Clustered Implicit Neural Representations (2025-07-03)
Supercm: Revisiting Clustering for Semi-Supervised Learning (2025-06-30)
Temporal Rate Reduction Clustering for Human Motion Segmentation (2025-06-26)