Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

Alex Tamkin, Vincent Liu, Rongfei Lu, Daniel Fein, Colin Schultz, Noah Goodman

Published: 2021-11-23 · Task: Self-Supervised Learning
Links: Paper · PDF · Code (official)

Abstract

Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored based on its downstream performance on a set of labeled tasks in the domain. We also present e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively modest performance demonstrates that significant progress is needed before self-supervised learning is an out-of-the-box solution for arbitrary domains. Code for benchmark datasets and baseline algorithms is available at https://github.com/alextamkin/dabs.
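The evaluation protocol described in the abstract (pretrain on each domain's unlabeled dataset, then score on that domain's labeled downstream tasks) can be sketched as below. This is a minimal illustration, not the repository's actual API; `pretrain` and `evaluate` are hypothetical placeholders.

```python
# Sketch of the DABS protocol: one self-supervised algorithm is run
# per domain, then scored on that domain's labeled downstream tasks.
# Function names are placeholders, not the actual dabs repository API.

DOMAINS = [
    "natural images", "multichannel sensor data", "English text",
    "speech recordings", "multilingual text", "chest x-rays",
    "images with text descriptions",
]

def pretrain(domain: str) -> str:
    # Placeholder: fit the self-supervised algorithm on the domain's
    # unlabeled pretraining dataset; returns a model handle.
    return f"model<{domain}>"

def evaluate(model: str, domain: str) -> float:
    # Placeholder: transfer score on the domain's labeled tasks.
    return 0.0

scores = {d: evaluate(pretrain(d), d) for d in DOMAINS}
```

A domain-agnostic method must use the same algorithm (not just the same code skeleton) across all seven domains, which is what distinguishes DABS from per-domain benchmarks.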

Results

Task: Self-Supervised Learning · Dataset: DABS. Columns give the downstream score per domain for each pretraining method.

| Domain         | Pretraining: None | Pretraining: ShED | Pretraining: e-Mix |
|----------------|-------------------|-------------------|--------------------|
| Images & Text  | 57.5              | 54.3              | 48.9               |
| Med. Imaging   | 68.1              | 74.5              | 72.4               |
| Natural Images | 10.1              | 20.9              | 27.9               |
| Sensors        | 69.8              | 88.7              | 79.5               |
| Speech         | 24.9              | 36.5              | 41.8               |
| Text           | 42.3              | 48.4              | 44.1               |
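A quick way to compare the three pretraining settings is an unweighted mean over the six domains reported above (note: the paper may aggregate differently; this is just a summary of the table as shown):

```python
# Unweighted per-method average over the six DABS domains in the
# results table above (Images & Text, Med. Imaging, Natural Images,
# Sensors, Speech, Text), in that order.
results = {
    "None":  [57.5, 68.1, 10.1, 69.8, 24.9, 42.3],
    "ShED":  [54.3, 74.5, 20.9, 88.7, 36.5, 48.4],
    "e-Mix": [48.9, 72.4, 27.9, 79.5, 41.8, 44.1],
}
averages = {m: round(sum(v) / len(v), 2) for m, v in results.items()}
print(averages)  # {'None': 45.45, 'ShED': 53.88, 'e-Mix': 52.43}
```

Both baseline pretraining methods beat no pretraining on average, consistent with the abstract's framing of them as modest but nontrivial baselines.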

Related Papers

- A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
- Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder (2025-07-14)
- Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
- World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model (2025-07-01)
- ShapeEmbed: a self-supervised learning framework for 2D contour quantification (2025-07-01)
- RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models (2025-06-27)
- Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features (2025-06-26)
- Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings (2025-06-26)