Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

Chest wall lung sound dataset

Annotated audio files of lung sounds recorded from various vantage points on the chest wall, with the annotations provided in a separate combined file. The annotation includes the sound type (Inspiratory: I, Expiratory: E, Wheezes: W, Crackles: C, Normal: N), the diagnosis as decided by a specialist (Asthma, COPD, BRON, heart failure, lung fibrosis, etc.), and the location on the chest wall from which the recording was taken (Posterior: P, Lower: L, Left: L, Right: R, Upper: U, Anterior: A, Middle: M). The audio file names are coded: 1. Filter type; B: Bell 20-200 Hz, Diaphragm 100-500 Hz, Extended range 50-500 Hz. 2. Patient number: P1-P112.

2 papers · 1 benchmark · Audio, Medical, Time series

DL3DV-10K

DL3DV-10K is a dataset of real-world videos with scene annotations and camera parameters.

2 papers · 0 benchmarks · Videos

MFW+ (U-M)

MFW+ is a benchmark dataset for masked face recognition and an extended version of MFW. The original MFW, published as a benchmark for masked face recognition, is composed of 300 IDs and 3,000 images. However, with two duplicate IDs found in MFW, the dataset actually contains 298 unique IDs and 2,980 images. To evaluate models under various mask conditions and environments, we manually gathered additional data from the web. The refined and extended MFW, which we named MFW+, contains 606 IDs, 2,911 unmasked face images, and 2,838 masked face images. Paper: https://bmvc2022.mpi-inf.mpg.de/0723.pdf

2 papers · 6 benchmarks

WiFall (Wireless Sensing Dataset for Fall Detection, Action Recognition and People ID Identification with ESP32-S3)

The WiFall dataset contains data for fall detection, action recognition, and person identification in a meeting room scenario. The dataset provides synchronised CSI, RSSI, and a timestamp for each sample.

2 papers · 1 benchmark · Time series

ImageNet-100 (GCD split)

This ImageNet-100 dataset was introduced in the following paper,

2 papers · 0 benchmarks

HUST-LEBW

A dataset for eyeblink detection in the wild.

2 papers · 1 benchmark

MUSIC-AVQA-R

We introduce the first dataset, MUSIC-AVQA-R, to evaluate the robustness of AVQA models. The construction of this dataset involves two key processes: rephrasing and splitting. The former rephrases the questions in the test split of MUSIC-AVQA, and the latter categorizes the questions into frequent (head) and rare (tail) subsets.

2 papers · 0 benchmarks

M2QA (Multi-domain Multilingual Question Answering)

M2QA (Multi-domain Multilingual Question Answering) is an extractive question answering benchmark for evaluating joint language and domain transfer. M2QA includes 13,500 SQuAD 2.0-style question-answer instances in German, Turkish, and Chinese for the domains of product reviews, news, and creative writing. 40% of the data are unanswerable questions, 60% are answerable.

2 papers · 0 benchmarks · Texts
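
The M2QA figures above imply concrete instance counts. The sketch below works them out from the stated total and percentages. The function names are hypothetical, and the even spread across language-domain pairs is an assumption that the description does not confirm.

```python
# Figures quoted in the M2QA description.
TOTAL_INSTANCES = 13_500
UNANSWERABLE_PERCENT = 40
LANGUAGES = ("German", "Turkish", "Chinese")
DOMAINS = ("product reviews", "news", "creative writing")


def m2qa_split_counts(total=TOTAL_INSTANCES, unanswerable_percent=UNANSWERABLE_PERCENT):
    """Return (answerable, unanswerable) counts implied by the stated split."""
    unanswerable = total * unanswerable_percent // 100
    answerable = total - unanswerable
    return answerable, unanswerable


def instances_per_pair(total=TOTAL_INSTANCES):
    """Instances per language-domain pair.

    Assumption: the data is spread evenly over all pairs; the
    description above does not state this.
    """
    return total // (len(LANGUAGES) * len(DOMAINS))


answerable, unanswerable = m2qa_split_counts()
print(answerable, unanswerable, instances_per_pair())
```

Under these assumptions, the split is 8,100 answerable and 5,400 unanswerable questions, with 1,500 instances per language-domain pair.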

ThermoHands

ThermoHands is the first benchmark dataset specifically designed for egocentric 3D hand pose estimation from thermal images. It addresses the challenges of hand pose estimation in low-light conditions and when the hand is occluded by gloves or other wearables—scenarios where traditional RGB or NIR-based systems struggle.

2 papers · 0 benchmarks · 3D, Images, Videos

Electron Microscopy Dataset

The dataset available for download on this webpage represents a 5x5x5µm section taken from the CA1 hippocampus region of the brain, corresponding to a 1065x2048x1536 volume. The resolution of each voxel is approximately 5x5x5nm. The data is provided as multipage TIF files that can be loaded in Fiji. We annotated mitochondria in two sub-volumes. Each sub-volume consists of the first 165 slices of the 1065x2048x1536 image stack. The volume used for training our algorithm in the publications mentioned at the bottom of this page is the top part, while the bottom part was used for testing.

2 papers · 4 benchmarks · Images

SF-XL Night

2 papers · 3 benchmarks

SF-XL Occlusion

2 papers · 3 benchmarks

SVOX

2 papers · 3 benchmarks

LoTE-Animal (LoTE-Animal: A Long Time-span Dataset for Endangered Animal Behavior Understanding)

Understanding and analyzing animal behavior is increasingly essential to protecting endangered animal species. However, advanced computer vision techniques have seen minimal application in this area, largely because of a lack of large and diverse datasets for training deep models.

2 papers · 2 benchmarks · Images, RGB Video, Videos

ApisTox

ApisTox contains molecules in SMILES format for predicting pesticide toxicity to honey bees.

2 papers · 0 benchmarks · Graphs

Kaleidoscope (Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation)

The evaluation of vision-language models (VLMs) has mainly relied on English-language benchmarks, leaving significant gaps in both multilingual and multicultural coverage. While multilingual benchmarks have expanded in both size and languages, many rely on translations of English datasets and fail to capture cultural nuances. In this work, we propose Kaleidoscope as the most comprehensive exam benchmark to date for the multilingual evaluation of vision-language models. Kaleidoscope is a large-scale, in-language multimodal benchmark designed to evaluate VLMs across diverse languages and visual inputs. Kaleidoscope covers 18 languages and 14 different subjects, amounting to a total of 20,911 multiple-choice questions. Built through an open science collaboration with a diverse group of researchers worldwide, Kaleidoscope ensures linguistic and cultural authenticity. We evaluate top-performing multilingual vision-language models and find that they perform poorly on low-resource languages

2 papers · 0 benchmarks · Images, Texts

CLEAR-Bias (Corpus for Linguistic Evaluation of Adversarial Robustness against Bias)

CLEAR-Bias is a benchmark dataset designed to evaluate the robustness of large language models (LLMs) against bias elicitation, particularly under adversarial conditions. It comprises 4,400 prompts across two task formats: multiple-choice and sentence completion. These prompts span seven core bias categories (age, disability, ethnicity, gender, religion, sexual orientation, and socioeconomic status) as well as three intersectional categories, enabling the exploration of overlapping social biases often overlooked in standard evaluations. Each category includes 20 carefully crafted base prompts (10 per task type), which are further expanded using seven jailbreak techniques, each implemented with three variants: machine translation, obfuscation, prefix injection, prompt injection, refusal suppression, reward incentives, and role-playing.

2 papers · 0 benchmarks · Texts
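
The CLEAR-Bias prompt total can be checked against the per-category figures in the description. The sketch below is a hedged reconstruction of that arithmetic: the function name is hypothetical, and it assumes that every base prompt is expanded by every variant of every jailbreak technique and that the base prompts themselves count toward the total.

```python
# Figures quoted in the CLEAR-Bias description.
CORE_CATEGORIES = 7
INTERSECTIONAL_CATEGORIES = 3
BASE_PROMPTS_PER_CATEGORY = 20  # 10 per task type, two task types
JAILBREAK_TECHNIQUES = 7
VARIANTS_PER_TECHNIQUE = 3


def clear_bias_prompt_counts():
    """Return base, adversarial, and total prompt counts.

    Assumption: each base prompt is rewritten once per technique
    variant, and base prompts are included in the total.
    """
    categories = CORE_CATEGORIES + INTERSECTIONAL_CATEGORIES
    base = categories * BASE_PROMPTS_PER_CATEGORY
    adversarial = base * JAILBREAK_TECHNIQUES * VARIANTS_PER_TECHNIQUE
    return {"base": base, "adversarial": adversarial, "total": base + adversarial}


print(clear_bias_prompt_counts())
```

Under these assumptions, 200 base prompts plus 4,200 adversarial rewrites give the 4,400 prompts stated in the description.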

RuSentNE (RuSentNE-2023)

https://github.com/dialogue-evaluation/RuSentNE-evaluation

2 papers · 0 benchmarks · Texts

RGBT-Scenes

2 papers · 0 benchmarks

DiffVox

435 vocal presets retrieved from MedleyDB and a private collection of multi-track mixes.

2 papers · 0 benchmarks