Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

DIHARD II

The DIHARD II development and evaluation sets draw from a diverse set of sources exhibiting wide variation in recording equipment, recording environment, ambient noise, number of speakers, and speaker demographics. The development set includes reference diarization and speech segmentation and may be used for any purpose including system development or training.

34 papers · 2 benchmarks · Speech

iBims-1 (independent Benchmark images and matched scans v1)

iBims-1 (independent Benchmark images and matched scans, version 1) is a new high-quality RGB-D dataset, especially designed for testing single-image depth estimation (SIDE) methods. A customized acquisition setup, composed of a digital single-lens reflex (DSLR) camera and a high-precision laser scanner, was used to acquire high-resolution images and highly accurate depth maps of diverse indoor scenarios.

34 papers · 14 benchmarks · Images

CSL

CSL (Circular Skip Links) is a synthetic dataset introduced in Murphy et al. (2019) to test the expressivity of GNNs. All graphs in the dataset are regular with the same degree, so they cannot be told apart by degree information alone, and the task is to classify the non-isomorphic graphs.
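To see why this is a hard expressivity test, a small sketch helps: circulant graphs built from a cycle plus fixed-length skip links are non-isomorphic for different skip lengths, yet share an identical degree sequence. The construction and the parameters below (n = 41, skips 2 and 3) are illustrative of the CSL recipe, not taken verbatim from the released dataset files.

```python
def csl_graph(n, r):
    """Build a Circular Skip Links graph: nodes 0..n-1 form a cycle,
    plus skip edges joining each node i to (i + r) mod n."""
    edges = set()
    for i in range(n):
        edges.add(frozenset((i, (i + 1) % n)))   # cycle edge
        edges.add(frozenset((i, (i + r) % n)))   # skip edge
    return edges

def degree_sequence(edges, n):
    """Sorted degree sequence of a graph given as a set of edges."""
    deg = [0] * n
    for e in edges:
        for v in e:
            deg[v] += 1
    return sorted(deg)

# Two CSL graphs with different skip lengths: structurally different,
# yet every node in both has degree 4, so degree-based features
# (and plain message-passing GNNs) cannot separate them.
g2, g3 = csl_graph(41, 2), csl_graph(41, 3)
```

Here both graphs are 4-regular, which is exactly the degeneracy the benchmark exploits: any model limited to local degree statistics scores at chance.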

34 papers · 3 benchmarks · Graphs

GrailQA (Strongly Generalizable Question Answering)

GrailQA is a new large-scale, high-quality dataset for question answering on knowledge bases (KBQA) over Freebase, with 64,331 questions annotated with both answers and corresponding logical forms in different syntaxes (e.g., SPARQL, S-expression). It can be used to test three levels of generalization in KBQA: i.i.d., compositional, and zero-shot.

34 papers · 9 benchmarks · Texts

CLEAR

CLEAR is a continual image classification benchmark dataset with a natural temporal evolution of visual concepts in the real world that spans a decade (2004-2014). CLEAR is built from existing large-scale image collections (YFCC100M) through a novel and scalable low-cost approach to visio-linguistic dataset curation. The pipeline uses pretrained vision-language models (e.g. CLIP) to interactively build labeled datasets, which are further validated with crowd-sourcing to remove errors and even inappropriate images (hidden in the original YFCC100M). The major strength of CLEAR over prior CL benchmarks is the smooth temporal evolution of visual concepts with real-world imagery, including both high-quality labeled data and abundant unlabeled samples per time period for continual semi-supervised learning.

34 papers · 0 benchmarks · Images

Clear Weather (DENSE)

We introduce an object detection dataset in challenging adverse weather conditions, covering 12,000 samples in real-world driving scenes and 1,500 samples in controlled weather conditions within a fog chamber. The dataset includes different weather conditions such as fog, snow, and rain and was acquired over 10,000 km of driving in northern Europe. The driven route with cities along the road is shown on the right. In total, 100k objects were labeled with accurate 2D and 3D bounding boxes. The main contributions of this dataset are:

  • We provide a proving ground for a broad range of algorithms covering signal enhancement, domain adaptation, object detection, or multi-modal sensor fusion, focusing on the learning of robust redundancies between sensors, especially if they fail asymmetrically in different weather conditions.
  • The dataset was created with the initial intention to showcase methods that learn robust redundancies between the sensors and enable a raw data sensor fusion in cas

34 papers · 7 benchmarks · LiDAR

xP3

xP3 is a multilingual dataset for multitask prompted finetuning. It is a composite of supervised datasets in 46 languages with English and machine-translated prompts.

34 papers · 0 benchmarks · Texts

Re-DocRED (Revisiting Document Level Relation Extraction)

The Re-DocRED dataset is a revised version of DocRED that resolves its annotation problems, most notably the large number of unannotated (false-negative) relation instances.

34 papers · 4 benchmarks

CRACK500

For details of the work, readers are referred to the paper "Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection" (FPHB), T-ITS 2019. The paper is available at https://www.researchgate.net/publication/330244656_Feature_Pyramid_and_Hierarchical_Boosting_Network_for_Pavement_Crack_Detection or https://arxiv.org/abs/1901.06340.

34 papers · 0 benchmarks · Images

Minesweeper

Minesweeper is a synthetic graph dataset emulating the eponymous game.

34 papers · 1 benchmark · Graphs

ToxicChat

ToxicChat is a novel benchmark dataset constructed from real user queries to an open-source chatbot. Unlike previous toxicity detection benchmarks that rely primarily on social media content, ToxicChat captures the rich and nuanced phenomena inherent in real-world user-AI interactions. This unique dataset reveals significant domain differences compared to social media content, making it a valuable resource for exploring the challenges of toxicity detection in user-AI conversations.

34 papers · 0 benchmarks

Deblur-NeRF

This dataset focuses on two blur types: camera motion blur and defocus blur. For each blur type, we synthesize 5 scenes using Blender. We manually place multi-view cameras to mimic real data capture. To render images with camera motion blur, we randomly perturb the camera pose, and then linearly interpolate poses between the original and perturbed poses for each view. We render images from the interpolated poses and blend them in linear RGB space to generate the final blurry images. For defocus blur, we use the built-in functionality to render depth-of-field images. We fix the aperture and randomly choose a focus plane between the nearest and furthest depth.
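The motion-blur synthesis described above amounts to averaging renders along an interpolated pose trajectory in linear RGB. A rough sketch, assuming a caller-supplied `render_fn` and treating the pose as a plain translation vector for simplicity (a full pipeline would interpolate complete camera poses, e.g. with slerp for rotation):

```python
import random

def synthesize_motion_blur(render_fn, pose, max_perturb=0.05, n_steps=8, seed=0):
    """Approximate camera-motion blur: randomly perturb the camera pose,
    linearly interpolate between the original and perturbed poses, render
    each interpolated pose, and average the renders in linear RGB.

    `pose` is a translation vector (list of floats) for illustration;
    `render_fn(pose)` returns a flat list of linear-RGB pixel values.
    """
    rng = random.Random(seed)
    perturbed = [p + rng.uniform(-max_perturb, max_perturb) for p in pose]
    blurred = None
    for k in range(n_steps):
        t = k / (n_steps - 1)  # interpolation weight in [0, 1]
        interp = [(1 - t) * p + t * q for p, q in zip(pose, perturbed)]
        frame = render_fn(interp)
        if blurred is None:
            blurred = [0.0] * len(frame)
        blurred = [b + v / n_steps for b, v in zip(blurred, frame)]
    return blurred
```

Averaging must happen in linear RGB, as the description notes: blending gamma-encoded values would not match the physics of light accumulating during an exposure.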

34 papers · 0 benchmarks · Images

TUD-L

The TUD-L (TUD Light) dataset is part of the Benchmark for 6D Object Pose Estimation (BOP). It contains image sequences of three moving objects captured under varying lighting conditions.

34 papers · 0 benchmarks

CIHP (Crowd Instance-level Human Parsing)

The Crowd Instance-level Human Parsing (CIHP) dataset has 38,280 diverse human images. Each image in CIHP is labeled with pixel-wise annotations on 20 categories and instance-level identification. The dataset can be used for the human part segmentation task.

33 papers · 2 benchmarks · Images

WMCA (Wide Multi Channel Presentation Attack)

The Wide Multi Channel Presentation Attack (WMCA) database consists of 1,941 short video recordings of both bonafide and presentation attacks from 72 different identities. The data is recorded from several channels including color, depth, infrared, and thermal.

33 papers · 2 benchmarks · Images, RGB-D, Videos

JSB Chorales

The JSB chorales are a set of short, four-voice pieces of music well-noted for their stylistic homogeneity. The chorales were originally composed by Johann Sebastian Bach in the 18th century. He wrote them by first taking pre-existing melodies from contemporary Lutheran hymns and then harmonising them to create the parts for the remaining three voices. The version of the dataset used canonically in representation learning contexts consists of 382 such chorales, with a train/validation/test split of 229, 76 and 77 samples respectively.

33 papers · 2 benchmarks · MIDI, Music

Pinterest

The Pinterest dataset contains more than 1 million images associated with the Pinterest users who have “pinned” them.

33 papers · 4 benchmarks · Graphs

Completion3D

The Completion3D benchmark is a dataset for evaluating state-of-the-art 3D object point cloud completion methods: given a partial 3D object point cloud, the goal is to infer a complete 3D point cloud for the object.

33 papers · 2 benchmarks · Point cloud

MTL-AQA

MTL-AQA is a multitask action quality assessment (AQA) dataset, the largest to date, comprising more than 1,600 diving samples. It contains detailed annotations for fine-grained action recognition, commentary generation, and AQA score estimation. Videos from multiple angles are provided wherever available.

33 papers · 12 benchmarks · Audio, Texts, Videos

TallyQA

TallyQA is a large-scale dataset for open-ended counting.

33 papers · 0 benchmarks
Page 74 of 1000