Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

19,997 dataset results

Snips-SmartLights

The SmartLights benchmark from Snips tests the capability of controlling lights in different rooms. It consists of 1660 requests, which are split into five partitions for a 5-fold evaluation. A sample command could be: “please change the [bedroom] lights to [red]” or “i’d like the [living room] lights to be at [twelve] percent”.

8 papers · 3 benchmarks · Audio
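
As a rough illustration of the 5-fold protocol described for Snips-SmartLights, the sketch below partitions 1660 request indices into five equal folds and rotates the held-out fold. The function name `five_fold_splits` and the contiguous index assignment are assumptions made for illustration; the benchmark ships its own partitions, which should be used for any reported result.

```python
def five_fold_splits(n_items, n_folds=5):
    """Yield (train_indices, test_indices) once per fold.

    Hypothetical helper: folds are contiguous index ranges, which is
    an assumption and not the official Snips partitioning.
    """
    indices = list(range(n_items))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_items, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


splits = list(five_fold_splits(1660))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)  # [332, 332, 332, 332, 332]
```

With 1660 requests, each fold holds out 332 requests and leaves 1328 for training.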

Def_Armored_parallel (SMAC+_Def_Armored_parallel_20)

SMAC+ defense armored scenario with parallel episodic buffer.

8 papers · 2 benchmarks

Def_Outnumbered_parallel (SMAC+_Def_Outnumbered_parallel_20)

SMAC+ defense outnumbered scenario with parallel episodic buffer.

8 papers · 2 benchmarks

Off_Hard_parallel (SMAC+_Off_Hard_parallel_20)

SMAC+ offensive hard scenario with 20 parallel episodic buffers.

8 papers · 2 benchmarks

Off_Superhard_parallel (SMAC+_Off_Superhard_parallel_20)

SMAC+ offensive superhard scenario with 20 parallel episodic buffers.

8 papers · 2 benchmarks

WorldStrat (The WorldStrat Dataset: Open High-Resolution Satellite Imagery With Paired Multi-Temporal Low-Resolution)

Nearly 10,000 km² of free high-resolution and paired multi-temporal low-resolution satellite imagery of unique locations that ensure stratified representation of all types of land-use across the world: from agriculture to ice caps, from forests to multiple urbanization densities.

8 papers · 0 benchmarks · Images, Time series

Figment

A dataset for fine-grained entity typing of knowledge graph entities built from Freebase. It can be used to evaluate entity representations and also mention-level entity typing.

8 papers · 0 benchmarks

ZJU-MoCap (ZJU-MoCap Dataset)

LightStage is a multi-view dataset proposed in NeuralBody. It captures multiple dynamic human videos using a multi-camera system with 20+ synchronized cameras. The humans perform complex motions, including twirling, Taichi, arm swings, warmup, punching, and kicking. We provide the SMPL-X parameters recovered with EasyMocap, which contain the motions of body, hand, and face.

8 papers · 0 benchmarks

HANNA (HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation.)

HANNA, a large annotated dataset of Human-ANnotated NArratives for Automatic Story Generation (ASG) evaluation, has been designed for the benchmarking of automatic metrics for ASG. HANNA contains 1,056 stories generated from 96 prompts from the WritingPrompts dataset. Each prompt is linked to a human story and to 10 stories generated by different ASG systems. Each story was annotated on six human criteria (Relevance, Coherence, Empathy, Surprise, Engagement and Complexity) by three raters. HANNA also contains the scores produced by 72 automatic metrics on each story.

8 papers · 0 benchmarks · Tabular
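
The counts in the HANNA description can be cross-checked with a little arithmetic. The sketch below uses only the figures stated in that description; the variable names are illustrative, and the derived totals assume that every story received all ratings and all metric scores.

```python
n_prompts = 96
stories_per_prompt = 1 + 10  # one human story plus ten system-generated stories
n_stories = n_prompts * stories_per_prompt

n_criteria, n_raters, n_metrics = 6, 3, 72
# Derived totals, assuming complete annotation of every story.
n_human_ratings = n_stories * n_criteria * n_raters
n_metric_scores = n_stories * n_metrics

print(n_stories)        # 1056
print(n_human_ratings)  # 19008
print(n_metric_scores)  # 76032
```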

REFUGE2

The goal of the REFUGE2 challenge is to evaluate and compare automated algorithms for glaucoma detection and optic disc/cup segmentation on a standard dataset of retinal fundus images. We invite the medical image analysis community to participate by developing and testing existing and novel automated classification and segmentation methods.

8 papers · 0 benchmarks

University of Waterloo skin cancer database

The dataset is maintained by the Vision and Image Processing Lab, University of Waterloo. The images of the dataset were extracted from the public databases DermIS and DermQuest, along with manual segmentations of the lesions.

8 papers · 4 benchmarks · Images

NCT-CRC-HE-100K

The NCT-CRC-HE-100K dataset is a set of 100,000 non-overlapping image patches extracted from 86 H&E-stained human cancer tissue slides and normal tissue from the NCT biobank (National Center for Tumor Diseases) and the UMM pathology archive (University Medical Center Mannheim). The companion dataset Colorectal Cancer-Validation-Histology-7K (CRC-VAL-HE-7K) consists of 7,180 images extracted from 50 patients with colorectal adenocarcinoma; it was built so that its patients do not overlap with those in the NCT-CRC-HE-100K dataset. The images were created by pathologists who manually delineated tissue regions in whole slide images into the following nine tissue classes: adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), colorectal adenocarcinoma epithelium (TUM).

8 papers · 9 benchmarks · Images

CLEVR-Math

CLEVR-Math is a multi-modal dataset of simple math word problems involving addition and subtraction, each represented partly by a textual description and partly by an image illustrating the scenario. These word problems require a combination of language, visual, and mathematical reasoning.

8 papers · 0 benchmarks · Images, Texts

Europarl-ASR

Europarl-ASR (EN) is a 1300-hour English-language speech and text corpus of parliamentary debates for (streaming) Automatic Speech Recognition training and benchmarking, speech data filtering, and speech data verbatimization, based on European Parliament speeches and their official transcripts (1996-2020). It includes dev-test sets for streaming ASR benchmarking, made up of 18 hours of manually revised speeches. The availability of manual non-verbatim and verbatim transcripts for the dev-test speeches also makes this corpus useful for assessing automatic filtering and verbatimization techniques. The corpus is released under an open licence at https://www.mllp.upv.es/europarl-asr/

8 papers · 0 benchmarks · Speech

Meta-Album (Multi-domain Meta-Dataset for Few-Shot Image Classification)

Meta-Album is a meta-dataset created for few-shot learning, meta-learning, continual learning, and related settings. It consists of 40 datasets from 10 unique domains. The datasets are arranged in sets of 10, with one dataset from each domain per set. It is a continuously growing meta-dataset.

8 papers · 0 benchmarks

EUR-Lex-Sum

EUR-Lex-Sum is a dataset for cross-lingual summarization. It is based on manually curated document summaries of legal acts from the European Union law platform. Documents and their respective summaries exist as cross-lingual paragraph-aligned data in several of the 24 official European languages, enabling access to various cross-lingual and lower-resourced summarization setups. The dataset contains up to 1,500 document/summary pairs per language, including a subset of 375 cross-lingually aligned legal acts with texts available in all 24 languages.

8 papers · 0 benchmarks · Texts

Phee

Phee is a dataset for pharmacovigilance comprising over 5000 annotated events from medical case reports and biomedical literature. It is designed for biomedical event extraction tasks.

8 papers · 0 benchmarks · Biomedical, Medical

ATLAS v2.0 (Anatomical Tracings of Lesions After Stroke Dataset version 2.0)

Accurate lesion segmentation is critical in stroke rehabilitation research for the quantification of lesion burden and accurate image processing. Current automated lesion segmentation methods for T1-weighted (T1w) MRIs, commonly used in rehabilitation research, lack accuracy and reliability. Manual segmentation remains the gold standard, but it is time-consuming, subjective, and requires significant neuroanatomical expertise. However, many methods developed with ATLAS v1.2 report low accuracy, are not publicly accessible, or are improperly validated, limiting their utility to the field. Here we present ATLAS v2.0 (N=1271), a larger dataset of T1w stroke MRIs and manually segmented lesion masks that includes training (public, n=655), test (masks hidden, n=300), and generalizability (completely hidden, n=316) data. Algorithm development using this larger sample should lead to more robust solutions, and the hidden test and generalizability datasets allow for unbiased performance evaluation.

8 papers · 1 benchmark · 3D, Biomedical, MRI, Medical
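
The ATLAS v2.0 split sizes quoted in the description can be checked against the stated total. This is a minimal sketch using only those figures; the dictionary name `atlas_v2_splits` and its keys are illustrative labels, not identifiers from the dataset release.

```python
# Split sizes as stated in the ATLAS v2.0 description.
atlas_v2_splits = {"training": 655, "test": 300, "generalizability": 316}

total = sum(atlas_v2_splits.values())
fractions = {name: n / total for name, n in atlas_v2_splits.items()}

for name, n in atlas_v2_splits.items():
    print(f"{name}: {n} scans ({fractions[name]:.1%})")
print("total:", total)  # total: 1271
```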

H3WB (Human 3.6M 3D WholeBody)

Human3.6M 3D WholeBody (H3WB) is a large-scale dataset with 133 whole-body keypoint annotations on 100K images, made possible by a new multi-view pipeline. It is designed for three new tasks: i) 3D whole-body pose lifting from a complete 2D whole-body pose, ii) 3D whole-body pose lifting from an incomplete 2D whole-body pose, iii) 3D whole-body pose estimation from a single RGB image.

8 papers · 15 benchmarks · Images

Cityscapes-DVPS

Cityscapes-DVPS is derived from Cityscapes-VPS by adding re-computed depth maps from the Cityscapes dataset. Cityscapes-DVPS is distributed under a Creative Commons Attribution-NonCommercial-ShareAlike license.

8 papers · 0 benchmarks