TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2

19,997 dataset results

Panoptic (CMU Panoptic Studio)

CMU Panoptic is a large scale dataset providing 3D pose annotations (1.5 millions) for multiple people engaging social activities. It contains 65 videos (5.5 hours) with multi-view annotations, but only 17 of them are in multi-person scenario and have the camera parameters.

131 papers10 benchmarksImages

BANKING77

Dataset composed of online banking queries annotated with their corresponding intents.

131 papers9 benchmarksTexts

AlpacaEval

The AlpacaEval set contains 805 instructions form self-instruct, open-assistant, vicuna, koala, hh-rlhf. Those were selected so that the AlpacaEval ranking of models on the AlpacaEval set would be similar to the ranking on the Alpaca demo data.

131 papers1 benchmarksTexts

GoEmotions

GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations to 27 emotion categories or Neutral.

130 papers0 benchmarksTexts

LIAR

LIAR is a publicly available dataset for fake news detection. A decade-long of 12.8K manually labeled short statements were collected in various contexts from POLITIFACT.COM, which provides detailed analysis report and links to source documents for each case. This dataset can be used for fact-checking research as well. Notably, this new dataset is an order of magnitude larger than previously largest public fake news datasets of similar type. The LIAR dataset4 includes 12.8K human labeled short statements from POLITIFACT.COM’s API, and each statement is evaluated by a POLITIFACT.COM editor for its truthfulness.

130 papers2 benchmarksTexts

LLaVA-Bench

LLaVA-Bench is a dataset created to evaluate the capability of large multimodal models (LMM) in more challenging tasks and generalizability to novel domains. It consists of a diverse set of 24 images with 60 questions in total, including indoor and outdoor scenes, memes, paintings, sketches, etc., and each image with a highly-detailed and manually-curated description and a proper selection of questions. The dataset is part of the LLaVA project, which aims to develop multimodal chatbots that follow human intents to complete various daily-life visual tasks in the wild.

130 papers1 benchmarks

MSL (Mars Science Laboratory)

This dataset contains expert-labeled telemetry anomaly data from the Mars Science Laboratory (MSL) rover, Curiosity.

130 papers8 benchmarks

VOT2018

VOT2018 is a dataset for visual object tracking. It consists of 60 challenging videos collected from real-life datasets.

129 papers4 benchmarksImages, Videos

CityPersons

The CityPersons dataset is a subset of Cityscapes which only consists of person annotations. There are 2975 images for training, 500 and 1575 images for validation and testing. The average of the number of pedestrians in an image is 7. The visible-region and full-body annotations are provided.

129 papers21 benchmarksImages

TabFact

TabFact is a large-scale dataset which consists of 117,854 manually annotated statements with regard to 16,573 Wikipedia tables, their relations are classified as ENTAILED and REFUTED. TabFact is the first dataset to evaluate language inference on structured data, which involves mixed reasoning skills in both symbolic and linguistic aspects.

129 papers3 benchmarksTexts

ORL (Our Database of Faces)

The ORL Database of Faces contains 400 images from 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The size of each image is 92x112 pixels, with 256 grey levels per pixel.

129 papers1 benchmarksImages

Make3D

The Make3D dataset is a monocular Depth Estimation dataset that contains 400 single training RGB and depth map pairs, and 134 test samples. The RGB images have high resolution, while the depth maps are provided at low resolution.

129 papers6 benchmarksImages

FMA (Free Music Archive)

The Free Music Archive (FMA) is a large-scale dataset for evaluating several tasks in Music Information Retrieval. It consists of 343 days of audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres. It provides full-length and high-quality audio, pre-computed features, together with track- and user-level metadata, tags, and free-form text such as biographies.

128 papers2 benchmarksAudio

LFPW (Labeled Face Parts in the Wild)

The Labeled Face Parts in-the-Wild (LFPW) consists of 1,432 faces from images downloaded from the web using simple text queries on sites such as google.com, flickr.com, and yahoo.com. Each image was labeled by three MTurk workers, and 29 fiducial points, shown below, are included in dataset.

128 papers0 benchmarksImages

Meta-Dataset

The Meta-Dataset benchmark is a large few-shot learning benchmark and consists of multiple datasets of different data distributions. It does not restrict few-shot tasks to have fixed ways and shots, thus representing a more realistic scenario. It consists of 10 datasets from diverse domains:

128 papers2 benchmarksImages

Europarl (European Parliament Proceedings Parallel Corpus)

A corpus of parallel text in 21 European languages from the proceedings of the European Parliament.

128 papers2 benchmarksTexts

NELL-995

NELL-995 KG Completion Dataset

127 papers41 benchmarks

WikiHow

WikiHow is a dataset of more than 230,000 article and summary pairs extracted and constructed from an online knowledge base written by different human authors. The articles span a wide range of topics and represent high diversity styles.

127 papers8 benchmarksTexts

LogiQA

LogiQA consists of 8,678 QA instances, covering multiple types of deductive reasoning. Results show that state-of-the-art neural models perform by far worse than human ceiling. The dataset can also serve as a benchmark for reinvestigating logical AI under the deep learning NLP setting.

127 papers0 benchmarksTexts

WNUT 2017 (WNUT 2017 Emerging and Rare entity recognition)

This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarisation), but recall on them is a real problem in noisy text - even among annotators. This drop tends to be due to novel entities and surface forms. Take for example the tweet “so.. kktny in 30 mins?” - even human experts find entity kktny hard to detect and resolve. This task will evaluate the ability to detect and classify novel, emerging, singleton named entities in noisy text.

127 papers5 benchmarksTexts
PreviousPage 27 of 1000Next