TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2

19,997 dataset results

DIVA-HisDB

The database consists of 150 annotated pages of three different medieval manuscripts with challenging layouts. Furthermore, we provide a layout analysis ground-truth which has been iterated on, reviewed, and refined by an expert in medieval studies.

15 papers8 benchmarks

QED

QED is a linguistically principled framework for explanations in question answering. Given a question and a passage, QED represents an explanation of the answer as a combination of discrete, human-interpretable steps: sentence selection := identification of a sentence implying an answer to the question referential equality := identification of noun phrases in the question and the answer sentence that refer to the same thing predicate entailment := confirmation that the predicate in the sentence entails the predicate in the question once referential equalities are abstracted away. The QED dataset is an expert-annotated dataset of QED explanations build upon a subset of the Google Natural Questions dataset.

15 papers2 benchmarksImages

DUC 2004

The DUC2004 dataset is a dataset for document summarization. Is designed and used for testing only. It consists of 500 news articles, each paired with four human written summaries. Specifically it consists of 50 clusters of Text REtrieval Conference (TREC) documents, from the following collections: AP newswire, 1998-2000; New York Times newswire, 1998-2000; Xinhua News Agency (English version), 1996-2000. Each cluster contained on average 10 documents.

15 papers6 benchmarksTexts

Salinas (Salinas Scene)

Salinas Scene is a hyperspectral dataset collected by the 224-band AVIRIS sensor over Salinas Valley, California, and is characterized by high spatial resolution (3.7-meter pixels). The area covered comprises 512 lines by 217 samples. 20 water absorption bands were discarder: [108-112], [154-167], 224. This image was available only as at-sensor radiance data. It includes vegetables, bare soils, and vineyard fields. Salinas groundtruth contains 16 classes.

15 papers13 benchmarksHyperspectral images

BeerAdvocate

BeerAdvocate is a dataset that consists of beer reviews from beeradvocate. The data span a period of more than 10 years, including all ~1.5 million reviews up to November 2011. Each review includes ratings in terms of five "aspects": appearance, aroma, palate, taste, and overall impression. Reviews include product and user information, followed by each of these five ratings, and a plaintext review.

15 papers2 benchmarksGraphs

MUSDB18-HQ

MUSDB18-HQ is a high-quality version of the MUSDB18 music tracks dataset. The high-quality dataset consists of the same 150 songs, but instead of MP4 files (compressed with Advanced Audio Coding encoder at 256kbps, with bandwidth limited to 16kHz), the songs are provided as raw WAV files. Image Source: https://sigsep.github.io/datasets/musdb.html

15 papers10 benchmarksAudio

Syn2Real

Syn2Real, a synthetic-to-real visual domain adaptation benchmark meant to encourage further development of robust domain transfer methods. The goal is to train a model on a synthetic "source" domain and then update it so that its performance improves on a real "target" domain, without using any target annotations. It includes three tasks, illustrated in figures above: the more traditional closed-set classification task with a known set of categories; the less studied open-set classification task with unknown object categories in the target domain; and the object detection task, which involves localizing instances of objects by predicting their bounding boxes and corresponding class labels.

15 papers0 benchmarksImages

MALF (Multi-Attribute Labelled Faces)

The MALF dataset is a large dataset with 5,250 images annotated with multiple facial attributes and it is specifically constructed for fine grained evaluation.

15 papers0 benchmarksImages

Washington RGB-D

Washington RGB-D is a widely used testbed in the robotic community, consisting of 41,877 RGB-D images organized into 300 instances divided in 51 classes of common indoor objects (e.g. scissors, cereal box, keyboard etc). Each object instance was positioned on a turntable and captured from three different viewpoints while rotating.

15 papers0 benchmarksImages, RGB-D

MSRA Hand

MSRA Hands is a dataset for hand tracking. In total 6 subjects' right hands are captured using Intel's Creative Interactive Gesture Camera. Each subject is asked to make various rapid gestures in a 400-frame video sequence. To account for different hand sizes, a global hand model scale is specified for each subject: 1.1, 1.0, 0.9, 0.95, 1.1, 1.0 for subject 1~6, respectively. The camera intrinsic parameters are: principle point = image center(160, 120), focal length = 241.42. The depth image is 320x240, each .bin file stores the depth pixel values in row scanning order, which are 320240 floats. The unit is millimeters. The bin file is binary and needs to be opened with std::ios::binary flag. joint.txt file stores 400 frames x 21 hand joints per frame. Each line has 3 * 21 = 63 floats for 21 3D points in (x, y, z) coordinates. The 21 hand joints are: wrist, index_mcp, index_pip, index_dip, index_tip, middle_mcp, middle_pip, middle_dip, middle_tip, ring_mcp, ring_pip, ring_dip, ring_tip,

15 papers0 benchmarksImages

CASIA V2

CASIA V2 is a dataset for forgery classification. It contains 4795 images, 1701 authentic and 3274 forged.

15 papers0 benchmarksImages

ISIC 2017 Task 1

The ISIC 2017 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. The Task 1 challenge dataset for lesion segmentation contains 2,000 images for training with ground truth segmentations (2000 binary mask images).

15 papers0 benchmarksImages, Medical

MSK

The MSK dataset is a dataset for lesion recognition from the Memorial Sloan-Kettering Cancer Center. It is used as part of the ISIC lesion recognition challenges.

15 papers0 benchmarksImages, Medical

GTZAN

The gtzan8 audio dataset contains 1000 tracks of 30 second length. There are 10 genres, each containing 100 tracks which are all 22050Hz Mono 16-bit audio files in .wav format. The genres are:

15 papers3 benchmarksAudio

Countix

Countix is a real world dataset of repetition videos collected in the wild (i.e.YouTube) covering a wide range of semantic settings with significant challenges such as camera and object motion, diverse set of periods and counts, and changes in the speed of repeated actions. Countix include repeated videos of workout activities (squats, pull ups, battle rope training, exercising arm), dance moves (pirouetting, pumping fist), playing instruments (playing ukulele), using tools repeatedly (hammer hitting objects, chainsaw cutting wood, slicing onion), artistic performances (hula hooping, juggling soccer ball), sports (playing ping pong and tennis) and many others. Figure 6 illustrates some examples from the dataset as well as the distribution of repetition counts and period lengths.

15 papers4 benchmarksVideos

CrossNER

CrossNER is a cross-domain NER (Named Entity Recognition) dataset, a fully-labeled collection of NER data spanning over five diverse domains (Politics, Natural Science, Music, Literature, and Artificial Intelligence) with specialized entity categories for different domains. Additionally, CrossNER also includes unlabeled domain-related corpora for the corresponding five domains.

15 papers5 benchmarksTexts

4Seasons

4Seasons is adataset covering seasonal and challenging perceptual conditions for autonomous driving.

15 papers0 benchmarksImages

BVI-DVC

Contains 800 sequences at various spatial resolutions from 270p to 2160p and has been evaluated on ten existing network architectures for four different coding tools.

15 papers0 benchmarks

CED (Color Event Camera Dataset)

Contains 50 minutes of footage with both color frames and events. CED features a wide variety of indoor and outdoor scenes.

15 papers0 benchmarks

CMU DoG (CMU Document Grounded Conversations Dataset)

This is a document grounded dataset for text conversations. "Document Grounded Conversations" are conversations that are about the contents of a specified document. In this dataset the specified documents are Wikipedia articles about popular movies. The dataset contains 4112 conversations with an average of 21.43 turns per conversation.

15 papers0 benchmarksTexts
PreviousPage 121 of 1000Next