Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

Dex-Net 2.0 (Dexterity Network 2.0)

Dex-Net 2.0 is a dataset associating 6.7 million point clouds and analytic grasp quality metrics with parallel-jaw grasps planned using robust quasi-static GWS analysis on a dataset of 1,500 3D object models.

30 papers · 0 benchmarks · Point cloud

VALUE (Video-And-Language Understanding Evaluation)

VALUE is a Video-And-Language Understanding Evaluation benchmark to test models that generalize to diverse tasks, domains, and datasets. It is an assemblage of 11 VidL (video-and-language) datasets over 3 popular tasks: (i) text-to-video retrieval; (ii) video question answering; and (iii) video captioning. The VALUE benchmark aims to cover a broad range of video genres, video lengths, data volumes, and task difficulty levels. Rather than focusing on single-channel videos with visual information only, VALUE promotes models that leverage information from both video frames and their associated subtitles, as well as models that share knowledge across multiple tasks.

30 papers · 0 benchmarks · Texts, Videos

I-RAVEN (Impartial-RAVEN)

To fix the defects of the RAVEN dataset, the authors generate an alternative answer set for each RPM question in RAVEN, forming an improved dataset named Impartial-RAVEN (I-RAVEN for short).

30 papers · 0 benchmarks

CREAK

CREAK is a testbed for commonsense reasoning about entity knowledge, bridging fact-checking about entities with commonsense inferences.

30 papers · 0 benchmarks · Texts

EntityQuestions

EntityQuestions is a dataset of simple, entity-rich questions based on facts from Wikidata (e.g., "Where was Arve Furset born?").

30 papers · 1 benchmark · Texts

URLB (Unsupervised Reinforcement Learning Benchmark)

URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, it provides twelve continuous control tasks from three domains for evaluation.

30 papers · 0 benchmarks

BRACS (BReAst Carcinoma Subtyping)

The BReAst Carcinoma Subtyping (BRACS) dataset is a large cohort of annotated Hematoxylin & Eosin (H&E)-stained images intended to facilitate the characterization of breast lesions. BRACS contains 547 Whole-Slide Images (WSIs) and 4,539 Regions of Interest (ROIs) extracted from the WSIs. Each WSI and its respective ROIs are annotated by the consensus of three board-certified pathologists into different lesion categories. Specifically, BRACS includes three lesion types, i.e., benign, atypical, and malignant, which are further subtyped into seven categories.

30 papers · 0 benchmarks · Images, Medical
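The three-type, seven-subtype label hierarchy described above can be sketched as a small lookup table. Note the subtype names below are assumptions drawn from common BRACS usage, not from this page:

```python
# Hypothetical sketch of the BRACS label hierarchy: three lesion types,
# each refined into subtypes, seven fine-grained categories in total.
# The subtype names are assumptions, not stated on this page.
BRACS_HIERARCHY = {
    "benign": ["Normal", "Pathological Benign", "Usual Ductal Hyperplasia"],
    "atypical": ["Flat Epithelial Atypia", "Atypical Ductal Hyperplasia"],
    "malignant": ["Ductal Carcinoma in Situ", "Invasive Carcinoma"],
}

def lesion_type(subtype: str) -> str:
    """Map a fine-grained subtype back to its coarse lesion type."""
    for coarse, subtypes in BRACS_HIERARCHY.items():
        if subtype in subtypes:
            return coarse
    raise KeyError(subtype)

# 3 coarse lesion types, 7 fine-grained categories overall.
n_subtypes = sum(len(v) for v in BRACS_HIERARCHY.values())
print(len(BRACS_HIERARCHY), n_subtypes)
```

This two-level mapping mirrors how the dataset is typically evaluated: either as a coarse 3-class or a fine 7-class classification problem.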

VocalSet (VocalSet: A Singing Voice Dataset)

VocalSet is a singing voice dataset consisting of 10.1 hours of monophonic recorded audio of professional singers demonstrating both standard and extended vocal techniques on all five vowels. Existing singing voice datasets aim to capture a focused subset of singing voice characteristics and generally consist of just a few singers. VocalSet contains recordings from 20 different singers (9 male, 11 female) and a range of voice types. It aims to improve on existing singing voice datasets and singing voice research by capturing not only a range of vowels, but also a diverse set of voices performing many different vocal techniques, sung in the context of scales, arpeggios, long tones, and excerpts.

30 papers · 2 benchmarks · Audio

SMACv2

SMACv2 (StarCraft Multi-Agent Challenge v2) is a new version of the benchmark where scenarios are procedurally generated and require agents to generalise to previously unseen settings (from the same distribution) during evaluation.

30 papers · 0 benchmarks · Environment

KITTI-C

KITTI-C, part of the Robo3D benchmark suite, is an evaluation benchmark aimed at robust and reliable 3D object detection in autonomous driving. It probes the robustness of 3D detectors under out-of-distribution (OoD) scenarios against natural corruptions that occur in real-world environments.

30 papers · 6 benchmarks · Images, Point cloud

VideoInstruct (Video Instruction Dataset)

The Video Instruction Dataset is used to train Video-ChatGPT. It consists of 100,000 high-quality video instruction pairs, produced with a combination of human-assisted and semi-automatic annotation techniques that create question-answer pairs related to the video content.

30 papers · 31 benchmarks · Texts, Videos

RT-GENE

RT-GENE presents a diverse eye-gaze dataset.

29 papers · 1 benchmark

Comic2k

Comic2k is a dataset used for cross-domain object detection, containing 2k comic images with image- and instance-level annotations. Image source: https://naoto0804.github.io/cross_domain_detection/

29 papers · 16 benchmarks · Images

CODAH (COmmonsense Dataset Adversarially-authored by Humans)

The COmmonsense Dataset Adversarially-authored by Humans (CODAH) is an evaluation set for commonsense question-answering in the sentence-completion style of SWAG. As opposed to other automatically generated NLI datasets, CODAH is adversarially constructed by humans, who can view feedback from a pre-trained model and use this information to design challenging commonsense questions. It contains 2,801 questions in total and uses 5-fold cross validation for evaluation.

29 papers · 2 benchmarks · Texts
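The 5-fold evaluation protocol mentioned above can be sketched in a few lines, assuming the 2,801 questions are simply indexed 0..2800. This is an illustrative split, not CODAH's official evaluation code:

```python
# Illustrative 5-fold cross-validation split over CODAH's 2,801 questions.
# Sketch of the protocol only; not the official evaluation implementation.
def k_fold_splits(n_items: int, k: int = 5):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    fold_size, remainder = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # Distribute the remainder so fold sizes differ by at most one.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop

folds = list(k_fold_splits(2801, k=5))
print(len(folds))                      # 5
print(sum(len(t) for _, t in folds))   # 2801: each question tested once
```

Because 2,801 is not divisible by 5, one fold holds 561 test questions and the other four hold 560 each; every question appears in exactly one test fold.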

HyperLex

HyperLex is a dataset and evaluation resource that quantifies the extent of semantic category membership, i.e., the type-of relation (also known as the hyponymy-hypernymy or lexical entailment (LE) relation), between 2,616 concept pairs.

29 papers · 0 benchmarks

NVGesture

The NVGesture dataset focuses on touchless driver control. It contains 1,532 dynamic gestures falling into 25 classes, with 1,050 samples for training and 482 for testing. The videos are recorded in three modalities (RGB, depth, and infrared).

29 papers · 2 benchmarks · Images, Videos
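The split figures quoted above are easy to sanity-check; the constants below just restate the numbers from the entry:

```python
# Sanity check of the NVGesture split statistics quoted above.
TRAIN, TEST, CLASSES = 1050, 482, 25

total = TRAIN + TEST
print(total)                      # 1532 gestures in all
print(round(TRAIN / total, 3))    # ~0.685 of the data is for training
print(round(total / CLASSES, 1))  # ~61.3 gestures per class on average
```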

OCID (Object Clutter Indoor Dataset)

Developing robot perception systems for handling objects in the real-world requires computer vision algorithms to be carefully scrutinized with respect to the expected operating domain. This demands large quantities of ground truth data to rigorously evaluate the performance of algorithms.

29 papers · 1 benchmark · RGB-D

AVD (Active Vision Dataset)

AVD focuses on simulating robotic vision tasks in everyday indoor environments using real imagery. The dataset includes 20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances densely captured in 9 unique scenes.

29 papers · 4 benchmarks · Images, RGB-D

WI-LOCNESS (Cambridge English Write & Improve & LOCNESS)

WI-LOCNESS is part of the Building Educational Applications 2019 Shared Task on Grammatical Error Correction. It consists of two datasets:

29 papers · 1 benchmark · Texts

Molweni

Molweni is a machine reading comprehension (MRC) dataset with discourse structure built over multiparty dialogue. Its source is sampled from the Ubuntu Chat Corpus and includes 10,000 dialogues comprising 88,303 utterances.

29 papers · 4 benchmarks
Page 81 of 1000