Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

EuroCity Persons

The EuroCity Persons dataset provides a large number of highly diverse, accurate, and detailed annotations of pedestrians, cyclists, and other riders in urban traffic scenes. The images were collected on board a moving vehicle in 31 cities of 12 European countries. With over 238,200 person instances manually labeled in over 47,300 images, EuroCity Persons is nearly one order of magnitude larger than person datasets previously used for benchmarking. The dataset also contains a large number of person orientation annotations (over 211,200).

6 papers · 0 benchmarks · Images

FAT (Falling Things)

Falling Things (FAT) is a dataset for advancing the state of the art in object detection and 3D pose estimation in the context of robotics. It consists of 60k generated photorealistic images with accurate 3D pose annotations for all objects in each image.

6 papers · 0 benchmarks · 3D, Images

FDF (Flickr Diverse Faces)

A diverse dataset of human faces, including unconventional poses, occluded faces, and a vast variability in backgrounds.

6 papers · 0 benchmarks

FFHQ-Aging

FFHQ-Aging is a dataset of human faces designed for benchmarking age transformation algorithms, as well as many other possible vision tasks. It is an extension of the NVIDIA FFHQ dataset: on top of the 70,000 original FFHQ images, it contains the following information for each image:

  • Gender (male/female, with confidence score)
  • Age group (10 classes, with confidence score)
  • Head pose (pitch, roll & yaw)
  • Glasses type (none, normal, or dark)
  • Eye occlusion score (0-100, separate score for each eye)
  • Full semantic map (19 classes, based on CelebAMask-HQ labels)

6 papers · 0 benchmarks · Images

Flightmare Simulator

Flightmare is composed of two main components: a configurable rendering engine built on Unity and a flexible physics engine for dynamics simulation. These two components are fully decoupled and can run independently of each other. Flightmare comes with several desirable features: (i) a large multi-modal sensor suite, including an interface to extract the 3D point cloud of the scene; (ii) an API for reinforcement learning that can simulate hundreds of quadrotors in parallel; and (iii) an integration with a virtual-reality headset for interaction with the simulated environment. Flightmare can be used for various applications, including path planning, reinforcement learning, visual-inertial odometry, deep learning, and human-robot interaction.

6 papers · 0 benchmarks

HurricaneEmo

HurricaneEmo is an emotion dataset that contains 15,000 English tweets spanning three hurricanes: Harvey, Irma, and Maria.

6 papers · 0 benchmarks · Texts

iCubWorld

iCubWorld datasets are collections of images recording the visual experience of the iCub robot while observing objects in its typical environment, a laboratory or an office. The acquisition setting is devised to allow natural human-robot interaction: a teacher verbally provides the label of the object of interest and shows it to the robot by holding it in hand; the iCub can either track the object while the teacher moves it or take the object in its own hand.

6 papers · 0 benchmarks · Images

IIIT-AR-13K

IIIT-AR-13K was created by manually annotating the bounding boxes of graphical or page objects in publicly available annual reports. The dataset contains a total of 13k annotated page images with objects in five popular categories: table, figure, natural image, logo, and signature. It is the largest manually annotated dataset for graphical object detection.

6 papers · 0 benchmarks · Images

IMPPRES

An IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of more than 25k semi-automatically generated sentence pairs illustrating well-studied pragmatic inference types.

6 papers · 0 benchmarks

Interview

A large-scale (105K conversations) media dialog dataset collected from news interview transcripts.

6 papers · 0 benchmarks · Speech

KINS

Augments the KITTI dataset with additional instance-level pixel annotations for 8 categories.

6 papers · 1 benchmark

Malaria Dataset

The dataset contains a total of 27,558 cell images with equal instances of parasitized and uninfected cells.

6 papers · 4 benchmarks

MatterportLayout

MatterportLayout extends the Matterport3D dataset with general Manhattan layout annotations. It has 2,295 RGBD panoramic images from Matterport3D which are extended with ground truth 3D layouts.

6 papers · 0 benchmarks · Images

mEBAL

A multimodal database for eye blink detection and attention level estimation.

6 papers · 0 benchmarks

Melinda

Introduces a new dataset, MELINDA, for Multimodal biomEdicaL experImeNt methoD clAssification. The dataset was collected in a fully automated, distantly supervised manner: the labels are obtained from an existing curated database, and the actual contents are extracted from the papers associated with each record in that database.

6 papers · 0 benchmarks

Microsoft Research Social Media Conversation Corpus

The Microsoft Research Social Media Conversation Corpus consists of 127M context-message-response triples from the Twitter FireHose, covering the 3-month period from June 2012 through August 2012. Only triples where context and response were generated by the same user were extracted. To minimize noise, only triples that contained at least one frequent bigram, one appearing more than 3 times in the corpus, were selected. This produced a corpus of 29M Twitter triples.
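The frequent-bigram selection rule described above can be sketched as follows. This is an illustrative reconstruction, not the authors' actual pipeline: the whitespace tokenizer, lowercasing, and the `min_count=3` threshold are assumptions based on the description.

```python
from collections import Counter
from itertools import tee


def bigrams(tokens):
    # Yield consecutive token pairs from a token sequence.
    a, b = tee(tokens)
    next(b, None)
    return zip(a, b)


def filter_triples(triples, min_count=3):
    """Keep only (context, message, response) triples containing at least
    one bigram that occurs more than `min_count` times corpus-wide.
    Hypothetical sketch of the selection rule; tokenization details
    are assumptions."""
    counts = Counter()
    tokenized = []
    for triple in triples:
        toks = " ".join(triple).lower().split()  # assumed tokenizer
        tokenized.append((triple, toks))
        counts.update(bigrams(toks))
    # Second pass: keep triples with at least one frequent bigram.
    return [
        triple
        for triple, toks in tokenized
        if any(counts[bg] > min_count for bg in bigrams(toks))
    ]
```

A two-pass design is used because the frequency threshold is defined over the whole corpus, so all bigram counts must be known before any triple can be accepted or rejected.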

6 papers · 0 benchmarks

Mid-Air Dataset

Mid-Air, the Montefiore Institute Dataset of Aerial Images and Records, is a multi-purpose synthetic dataset for low-altitude drone flights. It provides a large amount of synchronized data corresponding to flight records for multi-modal vision sensors and navigation sensors mounted on board a flying quadcopter. The multi-modal vision sensors capture RGB pictures, relative surface normal orientation, depth, object semantics, and stereo disparity.

6 papers · 12 benchmarks

MIDV-2019

Contains video clips shot with modern high-resolution mobile cameras, under strong projective distortions and low lighting conditions.

6 papers · 0 benchmarks · Images

MNIST-1D

A minimalist, low-memory, and low-compute alternative to classic deep learning benchmarks. The training examples are 20 times smaller than MNIST examples, yet they differentiate more clearly between linear, nonlinear, and convolutional models, which attain 32%, 68%, and 94% accuracy respectively (versus 94%, 99+%, and 99+% on MNIST).

6 papers · 0 benchmarks

MOR-UAV

A large-scale video dataset for moving object recognition (MOR) in aerial videos.

6 papers · 0 benchmarks

Page 195 of 1000