Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

Beijing Multi-Site Air-Quality Dataset

This dataset includes hourly air pollutant data from 12 nationally controlled air-quality monitoring sites. The air-quality data come from the Beijing Municipal Environmental Monitoring Center. The meteorological data for each air-quality site are matched with the nearest weather station of the China Meteorological Administration. The time period is from March 1st, 2013 to February 28th, 2017. Missing data are denoted as NA.

11 papers · 2 benchmarks
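Since missing readings are marked with the literal string "NA", the per-site CSV files can be parsed and gap-filled along the lines of the following sketch (the column names and values here are illustrative, not copied from the dataset):

```python
import pandas as pd
from io import StringIO

# Illustrative rows only; the real files are one CSV per monitoring site,
# with date parts, pollutant readings, and "NA" for missing values.
csv = StringIO(
    "year,month,day,hour,PM2.5\n"
    "2013,3,1,0,4.0\n"
    "2013,3,1,1,NA\n"
    "2013,3,1,2,7.0\n"
)

df = pd.read_csv(csv, na_values=["NA"])                 # "NA" -> NaN
df["datetime"] = pd.to_datetime(df[["year", "month", "day", "hour"]])
pm25 = df.set_index("datetime")["PM2.5"].interpolate()  # fill the gap linearly
print(pm25.tolist())  # [4.0, 5.5, 7.0]
```

Linear interpolation is just one option; for longer gaps you may prefer to leave the NaNs in place and let the downstream model handle them.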

Wine (Wine Data Set)

These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.

11 papers · 10 benchmarks · Images

WiderPerson

WiderPerson contains a total of 13,382 images with 399,786 annotations (29.87 annotations per image), so the dataset features densely packed pedestrians with various kinds of occlusion. Pedestrians in the dataset are therefore extremely challenging due to large variations in scene and occlusion, which makes it suitable for evaluating pedestrian detectors in the wild.

11 papers · 10 benchmarks · Images

Amazon Photo

11 papers · 1 benchmark

VoxForge

VoxForge is an open speech dataset that was set up to collect transcribed speech for use with free and open source speech recognition engines (on Linux, Windows, and Mac). Source: http://www.voxforge.org/home

11 papers · 5 benchmarks · Audio, Speech, Texts

Mario AI

Mario AI was a benchmark environment for reinforcement learning. Gameplay in Mario AI, as in Nintendo's original version, consists of moving the controlled character, Mario, through two-dimensional levels viewed from the side. Mario can walk and run to the right and left, jump, and (depending on which state he is in) shoot fireballs. Gravity acts on Mario, making it necessary to jump over cliffs to get past them. Mario can be in one of three states: Small, Big (can kill enemies by jumping onto them), and Fire (can shoot fireballs).

11 papers · 0 benchmarks · Environment

DARPA

DARPA is a dataset consisting of communications between source IPs and destination IPs. The dataset contains various attacks between IPs.

11 papers · 0 benchmarks

TUM monoVO

TUM monoVO is a dataset for evaluating the tracking accuracy of monocular Visual Odometry (VO) and SLAM methods. It contains 50 real-world sequences comprising over 100 minutes of video, recorded across different environments, ranging from narrow indoor corridors to wide outdoor scenes. All sequences contain mostly exploratory camera motion, starting and ending at the same position: this makes it possible to evaluate tracking accuracy via the accumulated drift from start to end, without requiring ground truth for the full sequence. In contrast to existing datasets, all sequences are photometrically calibrated: the dataset creators provide the exposure times for each frame as reported by the sensor, the camera response function, and the lens attenuation factors (vignetting).

11 papers · 0 benchmarks · Images
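Because each sequence starts and ends at the same physical position, the drift metric can be sketched as the end-to-start gap of the estimated trajectory. This is a simplified illustration of the idea only, not the benchmark's official evaluation code (which additionally handles scale and alignment):

```python
import numpy as np

def accumulated_drift(positions):
    """Distance between the first and last estimated camera positions.

    Since every TUM monoVO sequence begins and ends at the same physical
    spot, this end-to-start gap measures accumulated tracking drift
    without needing ground truth for the whole trajectory.
    """
    positions = np.asarray(positions, dtype=float)
    return float(np.linalg.norm(positions[-1] - positions[0]))

# Toy trajectory around a square that misses its starting point by 0.1 m in x:
traj = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0.1, 0, 0)]
print(accumulated_drift(traj))  # ~0.1
```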

DCASE 2013

DCASE 2013 is a dataset for sound event detection. It consists of audio-only recordings where individual sound events are prominent in an acoustic scene.

11 papers · 0 benchmarks · Audio

ISIC 2018 Task 3

The ISIC 2018 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. The Task 3 dataset is the challenge on lesion classification. It includes 2594 images. The task is to classify the dermoscopic images into one of the following categories: melanoma, melanocytic nevus, basal cell carcinoma, actinic keratosis / Bowen’s disease, benign keratosis, dermatofibroma, and vascular lesion.

11 papers · 0 benchmarks · Images, Medical

BCN_20000

BCN_20000 is a dataset composed of 19,424 dermoscopic images of skin lesions captured from 2010 to 2016 in the facilities of the Hospital Clínic in Barcelona. The dataset can be used for lesion recognition tasks such as lesion segmentation, lesion detection and lesion classification.

11 papers · 0 benchmarks · Images, Medical

Open PI

Open PI is the first dataset for tracking state changes in procedural text from arbitrary domains using an unrestricted (open) vocabulary. The dataset comprises 29,928 state changes over 4,050 sentences from 810 real-world procedural paragraphs from WikiHow.com. The state-tracking task uses a new formulation in which only the text is provided, and a set of state changes (entity, attribute, before, after) is generated for each step; the entity, attribute, and values must all be predicted from an open vocabulary.

11 papers · 0 benchmarks · Texts
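The (entity, attribute, before, after) tuples can be represented with a small record type. The example step and predicted change below are invented for illustration and are not taken from the dataset:

```python
from dataclasses import dataclass

@dataclass
class StateChange:
    """One Open PI state change; every field is free-form open-vocabulary text."""
    entity: str
    attribute: str
    before: str
    after: str

# Hypothetical procedural step and a prediction for it (illustrative only):
step = "Whisk the eggs until frothy."
changes = [
    StateChange(entity="eggs", attribute="texture", before="liquid", after="frothy"),
]
print(changes[0].entity, changes[0].after)  # eggs frothy
```

Because all four fields are open-vocabulary strings rather than labels from a fixed set, evaluation typically compares generated tuples against reference tuples with text-matching metrics.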

VLEP (Video-and-Language Event Prediction)

VLEP contains 28,726 future event prediction examples (along with their rationales) from 10,234 diverse TV show and YouTube lifestyle vlog video clips. Each example consists of a Premise Event (a short video clip with dialogue), a Premise Summary (a text summary of the premise event), and two potential natural language Future Events (along with rationales) written by people. The clips are on average 6.1 seconds long and are harvested from diverse event-rich sources, i.e., TV shows and YouTube lifestyle vlogs.

11 papers · 1 benchmark · Texts, Videos

Video2GIF

The Video2GIF dataset contains over 100,000 pairs of GIFs and their source videos. The GIFs were collected from two popular GIF websites (makeagif.com, gifsoup.com), and the corresponding source videos were collected from YouTube in summer 2015. IDs and URLs of the GIFs and videos are provided, along with the temporal alignment of GIF segments to their source videos. The dataset is intended for evaluating GIF creation and video highlight techniques.

11 papers · 0 benchmarks · Videos

TaxiNLI

TaxiNLI is a dataset collected according to the principles and categorizations of a taxonomy of reasoning capabilities for natural language inference. A subset of examples is curated from MultiNLI (Williams et al., 2018) by sampling uniformly based on the entailment label and the domain. The dataset is annotated with fine-grained category labels.

11 papers · 0 benchmarks · Texts

GSL (Greek Sign Language)

The Greek Sign Language (GSL) dataset is a large-scale RGB+D dataset suitable for Sign Language Recognition (SLR) and Sign Language Translation (SLT). The video captures were conducted with an Intel RealSense D435 RGB+D camera at a rate of 30 fps, with both the RGB and depth streams acquired at the same spatial resolution of 848×480 pixels. To increase variability in the videos, the camera position and orientation are slightly altered between subsequent recordings. Seven different signers perform five individual, commonly encountered scenarios in different public services. The average length of each scenario is twenty sentences.

11 papers · 1 benchmark · RGB-D, Videos

MSD (Million Song Dataset)

The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.

11 papers · 0 benchmarks · Audio, Images

AQuaMuSe

AQuaMuSe contains 5,519 query-based summaries, each associated with an average of six input documents selected from an index of 355M documents from Common Crawl.

11 papers · 0 benchmarks

CATS (Color and Thermal Stereo Benchmark)

A dataset consisting of stereo thermal, stereo color, and cross-modality image pairs with high accuracy ground truth (< 2mm) generated from a LiDAR. The authors scanned 100 cluttered indoor and 80 outdoor scenes featuring challenging environments and conditions. CATS contains approximately 1400 images of pedestrians, vehicles, electronics, and other thermally interesting objects in different environmental conditions, including nighttime, daytime, and foggy scenes.

11 papers · 0 benchmarks

ClariQ

ClariQ is an extension of the Qulac dataset with additional new topics, questions, and answers in the training set. The test set is completely unseen and newly collected. Like Qulac, ClariQ consists of single-turn conversations (initial_request, followed by clarifying question and answer). In addition, it comes with synthetic multi-turn conversations (up to three turns). ClariQ features approximately 18K single-turn conversations, as well as 1.8 million multi-turn conversations.

11 papers · 0 benchmarks · Texts
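A single-turn record of this shape pairs an initial request with one clarifying question and its answer. The field names and text below are illustrative assumptions, not actual ClariQ entries:

```python
# Illustrative record shapes only; actual ClariQ field names and content
# may differ from these assumptions.
single_turn = {
    "initial_request": "Tell me about wind power.",
    "clarifying_question": "Are you interested in residential or utility-scale wind power?",
    "answer": "Residential wind power.",
}

# Synthetic multi-turn conversations extend this to up to three
# question/answer rounds on the same initial request:
multi_turn = {
    "initial_request": single_turn["initial_request"],
    "turns": [
        (single_turn["clarifying_question"], single_turn["answer"]),
        ("Do you want installation cost information?", "Yes, rough cost estimates."),
    ],
}

assert len(multi_turn["turns"]) <= 3  # the synthetic dialogues cap at three turns
print(len(multi_turn["turns"]))  # 2
```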
Page 143 of 1000