Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

Country211

Country211 is a dataset released by OpenAI, designed to assess the geolocation capability of visual representations. It filters the YFCC100m dataset (Thomee et al., 2016) down to the 211 countries (defined as having an ISO-3166 country code) that have at least 300 photos with GPS coordinates. OpenAI built a balanced dataset with 211 categories by sampling, for each country, 200 photos for training and 100 photos for testing.
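The balanced split described above implies fixed overall sizes; a quick sanity check, derived only from the figures in the description (not from the released files):

```python
# Split sizes implied by the Country211 description:
# 211 countries, each with 200 training and 100 test photos.
n_countries = 211
train_per_country = 200
test_per_country = 100

train_size = n_countries * train_per_country
test_size = n_countries * test_per_country

print(train_size, test_size)  # 42200 21100
```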

3 papers · 2 benchmarks · Images

SAVEE (Surrey Audio-Visual Expressed Emotion)

The Surrey Audio-Visual Expressed Emotion (SAVEE) dataset was recorded as a prerequisite for the development of an automatic emotion recognition system. The database consists of recordings from 4 male actors in 7 different emotions, 480 British English utterances in total. The sentences were chosen from the standard TIMIT corpus and phonetically balanced for each emotion. The data were recorded in a visual media lab with high-quality audio-visual equipment, then processed and labelled. To check the quality of performance, the recordings were evaluated by 10 subjects under audio, visual, and audio-visual conditions. Classification systems were built using standard features and classifiers for each of the audio, visual, and audio-visual modalities, and speaker-independent recognition rates of 61%, 65%, and 84%, respectively, were achieved.
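The per-actor workload implied by the counts above can be checked with simple arithmetic (derived from the description, not from the corpus itself):

```python
# SAVEE: 480 utterances recorded by 4 actors across 7 emotions.
total_utterances = 480
n_actors = 4

per_actor = total_utterances // n_actors
print(per_actor)  # 120 utterances per actor
```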

3 papers · 6 benchmarks · Audio

FSDD (Free Spoken Digit Dataset)

The Free Spoken Digit Dataset (FSDD) is a simple audio/speech dataset consisting of recordings of spoken digits as WAV files at 8 kHz. The recordings are trimmed so that they have near-minimal silence at the beginnings and ends. It contains 3,000 recordings in English from 6 speakers (50 of each digit per speaker).
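The total of 3,000 recordings follows directly from the description: 6 speakers, 10 digits, and 50 recordings of each digit per speaker. A minimal check of that arithmetic:

```python
# FSDD recording count per the dataset description.
n_speakers = 6
n_digits = 10              # spoken digits 0-9
recordings_per_digit = 50  # per speaker

total = n_speakers * n_digits * recordings_per_digit
print(total)  # 3000
```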

3 papers · 0 benchmarks · Audio

M-VAD Names (M-VAD Names Dataset)

The dataset contains the annotations of characters' visual appearances, in the form of tracks of face bounding boxes, and the associations with characters' textual mentions, when available. The detection and annotation of the visual appearances of characters in each video clip of each movie was achieved through a semi-automatic approach. The released dataset contains more than 24k annotated video clips, including 63k visual tracks and 34k textual mentions, all associated with their character identities.

3 papers · 1 benchmark · Texts, Videos

PyBullet

PyBullet is an easy-to-use Python module for physics simulation, robotics, and deep reinforcement learning based on the Bullet Physics SDK. With PyBullet you can load articulated bodies from URDF, SDF, and other file formats. PyBullet provides forward dynamics simulation, inverse dynamics computation, forward and inverse kinematics, and collision detection and ray intersection queries. Aside from physics simulation, PyBullet supports rendering, with a CPU renderer, OpenGL visualization, and support for virtual reality headsets.

3 papers · 0 benchmarks

FOBIE (Focused Open Biological Information Extraction)

The Focused Open Biological Information Extraction (FOBIE) dataset aims to support information extraction (IE) for computer-aided biomimetics. The dataset contains ~1,500 sentences from scientific biological texts. These sentences are annotated with TRADE-OFF and syntactically similar relations between unbounded arguments, as well as argument modifiers.

3 papers · 0 benchmarks · Biology, Texts

COVID-Q

COVID-Q consists of COVID-19 questions that have been annotated with a broad category (e.g., Transmission, Prevention) and a more specific class, such that questions in the same class are all asking the same thing.

3 papers · 0 benchmarks · Texts

ClarQ

ClarQ consists of ~2M examples distributed across 173 domains of Stack Exchange. The dataset is intended for the training and evaluation of clarification question generation systems.

3 papers · 0 benchmarks · Texts

MineRL

MineRL is an imitation learning dataset with over 60 million frames of recorded human player data. The dataset includes a set of tasks that highlight many of the hardest problems in modern-day reinforcement learning: sparse rewards and hierarchical policies.

3 papers · 0 benchmarks · Environment

GitHub Typo Corpus

Are you the kind of person who makes a lot of typos when writing code? Or are you the one who fixes them by making "fix typo" commits? Either way, thank you—you contributed to the state-of-the-art in the NLP field.

3 papers · 0 benchmarks · Texts

ChrEn (Cherokee-English Parallel Dataset)

Cherokee-English Parallel Dataset is a low-resource dataset of 14,151 pairs of sentences with around 313K English tokens and 206K Cherokee tokens. The parallel corpus is accompanied by a monolingual Cherokee dataset of 5,120 sentences. Both datasets are mostly derived from Cherokee monolingual books.

3 papers · 0 benchmarks · Parallel

MMDB (Multimodal Dyadic Behavior)

The Multimodal Dyadic Behavior (MMDB) dataset is a unique collection of multimodal (video, audio, and physiological) recordings of the social and communicative behavior of toddlers. The MMDB contains 160 sessions of 3-5 minute semi-structured play interaction between a trained adult examiner and a child between the ages of 15 and 30 months. The MMDB dataset supports a novel problem domain for activity recognition, which consists of the decoding of dyadic social interactions between adults and children in a developmental context.

3 papers · 0 benchmarks · Audio, Videos

AFEW-VA (AFEW-VA Database for Valence and Arousal Estimation In-The-Wild)

The AFEW-VA database is a collection of highly accurate per-frame annotations of valence and arousal levels, along with per-frame annotations of 68 facial landmarks, for 600 challenging video clips. These clips are extracted from feature films and were also annotated in terms of discrete emotion categories, in the manner of the AFEW database.

3 papers · 0 benchmarks · Videos

ORVS (Online Retinal Image for Vessel Segmentation)

The ORVS dataset has been newly established as a collaboration between the computer science and visual-science departments at the University of Calgary.

3 papers · 0 benchmarks · Images, Medical

OCTAGON (OCTAGON Dataset)

The OCTAGON dataset is a set of Optical Coherence Tomography Angiography (OCT-A) images used for segmentation of the Foveal Avascular Zone (FAZ). The dataset includes 144 healthy OCT-A images and 69 diabetic OCT-A images, divided into four groups containing 36 healthy and about 17 diabetic images each. These groups are: 3x3 superficial, 3x3 deep, 6x6 superficial, and 6x6 deep, where 3x3 and 6x6 denote the zoom of the image and superficial/deep denote the depth level of the extracted image. The healthy subset includes OCT-A images from people in 6 age ranges: 10-19, 20-29, 30-39, 40-49, 50-59, and 60-69 years. Each age range includes 3 different patients, with information on the left and right eyes of each. Finally, for each eye there are four different images: one 3x3 superficial, one 3x3 deep, one 6x6 superficial, and one 6x6 deep. Each image has two manual labellings of the FAZ by expert clinicians, along with their quantification.
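The nested breakdown of the healthy subset is consistent with the 144-image total; a quick sketch of the arithmetic, derived from the description above rather than the released files:

```python
# OCTAGON healthy subset: 6 age ranges x 3 patients x 2 eyes x 4 image types.
n_age_ranges = 6
patients_per_range = 3
eyes_per_patient = 2
images_per_eye = 4  # {3x3, 6x6} zoom x {superficial, deep}

healthy_total = n_age_ranges * patients_per_range * eyes_per_patient * images_per_eye
healthy_per_group = healthy_total // 4  # four zoom/depth groups

print(healthy_total, healthy_per_group)  # 144 36
```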

3 papers · 0 benchmarks · Images, Medical

UNDD (Urban Night Driving Dataset)

UNDD consists of 7,125 unlabelled day and night images; additionally, it has 75 night images with pixel-level annotations whose classes are equivalent to those of the Cityscapes dataset.

3 papers · 0 benchmarks · Images

Deep Fakes Dataset

The Deep Fakes Dataset is a collection of "in the wild" portrait videos for deepfake detection. The videos in the dataset are diverse real-world samples in terms of the source generative model, resolution, compression, illumination, aspect ratio, frame rate, motion, pose, cosmetics, occlusion, content, and context. They originate from various sources such as news articles, forums, apps, and research presentations, totalling 142 videos, 32 minutes, and 17 GB. Synthetic videos are matched with their original counterparts when possible.

3 papers · 0 benchmarks · Videos

Shmoop Corpus

Shmoop Corpus is a dataset of 231 stories that are paired with detailed multi-paragraph summaries for each individual chapter (7,234 chapters), where the summary is chronologically aligned with respect to the story chapter. From the corpus, a set of common NLP tasks are constructed, including Cloze-form question answering and a simplified form of abstractive summarization, as benchmarks for reading comprehension on stories.

3 papers · 0 benchmarks · Texts

3DMAD (3D Mask Attack Dataset)

The 3D Mask Attack Database (3DMAD) is a biometric (face) spoofing database. It currently contains 76,500 frames of 17 persons, recorded using Kinect for both real-access and spoofing attacks. Each frame consists of:

3 papers · 0 benchmarks

Allegro Reviews

A comprehensive multi-task benchmark for Polish language understanding, accompanied by an online leaderboard. It consists of a diverse set of tasks adopted from existing datasets for named entity recognition, question answering, textual entailment, and others.

3 papers · 0 benchmarks
Page 259 of 1000