Datasets

19,997 machine learning datasets

BCOPA-CE (A Balanced COPA Test Set with cause-effect as alternatives)

We provide the BCOPA-CE test set, which has a balanced token distribution across the correct and wrong alternatives and makes it harder to distinguish cause from effect.

3 papers · 0 benchmarks · Texts

Voice Conversion Challenge 2018

Voice conversion (VC) is a technique that transforms the speaker identity in a source speech waveform into a different one while preserving the linguistic information of the source speech. The Voice Conversion Challenge (VCC) 2016 was launched at Interspeech 2016. The objective of the 2016 challenge was to better understand different VC techniques built on a freely available common dataset and working toward a common goal, and to share views about unsolved problems and challenges faced by current VC techniques. The VCC 2016 focused on the most basic VC task, that is, the construction of VC models that automatically transform the voice identity of a source speaker into that of a target speaker using a parallel clean training database in which source and target speakers read out the same set of utterances in a professional recording studio. Seventeen research groups participated in the 2016 challenge. The challenge was successful and it established new standard evaluation methodol

3 papers · 0 benchmarks · Audio

PackIt

The ability to jointly understand the geometry of objects and plan actions for manipulating them is crucial for intelligent agents. This ability is referred to as geometric planning. Many interactive environments have recently been proposed to evaluate intelligent agents on various skills; however, none of them caters to the needs of geometric planning. PackIt is a virtual environment for evaluating, and potentially learning, geometric planning: an agent must take a sequence of actions to pack a set of objects into a box with limited space.

3 papers · 1 benchmark · 3D

AxonEM

The AxonEM dataset consists of two 30×30×30 μm³ EM image volumes, one from the human cortex and one from the mouse cortex. It is used for 3D axon instance segmentation of brain cortical regions. The authors proofread over 18,000 axon instances to provide dense 3D axon instance segmentation, enabling large-scale evaluation of axon reconstruction methods. The authors also densely annotate nine ground-truth subvolumes for training in each data volume.

3 papers · 0 benchmarks · Medical

Giantsteps

Giantsteps is a dataset that includes songs in major and minor scales for all pitch classes, i.e., a 24-way classification task.

3 papers · 0 benchmarks · Music
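The 24 classes in the Giantsteps entry come from 12 pitch classes times 2 modes. A minimal sketch of one possible label encoding follows; the pitch-class spelling, the ordering (all major keys first, then all minor keys), and the function names are assumptions for illustration, not the dataset's own annotation scheme.

```python
# Assumed spelling and ordering of the 12 pitch classes (sharps only).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Assumed mode ordering: major keys get labels 0-11, minor keys 12-23.
MODES = ["major", "minor"]

NUM_CLASSES = len(PITCH_CLASSES) * len(MODES)  # 12 * 2 = 24


def key_to_label(tonic, mode):
    """Map a (tonic, mode) pair to an integer class label in [0, 23]."""
    return MODES.index(mode) * len(PITCH_CLASSES) + PITCH_CLASSES.index(tonic)


def label_to_key(label):
    """Invert key_to_label: recover (tonic, mode) from an integer label."""
    mode_idx, pitch_idx = divmod(label, len(PITCH_CLASSES))
    return PITCH_CLASSES[pitch_idx], MODES[mode_idx]


print(NUM_CLASSES)
print(key_to_label("A", "minor"), label_to_key(21))
```

Any fixed, invertible ordering would work equally well; what matters is that the same mapping is used for training labels and for decoding predictions.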

Forms Dataset

The Forms Dataset is a dataset for document structure extraction comprising 5K forms.

3 papers · 0 benchmarks · Images

JS Fake Chorales

A MIDI dataset of 500 4-part chorales generated by the KS_Chorus algorithm, annotated with results from hundreds of listening test participants, with 500 further unannotated chorales.

3 papers · 0 benchmarks · Midi, Music, Tabular

Spectrum Challenge 2 Dataset

The dataset is approved for public release, distribution unlimited.

3 papers · 0 benchmarks · Time series

Vehicle-Rear

Vehicle-Rear is a novel dataset for vehicle identification that contains more than three hours of high-resolution videos, with accurate information about the make, model, color and year of nearly 3,000 vehicles, in addition to the position and identification of their license plates.

3 papers · 0 benchmarks · Images, Videos

Action-Camera Parking

The Action-Camera Parking Dataset contains 293 images captured at a roughly 10-meter height using a GoPro Hero 6 camera. It can be used for training machine learning models that perform image-based parking space occupancy classification.

3 papers · 2 benchmarks · Images

Navigation Turing Test

Replay data from human players and AI agents navigating in a 3D game environment.

3 papers · 0 benchmarks · Images, Replay data, Videos

Deezer User Networks

The data was collected from the music streaming service Deezer (November 2017). These datasets represent friendship networks of users from 3 European countries: Romania, Croatia and Hungary. Nodes represent the users and edges are the mutual friendships. We reindexed the nodes in order to achieve a certain level of anonymity. The csv files contain the edges -- nodes are indexed from 0. The json files contain the genre preferences of users -- each key is a user id, and the genres loved are given as lists. Genre notations are consistent across users. In each dataset users could like 84 distinct genres. Liked genre lists were compiled based on the liked song lists. For each dataset we listed the number of nodes and edges.

3 papers · 0 benchmarks · Graphs
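The file layout described in the Deezer User Networks entry (a csv edge list with nodes indexed from 0, plus a json object mapping each user id to a list of liked genres) can be read roughly as sketched below. The column headers, the sample values, and the helper names are hypothetical stand-ins, since the entry does not give the real file names or headers; the sketch parses small in-memory strings rather than the actual files.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical stand-ins for one country's files. The header row and
# the genre names are assumptions, not taken from the dataset itself.
EDGES_CSV = "node_1,node_2\n0,1\n0,2\n1,2\n2,3\n"
GENRES_JSON = '{"0": ["Pop", "Rock"], "1": ["Jazz"], "2": ["Pop"], "3": []}'


def load_edges(text, has_header=True):
    """Parse an edge list into a set of undirected (u, v) pairs with u < v."""
    reader = csv.reader(io.StringIO(text))
    if has_header:
        next(reader, None)  # skip the (assumed) header row
    edges = set()
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        u, v = int(row[0]), int(row[1])
        if u != v:
            # Friendships are mutual, so store each pair once.
            edges.add((min(u, v), max(u, v)))
    return edges


def load_genres(text):
    """Parse the json mapping; keys are user ids stored as strings."""
    raw = json.loads(text)
    return {int(user): list(genres) for user, genres in raw.items()}


def build_adjacency(edges):
    """Build an undirected adjacency map from the edge set."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


edges = load_edges(EDGES_CSV)
genres = load_genres(GENRES_JSON)
adjacency = build_adjacency(edges)
num_nodes = len(genres)
num_edges = len(edges)
print(num_nodes, num_edges)
```

Storing each friendship as a sorted pair keeps the edge count consistent with an undirected graph even if a file happens to list an edge in both directions.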

EmailSum (Email Thread Summarization)

Email Thread Summarization (EmailSum) is a dataset which contains human-annotated short (<30 words) and long (<100 words) summaries of 2,549 email threads (each containing 3 to 10 emails) over a wide variety of topics. It was developed to spur research in thread summarization.

3 papers · 0 benchmarks · Texts

Anime Drawings Dataset

A dataset for 2D pose estimation of anime/manga images.

3 papers · 0 benchmarks · Images

HiRID

HiRID is a freely accessible critical care dataset containing data relating to almost 34 thousand patient admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU), an interdisciplinary 60-bed unit admitting >6,500 patients per year. The ICU offers the full range of modern interdisciplinary intensive care medicine for adult patients. The dataset was developed in cooperation between the Swiss Federal Institute of Technology (ETH) Zürich, Switzerland and the ICU.

3 papers · 11 benchmarks · Medical, Time series

Q-Pain

Q-Pain is a dataset for assessing bias in medical QA in the context of pain management, one of the most challenging forms of clinical decision-making.

3 papers · 0 benchmarks · Texts

DeliData

DeliData is the first publicly available dataset containing collaborative conversations on solving a cognitive task, consisting of 500 group dialogues and 14k utterances.

3 papers · 1 benchmark

IPAC (Icelandic Parallel Abstracts Corpus)

IPAC (Icelandic Parallel Abstracts Corpus) is a new Icelandic-English parallel corpus composed of abstracts from student theses and dissertations. The texts were collected from the Skemman repository, which keeps records of all theses, dissertations and final projects from students at Icelandic universities. The corpus was aligned with Bleualign, based on sentence-level BLEU scores from NMT models in both translation directions. The result is a corpus of 64k sentence pairs from over 6 thousand parallel abstracts.

3 papers · 0 benchmarks · Texts

FoodLogoDet-1500

FoodLogoDet-1500 is a new large-scale publicly available food logo dataset, which has 1,500 categories, about 100,000 images and about 150,000 manually annotated food logo objects.

3 papers · 0 benchmarks · Images
