Datasets

19,997 machine learning datasets


Twitter Death Hoaxes

This is a dataset for detecting death hoaxes (fake death reports). It consists of death reports collected from Twitter between 1 January 2012 and 31 December 2014. It was collected by tracking the keyword 'RIP' and keeping those tweets in which a name is mentioned next to RIP. Matching names were identified by using Wikidata as a database of names.

1 paper | 0 benchmarks
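The collection step described for Twitter Death Hoaxes (track 'RIP', then keep tweets where a known name appears next to it) could be sketched roughly as follows. This is only an illustration: the `KNOWN_NAMES` set stands in for a name list derived from Wikidata, and the regular expression and the meaning of "next to" (up to three capitalised words directly after RIP) are assumptions, not the authors' actual matching rules.

```python
import re
from typing import Optional

# Hypothetical stand-in for a name list exported from Wikidata.
KNOWN_NAMES = {"nelson mandela", "robin williams"}

# Assumed notion of "next to": up to three capitalised words right after "RIP".
RIP_PATTERN = re.compile(r"\bRIP[\s,:]+((?:[A-Z][\w'.-]*\s?){1,3})")


def find_death_report(tweet: str) -> Optional[str]:
    """Return the known name mentioned right after 'RIP', or None."""
    match = RIP_PATTERN.search(tweet)
    if match is None:
        return None
    words = match.group(1).split()
    # Try the longest candidate first, then progressively shorter prefixes.
    for end in range(len(words), 0, -1):
        candidate = " ".join(words[:end]).strip(".,!?")
        if candidate.lower() in KNOWN_NAMES:
            return candidate
    return None
```

Tweets such as "RIP my phone battery" are rejected because no capitalised word follows the keyword, and unknown names are rejected by the lookup.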

KACC

The KACC benchmark consists of three subtasks that can be applied to knowledge graphs: knowledge abstraction, knowledge concretization and knowledge completion.

1 paper | 0 benchmarks | Graphs

AllMusic Mood Subset

The AllMusic Mood Subset (AMS) is a dataset for mood classification from songs. It is created by matching a subset of the Million Song Dataset (MSD), totalling 67k tracks, with expert annotations of 188 different moods collected from AllMusic.

1 paper | 0 benchmarks | Audio

EDUVSUM (Educational Video Summarization)

EDUVSUM contains educational videos with subtitles from three popular e-learning platforms (edX, YouTube, and TIB AV-Portal) that cover the following topics: crash course on history of science and engineering, computer science, Python and web programming, machine learning and computer vision, Internet of things (IoT), and software engineering. In total, the current version of the dataset contains 98 videos with ground truth values annotated by a user with an academic background in computer science.

1 paper | 0 benchmarks | Videos

Short Text Font Dataset

The proposed dataset includes 1,309 short text instances from Adobe Spark. The dataset is a collection of publicly available sample texts created by different designers. It covers a variety of topics found in posters, flyers, motivational quotes and advertisements.

1 paper | 0 benchmarks | Images, Texts

SPHERE-calorie

The dataset contains RGB and depth images and data from two accelerometers, together with ground truth calorie values from a calorimeter, for calorie expenditure estimation in home environments.

1 paper | 0 benchmarks | Images, RGB-D, Time series

ErhuPT (Erhu Playing Technique Dataset)

ErhuPT is an audio dataset containing about 1,500 audio clips recorded by multiple professional players.

1 paper | 0 benchmarks | Audio

OSTD (Open-Source-Total-Distance)

This dataset consists of 18 movies, with durations ranging between 10 and 104 minutes, taken from the OVSD dataset (Rotman et al., 2016). For these videos, the summary length limit is set to the minimum of 4 minutes and 10% of the video length.

1 paper | 0 benchmarks | Videos
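The OSTD summary length rule (the smaller of 4 minutes and 10% of the video length) amounts to a one-line computation. A minimal sketch, where the function name and the use of minutes as the unit are illustrative choices rather than part of the dataset's tooling:

```python
def summary_length_limit(video_minutes: float) -> float:
    """Return the summary length limit in minutes for a video of the given length."""
    # The limit is the smaller of a fixed 4-minute cap and 10% of the video length.
    return min(4.0, 0.10 * video_minutes)
```

Under this rule the 4-minute cap applies to any video longer than 40 minutes, so the shortest movies in the range (10 minutes) get a limit of about 1 minute while the longest (104 minutes) are capped at 4 minutes.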

PolarRR

PolarRR is a new dataset covering more than 100 types of glass, in which the obtained transmission images are perfectly aligned with the input mixed images.

1 paper | 0 benchmarks | Images

PVDN (Provident Vehicle Detection at Night)

PVDN is a dataset for detecting vehicles at night using the light reflections caused by their headlamps. It contains 59,746 annotated grayscale images from 346 different scenes in a rural environment at night. In these images, all oncoming vehicles, their corresponding light objects (e.g., headlamps), and their respective light reflections (e.g., light reflections on guardrails) are labeled. With this information, this dataset enables research into new methods of detecting oncoming vehicles based on the light reflections they cause, long before they are directly visible.

1 paper | 0 benchmarks | Images

WordNet-feelings

WordNet-feelings is an affective dataset that identifies 3,664 word senses as feelings and associates each of these with one of 9 categories of feeling. The 9 categories are: Actions, Anger, Attention, Attraction, Hedonics, Other, Physiological, Social, Wellbeing.

1 paper | 0 benchmarks | Texts

Doc3DShade

Doc3DShade extends Doc3D with realistic lighting and shading. It follows a similar synthetic rendering procedure using captured document 3D shapes, but the final image generation step combines real shading of different types of paper materials under numerous illumination conditions.

1 paper | 0 benchmarks | Images

TurkQA

TurkQA consists of a selection of sentences from English Wikipedia articles, with questions and answers crowdsourced from workers on Amazon Mechanical Turk.

1 paper | 0 benchmarks | Texts

Dialog-based Language Learning dataset

The Dialog-based Language Learning dataset is designed to measure how well models can learn as a student given a teacher’s textual responses to the student’s answer (as well as potentially receiving an external real-valued reward signal).

1 paper | 0 benchmarks | Texts

WikiSuggest

To collect WikiSuggest, the Google Suggest API is used to harvest natural language questions, which are then submitted to Google Search. Whenever Google Search returns a box with a short answer from Wikipedia, an example is created from the question, the answer, and the Wikipedia document. If the answer string is missing from the document, this often implies a spurious question-answer pair, such as (‘what time is half time in rugby’, ‘80 minutes, 40 minutes’). Question-answer pairs without the exact answer string are pruned. Fifty examples were examined after filtering: 54% were well-formed question-answer pairs whose answers can be grounded in the document, 20% contained answers without textual evidence in the document (the answer string exists in an irrelevant context), and 26% contained incorrect QA pairs.

1 paper | 0 benchmarks | Texts
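The WikiSuggest pruning rule (drop question-answer pairs whose exact answer string does not occur in the document) can be illustrated with a short sketch. The `QAExample` type and `prune_ungrounded` function are hypothetical names introduced here, and the comparison is assumed to be a plain case-sensitive substring check; the original pipeline may normalise text differently.

```python
from typing import List, NamedTuple


class QAExample(NamedTuple):
    question: str
    answer: str
    document: str


def prune_ungrounded(examples: List[QAExample]) -> List[QAExample]:
    """Keep only examples whose exact answer string occurs in the document."""
    return [ex for ex in examples if ex.answer in ex.document]


# Illustrative inputs; the document texts are made up for this sketch.
examples = [
    QAExample(
        "what time is half time in rugby",
        "80 minutes, 40 minutes",
        "A rugby match lasts 80 minutes, split into two halves of 40 minutes.",
    ),
    QAExample(
        "how many players are on a rugby union team",
        "15 players",
        "Each rugby union team fields 15 players on the pitch.",
    ),
]
kept = prune_ungrounded(examples)
```

Here the first pair is pruned because the string "80 minutes, 40 minutes" never appears verbatim in its document, while the second pair is kept. As the reported 20% figure suggests, a pair that passes this check can still be ungrounded if the answer string occurs in an irrelevant context.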

MessyTable

MessyTable features a large number of scenes with messy tables captured from multiple camera views. Each scene in this dataset is highly complex, containing multiple object instances that could be identical, stacked, or occluded by other instances. The key challenge is to associate all instances given the RGB images from all views. This seemingly simple task surprisingly defeats many popular methods and heuristics. The dataset challenges existing methods in mining subtle appearance differences, reasoning based on context, and fusing appearance with geometric cues to establish an association.

1 paper | 0 benchmarks | Images

MSRA-B

The MSRA-B dataset is a dataset for salient object detection. It contains 5,000 images with a variety of image contents. Most of the images have a single salient object. There is a large variation among images including natural scenes, animals, indoor, outdoor, etc.

1 paper | 0 benchmarks | Images

VOT2015 (Visual Object Tracking Challenge 2015)

VOT2015 is a visual object tracking dataset. The dataset comprises 60 short sequences showing various objects in challenging backgrounds. The sequences were chosen from a large pool of sequences from different sources.

1 paper | 0 benchmarks | Videos

UCF50

UCF50 is an action recognition dataset with 50 action categories, consisting of realistic videos taken from YouTube. It is an extension of the YouTube Action dataset (UCF11), which has 11 action categories.

1 paper | 0 benchmarks | Videos

Placepedia

Placepedia contains 240K places with 35M images from all over the world. Each place is associated with its district, city/town/village, state/province, country, continent, and a large number of diverse photos. Both administrative areas and places have rich side information, e.g. description, population, category, function. In addition, two cleaned subsets (Places-Coarse and Places-Fine) are provided for experiments.

1 paper | 0 benchmarks | Images
