Datasets

19,997 machine learning datasets

OpoSum

OpoSum is a dataset for training and evaluating opinion summarization models. It contains Amazon reviews from six product domains: Laptop Bags, Bluetooth Headsets, Boots, Keyboards, Televisions, and Vacuums. The six training collections were created by downsampling the Amazon Product Dataset introduced in McAuley et al. (2015), and they contain reviews with their respective ratings.

7 papers | 0 benchmarks | Texts

ForecastQA

ForecastQA is a question-answering dataset consisting of 10,392 event forecasting questions, which have been collected and verified via crowdsourcing efforts. The forecasting problem for this dataset is formulated as a restricted-domain, multiple-choice, question-answering (QA) task that simulates the forecasting scenario.

7 papers | 0 benchmarks

MSSD (Music Streaming Sessions Dataset)

The Spotify Music Streaming Sessions Dataset (MSSD) consists of 160 million streaming sessions with associated user interactions, audio features and metadata describing the tracks streamed during the sessions, and snapshots of the playlists listened to during the sessions.

7 papers | 6 benchmarks | Audio

CASME II (Chinese Academy of Sciences Micro-Expression II)

The Chinese Academy of Sciences Micro-Expression dataset (CASME II) consists of 255 videos elicited from 26 participants. The videos were recorded with a Point Grey GRAS-03K2C camera at a frame rate of 200 fps. The average video length is 0.34 s, equivalent to 68 frames. Each video’s emotion label was annotated by two coders, with a reliability of 0.846.

7 papers | 12 benchmarks | Images

Atari Grand Challenge

The Atari Grand Challenge dataset is a large dataset of human Atari 2600 replays. It consists of replays for 5 different games:

* Space Invaders (445 episodes, 2M frames)
* Q*bert (659 episodes, 1.6M frames)
* Ms. Pacman (384 episodes, 1.7M frames)
* Video Pinball (211 episodes, 1.5M frames)
* Montezuma’s Revenge (668 episodes, 2.7M frames)

7 papers | 0 benchmarks | Images, Videos

BanFakeNews

An annotated dataset of ~50K news articles that can be used to build automated fake news detection systems for a low-resource language such as Bangla.

7 papers | 0 benchmarks

Book Cover Dataset

A new challenging dataset that can be used for many pattern recognition tasks.

7 papers | 1 benchmark

COG

COG is a configurable visual question answering dataset designed to parallel experiments in humans and animals. COG is much simpler than the general problem of video analysis, yet it addresses many of the problems relating to visual and logical reasoning and memory, problems that remain challenging for modern deep learning architectures.

7 papers | 0 benchmarks

DDD20 (DAVIS Driving Dataset 2020)

The dataset was captured with a DAVIS camera that concurrently streams both dynamic vision sensor (DVS) brightness change events and active pixel sensor (APS) intensity frames. DDD20 is the longest event camera end-to-end driving dataset to date, with 51 h of DAVIS event and frame camera data and human vehicle control data collected over 4,000 km of highway and urban driving under a variety of lighting conditions.

7 papers | 0 benchmarks

Definite Pronoun Resolution Dataset

Consists of sentence pairs (i.e., twin sentences).

7 papers | 0 benchmarks

DFW (Disguised Faces in the Wild)

Contains over 11,000 images of 1,000 identities with different types of disguise accessories. The dataset was collected from the Internet, resulting in unconstrained face images similar to real-world settings.

7 papers | 0 benchmarks | Images

EYTH (EgoYouTubeHands)

Includes egocentric videos containing hands in the wild.

7 papers | 0 benchmarks

FocusPath

FocusPath is a dataset compiled from diverse Whole Slide Image (WSI) scans at different focus (z-) levels. Images are naturally blurred by an out-of-focus lens and are provided with ground-truth (GT) scores of their focus levels. The dataset can be used for No-Reference Focus Quality assessment of Digital Pathology/Microscopy images.

7 papers | 0 benchmarks

Ford AV Dataset

A challenging multi-agent seasonal dataset collected by a fleet of Ford autonomous vehicles on different days and at different times during 2017-18.

7 papers | 0 benchmarks

FPL (First-Person Locomotion)

Supports a new task: predicting the future locations of people observed in first-person videos.

7 papers | 0 benchmarks | Images

Grocery Store

Grocery Store is a dataset of natural images of grocery items. All natural images were taken with a smartphone camera in different grocery stores. It contains 5,125 natural images from 81 different classes of fruits, vegetables, and carton items (e.g. juice, milk, yoghurt). The 81 fine-grained classes are grouped into 42 coarse-grained classes; for example, the fine-grained classes 'Royal Gala' and 'Granny Smith' belong to the same coarse-grained class 'Apple'. Additionally, each fine-grained class has an associated iconic image and a product description of the item.

7 papers | 0 benchmarks | Images

Gumar Corpus

A large-scale corpus of Gulf Arabic consisting of 110 million words from 1,200 forum novels.

7 papers | 0 benchmarks | Texts

HandNet

The HandNet dataset contains depth images of 10 participants' hands non-rigidly deforming in front of a RealSense RGB-D camera. The annotations are generated by a magnetic annotation technique. 6D pose is available for the center of the hand as well as the five fingertips (i.e. position and orientation of each).

7 papers | 0 benchmarks | Images, RGB-D

Hindi Visual Genome

Hindi Visual Genome is a multimodal dataset consisting of text and images, suitable for the English-Hindi multimodal machine translation task and for multimodal research.

7 papers | 0 benchmarks | Images, Texts

Hotels-50K

The Hotels-50K dataset consists of over 1 million images from 50,000 different hotels around the world. These images come from both travel websites and the TraffickCam mobile application, which allows everyday travelers to submit images of their hotel rooms to help combat trafficking. The TraffickCam images are more visually similar to images from trafficking investigations than the images from travel websites.

7 papers | 0 benchmarks | Images
