Datasets

19,997 machine learning datasets

CoDEx Small

CoDEx is a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. It comprises three knowledge graphs of varying size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are plausible but verified to be false.

5 papers · 4 benchmarks
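Knowledge graph completion data such as CoDEx is commonly handled as (head, relation, tail) triples. The sketch below is a minimal illustration under stated assumptions, not CoDEx's own loader, file format, or evaluation code. The entity and relation names are hypothetical placeholders, and both scoring functions are toy stand-ins for a trained model. It shows why verified-false hard negatives are useful for triple classification: a shallow scorer that ignores the tail entity scores well on positives but is fooled by plausible negatives.

```python
from typing import Callable, NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical placeholder triples; CoDEx's real entities come from Wikidata.
positives = [
    Triple("alice", "citizen_of", "france"),
    Triple("bob", "occupation", "painter"),
]

# Hard negatives: plausible-looking triples that are known to be false.
hard_negatives = [
    Triple("alice", "citizen_of", "belgium"),
    Triple("bob", "occupation", "sculptor"),
]


def triple_classification_accuracy(
    score: Callable[[Triple], float],
    pos: list[Triple],
    neg: list[Triple],
    threshold: float,
) -> float:
    """Fraction of triples labeled correctly, where score >= threshold means 'true'."""
    correct = sum(score(t) >= threshold for t in pos)
    correct += sum(score(t) < threshold for t in neg)
    return correct / (len(pos) + len(neg))


# Toy scorer (hypothetical): checks only whether the (head, relation) pair
# was seen among positives, ignoring the tail, so hard negatives fool it.
seen_head_relation = {(t.head, t.relation) for t in positives}


def shallow_score(t: Triple) -> float:
    return 1.0 if (t.head, t.relation) in seen_head_relation else 0.0


# Toy scorer (hypothetical): accepts only exact triples it has memorized.
known_triples = set(positives)


def exact_score(t: Triple) -> float:
    return 1.0 if t in known_triples else 0.0


print(triple_classification_accuracy(shallow_score, positives, hard_negatives, 0.5))  # 0.5
print(triple_classification_accuracy(exact_score, positives, hard_negatives, 0.5))    # 1.0
```

The shallow scorer accepts every triple, so it gets both positives right and both hard negatives wrong. Randomly corrupted negatives (for example, a person as the tail of `occupation`) would be much easier to reject, which is the motivation for plausible negatives.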

CraigslistBargains

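A negotiation dialogue dataset in which a buyer and a seller bargain over the price of real items listed on Craigslist, offering a richer setting than earlier negotiation datasets.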

5 papers · 0 benchmarks

Cube++

Cube++ is a new dataset for the color constancy problem that extends the Cube+ dataset. It includes 4,890 images of different scenes under various conditions. To compute the ground-truth illumination, a calibration object with known surface colors was placed in every scene.

5 papers · 0 benchmarks · Images

DemCare

Dem@Care provides datasets collected during lab and home experiments. Data collection took place at the Greek Alzheimer’s Association for Dementia and Related Disorders in Thessaloniki, Greece, and in participants’ homes. The datasets include video and audio recordings as well as data from physiological sensors, along with data from sleep, motion, and plug sensors.

5 papers · 0 benchmarks

Europeana Newspapers

Europeana Newspapers consists of four datasets of 100 pages each, covering Dutch, French, and German (including Austrian). Created as part of the Europeana Newspapers project, it is intended to support the further development and improvement of named entity recognition systems, with a focus on historical content.

5 papers · 0 benchmarks

Flickr Cropping Dataset

The Flickr Cropping Dataset consists of high-quality cropping and pairwise ranking annotations used to evaluate the performance of automatic image cropping approaches.

5 papers · 0 benchmarks · Images

FollowUp

A dataset for follow-up query analysis containing 1,000 query triples over 120 tables.

5 papers · 0 benchmarks

GolfDB

GolfDB is a high-quality video dataset created for general recognition applications in the sport of golf, and specifically for the task of golf swing sequencing.

5 papers · 0 benchmarks · Videos

HDR+ Burst Photography Dataset

The dataset consists of 3,640 bursts (28,461 images in total), organized into subfolders, plus the results of an image processing pipeline. Each burst contains the raw burst input (in DNG format) together with sidecar files holding metadata not present in the images.

5 papers · 0 benchmarks

Headlines dataset

The Headlines dataset for sarcasm detection is collected from two news websites. The Onion produces sarcastic versions of current events; the dataset includes all headlines from its News in Brief and News in Photos categories (which are sarcastic), together with real (non-sarcastic) news headlines from HuffPost. This dataset has several advantages over existing Twitter datasets.

5 papers · 0 benchmarks · Texts

HHOI

The UCLA Human-Human-Object Interaction (HHOI) dataset is a new RGB-D video dataset that includes three types of human-human interactions (shaking hands, high-five, pulling up) and two types of human-object-human interactions (throw and catch, handing over a cup). On average, there are 23.6 instances per interaction, performed by 8 actors in total and recorded from various views. Each interaction lasts 2 to 7 seconds at 10 to 15 fps.

5 papers · 0 benchmarks

HindEnCorp

HindEnCorp is a parallel corpus of Hindi and English, released alongside HindMonoCorp, a monolingual corpus of Hindi; both are described here in release version 0.5. Both corpora were collected from web sources and preprocessed primarily for training statistical machine translation systems. HindEnCorp consists of 274k parallel sentences (3.9 million Hindi and 3.8 million English tokens). HindMonoCorp amounts to 787 million tokens in 44 million sentences.

5 papers · 0 benchmarks

HJDataset

HJDataset is a large dataset of Historical Japanese Documents with Complex Layouts. It contains over 250,000 layout element annotations of seven types. In addition to bounding boxes and masks of the content regions, it also includes the hierarchical structures and reading orders for layout elements. The dataset is constructed using a combination of human and machine efforts.

5 papers · 0 benchmarks · Images, Texts

Datasets

CoDEx Small

CraigslistBargains

Cube++

DemCare

Europeana Newspapers

Flickr Cropping Dataset

FollowUp

GolfDB

HDR+ Burst Photography Dataset

Headlines dataset

HHOI

HindEnCorp

HJDataset

IITB Corridor

IIW (Intrinsic Images in the Wild)

IPN Hand

ISBDA

Jamendo Lyrics

Libri-Adapt

Liputan6
