Datasets

19,997 machine learning datasets

19,997 dataset results

CDTB (Color-and-Depth Tracking)

Source: https://www.vicos.si/Projects/CDTB 4.2 State-of-the-art Comparison A TH CTB (color-and-depth visual object tracking) dataset is recorded by several passive and active RGB-D setups and contains indoor as well as outdoor sequences acquired in direct sunlight. The sequences were recorded to contain significant object pose change, clutter, occlusion, and periods of long-term target absence to enable tracker evaluation under realistic conditions. Sequences are per-frame annotated with 13 visual attributes for detailed analysis. It contains around 100,000 samples. Image Source: https://www.vicos.si/Projects/CDTB

17 papers0 benchmarksImages, RGB-D

CASIA-HWDB

CASIA-HWDB is a dataset for handwritten Chinese character recognition. It contains 300 files (240 in HWDB1.1 training set and 60 in HWDB1.1 test set). Each file contains about 3000 isolated gray-scale Chinese character images written by one writer, as well as their corresponding labels.

17 papers0 benchmarksImages

D-HAZY

The D-HAZY dataset is generated from NYU depth indoor image collection. D-HAZY contains depth map for each indoor hazy image. It contains 1400+ real images and corresponding depth maps used to synthesize hazy scenes based on Koschmieder’s light propagation mode

17 papers0 benchmarksImages

DESED (Domestic environment sound event detection)

The DESED dataset is a dataset designed to recognize sound event classes in domestic environments. The dataset is designed to be used for sound event detection (SED, recognize events with their time boundaries) but it can also be used for sound event tagging (SET, indicate presence of an event in an audio file). The dataset is composed of 10 event classes to recognize in 10 second audio files. The classes are: Alarm/bell/ringing, Blender, Cat, Dog, Dishes, Electric shaver/toothbrush, Frying, Running water, Speech, Vacuum cleaner.

17 papers3 benchmarksAudio

ICVL

17 papers5 benchmarks

iSarcasm

iSarcasm is a dataset of tweets, each labelled as either sarcastic or non_sarcastic. Each sarcastic tweet is further labelled for one of the following types of ironic speech:

17 papers1 benchmarksTexts

Arabic Text Diacritization

Extracted from the Tashkeela Corpus, the dataset consists of 55K lines containing about 2.3M words.

17 papers0 benchmarksTexts

BRWAC

Composed by 2.7 billion tokens, and has been annotated with tagging and parsing information.

17 papers0 benchmarks

CALFW (Cross-Age LFW)

A renovation of Labeled Faces in the Wild (LFW), the de facto standard testbed for unconstraint face verification.

17 papers19 benchmarks

CLEVR-Ref+

CLEVR-Ref+ is a synthetic diagnostic dataset for referring expression comprehension. The precise locations and attributes of the objects are readily available, and the referring expressions are automatically associated with functional programs. The synthetic nature allows control over dataset bias (through sampling strategy), and the modular programs enable intermediate reasoning ground truth without human annotators.

17 papers2 benchmarksImages, Texts

CPLFW (Cross-Pose LFW)

A renovation of Labeled Faces in the Wild (LFW), the de facto standard testbed for unconstraint face verification.

17 papers19 benchmarks

DAiSEE

DAiSEE is a multi-label video classification dataset comprising of 9,068 video snippets captured from 112 users for recognizing the user affective states of boredom, confusion, engagement, and frustration "in the wild". The dataset has four levels of labels namely - very low, low, high, and very high for each of the affective states, which are crowd annotated and correlated with a gold standard annotation created using a team of expert psychologists.

17 papers2 benchmarksVideos

DialoGLUE

DialoGLUE is a natural language understanding benchmark for task-oriented dialogue designed to encourage dialogue research in representation-based transfer, domain adaptation, and sample-efficient task learning. It consisting of 7 task-oriented dialogue datasets covering 4 distinct natural language understanding tasks.

17 papers0 benchmarksTexts

FMD (Fluorescence Microscopy Denoising)

The Fluorescence Microscopy Denoising (FMD) dataset is dedicated to Poisson-Gaussian denoising. The dataset consists of 12,000 real fluorescence microscopy images obtained with commercial confocal, two-photon, and wide-field microscopes and representative biological samples such as cells, zebrafish, and mouse brain tissues. Image averaging is used to effectively obtain ground truth images and 60,000 noisy images with different noise levels.

17 papers3 benchmarksImages

FreebaseQA

FreebaseQA is a data set for open-domain QA over the Freebase knowledge graph. The question-answer pairs in this data set are collected from various sources, including the TriviaQA data set and other trivia websites (QuizBalls, QuizZone, KnowQuiz), and are matched against Freebase to generate relevant subject-predicate-object triples that were further verified by human annotators. As all questions in FreebaseQA are composed independently for human contestants in various trivia-like competitions, this data set shows richer linguistic variation and complexity than existing QA data sets, making it a good test-bed for emerging KB-QA systems.

17 papers0 benchmarksTexts

JESC (Japanese-English Subtitle Corpus)

Japanese-English Subtitle Corpus is a large Japanese-English parallel corpus covering the underrepresented domain of conversational dialogue. It consists of more than 3.2 million examples, making it the largest freely available dataset of its kind. The corpus was assembled by crawling and aligning subtitles found on the web.

17 papers0 benchmarksTexts

LABR (Large-Scale Arabic Book Reviews)

LABR is a large sentiment analysis dataset to-date for the Arabic language. It consists of over 63,000 book reviews, each rated on a scale of 1 to 5 stars.

17 papers0 benchmarksTexts

LCCC (Large-scale Cleaned Chinese Conversation corpus)

Contains a base version (6.8million dialogues) and a large version (12.0 million dialogues).

17 papers0 benchmarks

MEVA (Multiview Extended Video with Activities)

Large-scale dataset for human activity recognition. Existing security datasets either focus on activity counts by aggregating public video disseminated due to its content, which typically excludes same-scene background video, or they achieve persistence by observing public areas and thus cannot control for activity content. The dataset is over 9300 hours of untrimmed, continuous video, scripted to include diverse, simultaneous activities, along with spontaneous background activity.

17 papers0 benchmarks

MLRSNet

MLRSNet is a a multi-label high spatial resolution remote sensing dataset for semantic scene understanding. It provides different perspectives of the world captured from satellites. That is, it is composed of high spatial resolution optical satellite images. MLRSNet contains 109,161 remote sensing images that are annotated into 46 categories, and the number of sample images in a category varies from 1,500 to 3,000. The images have a fixed size of 256×256 pixels with various pixel resolutions (~10m to 0.1m). Moreover, each image in the dataset is tagged with several of 60 predefined class labels, and the number of labels associated with each image varies from 1 to 13. The dataset can be used for multi-label based image classification, multi-label based image retrieval, and image segmentation.

17 papers2 benchmarksImages

PreviousPage 113 of 1000Next