Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

CASIA-SURF

A dataset for face anti-spoofing that is large-scale in terms of both subjects and modalities: each sample includes three modalities (RGB, depth, and IR).

24 papers · 0 benchmarks

CCPD (Chinese City Parking Dataset)

The Chinese City Parking Dataset (CCPD) is a dataset for license plate detection and recognition. It contains over 250k unique car images, with license plate location annotations.

24 papers · 0 benchmarks · Images

KELM

KELM is a large-scale synthetic corpus that verbalizes the Wikidata knowledge graph (KG) as natural-language text.

24 papers · 0 benchmarks · Texts
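KELM performs KG verbalization at scale with a trained sequence-to-sequence model; a naive template-based sketch (illustrative only, not the actual KELM/TEKGEN pipeline) shows the basic idea of turning triples into text:

```python
# Illustrative sketch of KG verbalization: converting (subject, relation,
# object) triples into simple natural-language sentences. KELM does this
# with a learned model rather than templates; names here are hypothetical.

def verbalize(triples):
    """Turn (subject, relation, object) triples into naive template sentences."""
    return [f"{s} {r.replace('_', ' ')} {o}." for s, r, o in triples]

triples = [("Marie Curie", "award_received", "Nobel Prize in Physics")]
print(verbalize(triples))  # ['Marie Curie award received Nobel Prize in Physics.']
```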

KeypointNet

KeypointNet is a large-scale and diverse 3D keypoint dataset containing 83,231 keypoints and 8,329 3D models from 16 object categories, built on ShapeNet models by aggregating numerous human annotations.

24 papers · 0 benchmarks · 3D

Moral Stories

Moral Stories is a crowd-sourced dataset of structured narratives that describe normative and norm-divergent actions taken by individuals to accomplish certain intentions in concrete situations, and their respective consequences.

24 papers · 0 benchmarks · Texts

NAS-Bench-1Shot1

NAS-Bench-1Shot1 draws on the recent large-scale tabular benchmark NAS-Bench-101 for cheap anytime evaluations of one-shot NAS methods.

24 papers · 0 benchmarks

Natural Stories

The Natural Stories dataset consists of English texts edited to contain many low-frequency syntactic constructions while still sounding fluent to native speakers. The corpus is annotated with hand-corrected parse trees and includes self-paced reading time data.

24 papers · 0 benchmarks · Texts

Okutama-Action

A new video dataset for concurrent human action detection from an aerial view. It consists of 43 minute-long, fully annotated sequences with 12 action classes. Okutama-Action features many challenges missing in current datasets, including dynamic transitions between actions, significant changes in scale and aspect ratio, abrupt camera movement, and multi-labeled actors.

24 papers · 2 benchmarks

RADIATE (RAdar Dataset In Adverse weaThEr)

RADIATE (RAdar Dataset In Adverse weaThEr) is a new automotive dataset created by Heriot-Watt University that includes radar, LiDAR, stereo camera, and GPS/IMU data. The data were collected in different weather conditions (sunny, overcast, night, fog, rain, and snow) to help the research community develop new methods of vehicle perception. The radar images are annotated in 7 different scenarios: Sunny (Parked), Sunny/Overcast (Urban), Overcast (Motorway), Night (Motorway), Rain (Suburban), Fog (Suburban), and Snow (Suburban). The dataset contains 8 types of objects (car, van, truck, bus, motorbike, bicycle, pedestrian, and group of pedestrians).

24 papers · 4 benchmarks · Images

SIMMC (Situated and Interactive Multimodal Conversations)

Situated and Interactive Multimodal Conversations (SIMMC) is the task of taking multimodal actions grounded in a co-evolving multimodal input context in addition to the dialog history. The corpus comprises two SIMMC datasets totalling ~13K human-human dialogs (~169K utterances), collected with a multimodal Wizard-of-Oz (WoZ) setup in two shopping domains: (a) furniture (grounded in a shared virtual environment) and (b) fashion (grounded in an evolving set of images).

24 papers · 0 benchmarks · Texts

Spoken-SQuAD

In Spoken-SQuAD, the document is in spoken form, the input question is text, and the answer to each question is always a span in the document. The following procedure was used to generate spoken documents from the original SQuAD dataset. First, the Google text-to-speech system was used to generate spoken versions of the articles in SQuAD. Then CMU Sphinx was used to generate the corresponding ASR transcriptions. The SQuAD training set was used to generate the training set of Spoken-SQuAD, and the SQuAD development set was used to generate its testing set. If the answer to a question did not exist in the ASR transcription of the associated article, the question-answer pair was removed from the dataset, because such examples are too difficult for machine listening comprehension at this stage.

24 papers · 3 benchmarks · Speech
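The final filtering step described above (dropping QA pairs whose answer span is lost in the noisy ASR transcript) can be sketched as follows; function and field names are illustrative, not taken from the official Spoken-SQuAD release:

```python
# Hedged sketch of the Spoken-SQuAD filtering step: keep only QA pairs
# whose answer text still appears in the ASR transcript of the article.

def filter_qa_pairs(qa_pairs, asr_transcripts):
    """Keep only pairs whose answer survives ASR transcription.

    qa_pairs: list of dicts with 'article_id', 'question', 'answer'
    asr_transcripts: dict mapping article_id -> ASR transcript string
    """
    kept = []
    for pair in qa_pairs:
        transcript = asr_transcripts.get(pair["article_id"], "").lower()
        if pair["answer"].lower() in transcript:
            kept.append(pair)
    return kept

pairs = [
    {"article_id": "a1", "question": "Who?", "answer": "Tesla"},
    {"article_id": "a1", "question": "Where?", "answer": "Smiljan"},
]
transcripts = {"a1": "nikola tesla was born in a village"}  # ASR dropped "Smiljan"
print([p["answer"] for p in filter_qa_pairs(pairs, transcripts)])  # ['Tesla']
```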

SQUID (Stereo Quantitative Underwater Image Dataset)

A dataset of images taken in different locations with varying water properties, showing color charts in the scenes. Moreover, to obtain ground truth, the 3D structure of the scene was calculated based on stereo imaging. This dataset enables a quantitative evaluation of restoration algorithms on natural images.

24 papers · 0 benchmarks

Toronto-3D

Toronto-3D is a large-scale urban outdoor point cloud dataset for semantic segmentation, acquired by a mobile laser scanning (MLS) system in Toronto, Canada. It covers approximately 1 km of road and consists of about 78.3 million points. Each point has 10 attributes and is classified into one of 8 labelled object classes.

24 papers · 6 benchmarks · Point cloud

FineAction

FineAction contains 103K temporal instances of 106 action categories, annotated in 17K untrimmed videos. FineAction introduces new opportunities and challenges for temporal action localization, thanks to its distinct characteristics of fine action classes with rich diversity, dense annotations of multiple instances, and co-occurring actions of different classes.

24 papers · 21 benchmarks · Videos

GeoS

GeoS is a dataset for automatic math problem solving, consisting of SAT plane geometry questions where every question has a textual description in English accompanied by a diagram and multiple choices. Questions and answers are compiled from previous official SAT exams and practice exams offered by the College Board. Ground-truth logical forms are annotated for all questions in the dataset.

24 papers · 2 benchmarks · Texts

Google Landmarks

The Google Landmarks dataset contains 1,060,709 images from 12,894 landmarks, and 111,036 additional query images. The images in the dataset are captured at various locations in the world, and each image is associated with a GPS coordinate. This dataset is used to train and evaluate large-scale image retrieval models.

24 papers · 0 benchmarks · Images

FlickrStyle10K

FlickrStyle10K was collected and built on the Flickr30k image caption dataset. The original FlickrStyle10K dataset has 10,000 pairs of images and stylized captions, including humorous and romantic styles; however, only the 7,000 pairs from the official training set are now publicly accessible. The dataset can be downloaded via https://zhegan27.github.io/Papers/FlickrStyle_v0.9.zip

24 papers · 3 benchmarks · Images, Texts

ASTE-Data-V2

A benchmark dataset for Aspect Sentiment Triplet Extraction (ASTE); an updated version of ASTE-Data-V1.

24 papers · 1 benchmark · Texts

AFLW-19 (the 19-landmark variant of AFLW)

The original AFLW provides at most 21 points per face but excludes coordinates for invisible landmarks, which causes difficulties for training most existing baseline approaches. To enable fair comparisons, the authors manually annotated the coordinates of these invisible landmarks so that those baselines could be trained. The new annotation omits the two ear points, because it is very difficult to decide the location of invisible ears, leaving 19 landmarks per face in AFLW-19.

24 papers · 20 benchmarks · Images

DADA-seg

DADA-seg is a pixel-wise annotated accident dataset, which contains a variety of critical scenarios from traffic accidents. It is used for semantic segmentation.

24 papers · 2 benchmarks · Images

Page 92 of 1000