Datasets

19,997 machine learning datasets

iVQA (Instructional Video Question Answering)

An open-ended VideoQA benchmark that aims to: i) provide a well-defined evaluation by including five correct answer annotations per question and ii) avoid questions which can be answered without the video.

22 papers | 2 benchmarks | Texts, Videos

KdConv (Knowledge-driven Conversation)

KdConv is a Chinese multi-domain Knowledge-driven Conversation dataset that grounds the topics of multi-turn conversations in knowledge graphs. KdConv contains 4.5K conversations from three domains (film, music, and travel) and 86K utterances, with an average of 19.0 turns per conversation. These conversations contain in-depth discussions of related topics and natural transitions between multiple topics, and the corpus can also be used to explore transfer learning and domain adaptation.

22 papers | 0 benchmarks | Texts

BookTest

BookTest is a new dataset similar to the popular Children’s Book Test (CBT), but more than 60 times larger.

22 papers | 0 benchmarks | Texts

AESLC

A dataset for studying email subject line generation: automatically generating an email subject line from the email body.

22 papers | 6 benchmarks

AVA-ActiveSpeaker

Contains temporally labeled face tracks in video, where each face instance is labeled as speaking or not, and whether the speech is audible. The dataset contains about 3.65 million human-labeled frames, or about 38.5 hours of face tracks, and the corresponding audio.

22 papers | 1 benchmark

BAR (Biased Action Recognition)

The Biased Action Recognition (BAR) dataset is a real-world image dataset categorized into six action classes, each biased toward a distinct place. The authors settle on these six action classes by inspecting imSitu, which provides still action images from Google Image Search with action and place labels. In detail, the authors choose action classes where the images for each candidate action share common place characteristics. At the same time, the place characteristics of the candidate action classes should be distinct from one another, so that the action can be classified from place attributes alone. The selected pairs are six typical action-place pairs: (Climbing, RockWall), (Diving, Underwater), (Fishing, WaterSurface), (Racing, APavedTrack), (Throwing, PlayingField), and (Vaulting, Sky).

22 papers | 2 benchmarks | Images

CARRADA

CARRADA is a dataset of synchronized camera and radar recordings with range-angle-Doppler annotations.

22 papers | 0 benchmarks

COWC (Cars Overhead With Context)

The Cars Overhead With Context (COWC) data set is a large set of annotated cars from overhead. It is useful for training a device such as a deep neural network to learn to detect and/or count cars.

22 papers | 0 benchmarks | Images

CSS10

A collection of single-speaker speech datasets for ten languages. It is composed of short audio clips from LibriVox audiobooks and their aligned texts.

22 papers | 0 benchmarks

EV-IMO

EV-IMO includes accurate pixel-wise motion masks, egomotion, and ground-truth depth.

22 papers | 0 benchmarks

EXAMS

EXAMS is a new benchmark dataset for cross-lingual and multilingual question answering for high school examinations. It contains more than 24,000 high-quality high school exam questions in 16 languages, covering 8 language families and 24 school subjects from Natural Sciences and Social Sciences, among others. EXAMS offers a fine-grained evaluation framework across multiple languages and subjects, which allows precise analysis and comparison of various models.

22 papers | 0 benchmarks

Fusion 360 Gallery

The Fusion 360 Gallery Dataset contains rich 2D and 3D geometry data derived from parametric CAD models. The dataset is produced from designs submitted by users of the CAD package Autodesk Fusion 360 to the Autodesk Online Gallery. The dataset provides valuable data for learning how people design, including sequential CAD design data, designs segmented by modelling operation, and design hierarchy and connectivity data.

22 papers | 8 benchmarks | 3D

Gibson Environment

Gibson is an open-source perceptual and physics simulator for exploring active and real-world perception. The Gibson Environment is used for real-world perception learning.

22 papers | 0 benchmarks | Environment

HoME (Household Multimodal Environment)

HoME (Household Multimodal Environment) is a multimodal environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more.

22 papers | 0 benchmarks

IMDb-Face

IMDb-Face is a large-scale, noise-controlled dataset for face recognition research. The dataset contains about 1.7 million faces of 59k identities, manually cleaned from 2.0 million raw images. All images are obtained from the IMDb website.

22 papers | 0 benchmarks | Images

IP102

IP102 contains more than 75,000 images belonging to 102 categories, which exhibit a natural long-tailed distribution.

22 papers | 0 benchmarks

JHU-CROWD

JHU-CROWD is a crowd counting dataset that contains 4,250 images with 1.11 million annotations. The dataset is collected under a variety of scenarios and environmental conditions. Specifically, it includes several images with weather-based degradations and illumination variations, in addition to many distractor images, making it a very challenging dataset. The dataset also provides rich annotations at both the image level and the head level.

22 papers | 0 benchmarks | Images

MOROCO (MOldavian and ROmanian Dialectal COrpus)

The MOldavian and ROmanian Dialectal COrpus (MOROCO) is a corpus that contains 33,564 samples of text (with over 10 million tokens) collected from the news domain. The samples belong to one of the following six topics: culture, finance, politics, science, sports and tech. The data set is divided into 21,719 samples for training, 5,921 samples for validation and another 5,924 samples for testing.

22 papers | 0 benchmarks | Texts

MosMedData

MosMedData contains anonymised human lung computed tomography (CT) scans with COVID-19 related findings, as well as scans without such findings. A small subset of studies has been annotated with binary pixel masks depicting regions of interest (ground-glass opacifications and consolidations). The CT scans were obtained between 1 March 2020 and 25 April 2020, and were provided by municipal hospitals in Moscow, Russia.

22 papers | 1 benchmark | Medical

PIT (Paraphrase and Semantic Similarity in Twitter)

Paraphrase and Semantic Similarity in Twitter (PIT) presents a constructed Twitter Paraphrase Corpus that contains 18,762 sentence pairs.

22 papers | 2 benchmarks | Texts
