19,997 machine learning datasets
The CelebA-Dialog dataset has the following properties: 1) facial images are annotated with rich fine-grained labels, which classify each attribute into multiple degrees according to its semantic meaning; 2) each image is accompanied by captions describing the attributes and a sample user request.
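To make the notion of fine-grained degree labels concrete, here is a hypothetical sketch of what such an annotation record could look like; the field names, attribute names, and degree scale are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical CelebA-Dialog-style record; field names and the 0-5 degree
# scale are illustrative assumptions, not the dataset's real schema.
sample = {
    "image": "000001.jpg",
    # Fine-grained labels: each attribute is graded into degrees
    # rather than a binary present/absent flag.
    "attributes": {"smiling": 3, "bangs": 0, "young": 2},
    "caption": "A young woman with a moderate smile and no bangs.",
    "user_request": "Make her smile a bit more.",
}

def degree(record, attr):
    """Return the annotated degree of a fine-grained attribute."""
    return record["attributes"][attr]

print(degree(sample, "smiling"))  # 3
```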
Satellite imagery analytics has numerous human development and disaster response applications, particularly when time series methods are involved. For example, quantifying population statistics is fundamental to 67 of the 232 United Nations Sustainable Development Goals indicators, yet the World Bank estimates that more than 100 countries currently lack effective civil registration systems. The SpaceNet 7 Multi-Temporal Urban Development Challenge aims to help address this deficit and to develop novel computer vision methods for non-video time series data. In this challenge, participants identify and track buildings in satellite imagery time series collected over rapidly urbanizing areas. The competition centers around a new open-source dataset of Planet satellite imagery mosaics, which includes 24 images (one per month) covering ~100 unique geographies. The dataset comprises over 40,000 square kilometers of imagery and exhaustive polygon labels of building footprints in the imagery.
ConditionalQA is a Question Answering (QA) dataset that contains complex questions with conditional answers, i.e., answers that are only applicable when certain conditions hold.
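The idea of answers paired with applicability conditions can be sketched as follows; this is an illustrative data layout, not ConditionalQA's actual field names.

```python
# Illustrative ConditionalQA-style example; field names are assumptions,
# not the dataset's real schema.
example = {
    "question": "Can I claim this benefit?",
    "answers": [
        # Each candidate answer lists the conditions under which it applies.
        {"answer": "yes", "conditions": ["you live in the UK",
                                         "you are over 18"]},
        {"answer": "no", "conditions": ["you already receive it"]},
    ],
}

def applicable_answers(ex, satisfied):
    """Keep answers whose conditions are all satisfied by the user."""
    return [a["answer"] for a in ex["answers"]
            if all(c in satisfied for c in a["conditions"])]

print(applicable_answers(example, {"you live in the UK", "you are over 18"}))
# ['yes']
```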
P3M-10k contains 10,421 high-resolution, face-blurred real-world portrait images, along with their manually labeled alpha mattes. The dataset is intended to support research on portrait image matting and related topics.
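For context, image matting estimates the per-pixel opacity (alpha matte) in the standard compositing equation I = aF + (1 - a)B. A minimal sketch of using a matte to composite a foreground onto a new background, with placeholder arrays:

```python
import numpy as np

# Standard compositing equation: I = alpha * F + (1 - alpha) * B.
# Arrays below are tiny placeholders standing in for real images.
fg = np.ones((2, 2, 3)) * 0.8              # foreground colors
bg = np.zeros((2, 2, 3))                   # background colors
alpha = np.array([[1.0, 0.5],
                  [0.0, 1.0]])[..., None]  # per-pixel opacity (matte)

composite = alpha * fg + (1.0 - alpha) * bg
print(composite[0, 1])  # half-opaque pixel: 0.4 in every channel
```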
This dataset is part of the KELM corpus.
The Broad Twitter Corpus (BTC) is not only significantly larger than previous Twitter corpora, but is also sampled across different regions, time periods, and types of Twitter users. The gold-standard named entity annotations were produced by a combination of NLP experts and crowd workers, harnessing crowd recall while maintaining high quality. The authors also measure the entity drift observed in the dataset (i.e., how entity representations vary over time) and compare it to newswire.
An update to 3DIdent that introduces six additional object classes (Hare, Dragon, Cow, Armadillo, Horse, and Head) and imposes a causal graph over the latent variables. For further details, see Appendix B of the associated paper (https://arxiv.org/abs/2106.04619).
A dataset of color images corrupted by natural noise due to low-light conditions, together with spatially and intensity-aligned low-noise images of the same scenes.
The largest real-world night-time semantic segmentation dataset with pixel-level labels.
CICERO contains 53,000 inferences across five commonsense dimensions (cause, subsequent event, prerequisite, motivation, and emotional reaction) collected from 5,600 dialogues. It poses two challenging tasks for state-of-the-art NLP models: inference generation and multiple-choice answer selection.
The MagicData-RAMC corpus contains 180 hours of conversational speech recorded from native speakers of Mandarin Chinese over mobile phones at a 16 kHz sampling rate. The dialogs are classified into 15 diverse domains, ranging from science and technology to everyday life, and tagged with topic labels. Accurate transcriptions and precise speaker voice activity timestamps are manually labeled for each sample, and detailed speaker information is also provided.
ONCE-3DLanes is a real-world autonomous driving dataset with lane layout annotations in 3D space. A dedicated annotation pipeline automatically generates high-quality 3D lane locations from 2D lane annotations by exploiting the explicit relationship between point clouds and image pixels, covering 211,000 road scenes.
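The point-cloud-to-pixel relationship such a pipeline relies on is the pinhole projection of a 3D camera-frame point onto the image plane. A minimal sketch, assuming an illustrative intrinsic matrix (not ONCE's actual calibration):

```python
import numpy as np

# Illustrative camera intrinsics (focal length 1000 px, principal point
# at (960, 540)); NOT the ONCE dataset's real calibration.
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])

def project(point_cam):
    """Project a 3D point in camera coordinates to a pixel (u, v)."""
    u, v, w = K @ np.asarray(point_cam, dtype=float)
    return u / w, v / w

# A lane point 20 m ahead, 2 m right, 1.5 m below the camera center:
print(project((2.0, 1.5, 20.0)))  # (1060.0, 615.0)
```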
Bongard-HOI tests to what extent a few-shot visual learner can quickly induce the true human-object interaction (HOI) concept from a handful of images and reason with it. The learner is also expected to transfer the learned few-shot skills compositionally to novel HOI concepts.
MUSES is a large-scale dataset for temporal event (action) localization. It focuses on the temporal localization of multi-shot events, i.e., events captured across multiple camera shots, which often appear in edited videos such as TV shows and movies.
This multi-view pan-tilt-zoom-camera (PTZ) dataset features competitive alpine skiers performing giant slalom runs. It provides labels for the skiers' 3D poses in each frame, their projected 2D poses in all 20k images, and accurate per-frame calibration of the PTZ cameras. The dataset was collected by Spörri and colleagues as part of his Habilitation at the Department of Sport Science and Kinesiology of the University of Salzburg [Spörri16], and was previously used as a reference in several methodological studies [Gilgien13, Gilgien14, Gilgien15, Fasel16, Fasel18, Rhodin18]. The dataset is available upon request to interested researchers for further methodology-oriented research purposes.
https://sites.google.com/view/recon-robot/dataset
This dataset contains 33,400 annotated comments for hate speech detection on social networking sites. Labels: CLEAN (non-hate), OFFENSIVE, and HATE.
Co-speech gestures are everywhere. People make gestures when they chat with others, give a public speech, talk on the phone, and even think aloud. Despite this ubiquity, few datasets are available, mainly because it is expensive to recruit actors and track precise body motions. The datasets that do exist (e.g., MSP AVATAR [17] and the Personality Dyads Corpus [18]) are limited to less than 3 hours and lack diversity in speech content and speakers. Their gestures can also be unnatural, owing to cumbersome body-tracking suits and acting in a lab environment.
Lila is a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions: (i) mathematical abilities, e.g., arithmetic, calculus; (ii) language format, e.g., question answering, fill-in-the-blank; (iii) language diversity, e.g., no language, simple language; (iv) external knowledge, e.g., commonsense, physics. The benchmark is constructed by extending 20 existing datasets with task instructions and solutions in the form of Python programs, thereby obtaining explainable solutions in addition to the correct answers.
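To illustrate what a program-form solution looks like, here is a made-up example in that spirit; the question and program are illustrative, not an actual benchmark item.

```python
# Illustrative Lila-style item: the answer is derived by an executable
# Python program rather than stated directly, making it explainable.
question = ("A shelf holds 4 boxes with 12 apples each. "
            "9 apples are eaten. How many apples remain?")

def solution():
    boxes = 4
    apples_per_box = 12
    eaten = 9
    return boxes * apples_per_box - eaten

print(solution())  # 39
```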
ECTSum is a dataset of earnings call transcripts (ECTs) from public companies as documents, paired with short, expert-written, telegram-style bullet-point summaries derived from corresponding Reuters articles. ECTs are long, unstructured documents with no prescribed length limit or format.