19,997 machine learning datasets
The CelebA-Dialog dataset has the following properties: 1) facial images are annotated with rich fine-grained labels, which classify each attribute into multiple degrees according to its semantic meaning; 2) each image is accompanied by captions describing the attributes and a sample user request.
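To make the notion of fine-grained degree labels concrete, here is a hypothetical sketch of what such an annotation record could look like; the field names, attribute names, and degree scale are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical CelebA-Dialog-style record; field names and the 0-5 degree
# scale are illustrative assumptions, not the dataset's real schema.
sample = {
    "image": "000001.jpg",
    # Fine-grained labels: each attribute is graded into degrees
    # rather than a binary present/absent flag.
    "attributes": {"smiling": 3, "bangs": 0, "young": 2},
    "caption": "A young woman with a moderate smile and no bangs.",
    "user_request": "Make her smile a bit more.",
}

def degree(record, attr):
    """Return the annotated degree of a fine-grained attribute."""
    return record["attributes"][attr]

print(degree(sample, "smiling"))  # 3
```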
Satellite imagery analytics has numerous human development and disaster response applications, particularly when time series methods are involved. For example, quantifying population statistics is fundamental to 67 of the 232 United Nations Sustainable Development Goals indicators, yet the World Bank estimates that more than 100 countries currently lack effective civil registration systems. The SpaceNet 7 Multi-Temporal Urban Development Challenge aims to help address this deficit and to develop novel computer vision methods for non-video time series data. In this challenge, participants identify and track buildings in satellite imagery time series collected over rapidly urbanizing areas. The competition centers around a new open-source dataset of Planet satellite imagery mosaics, which includes 24 images (one per month) covering ~100 unique geographies. The dataset comprises over 40,000 square kilometers of imagery and exhaustive polygon labels of building footprints in the imagery.
ConditionalQA is a Question Answering (QA) dataset that contains complex questions with conditional answers, i.e., answers that are only applicable when certain conditions hold.
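The idea of answers paired with applicability conditions can be sketched as follows; this is an illustrative data layout, not ConditionalQA's actual field names.

```python
# Illustrative ConditionalQA-style example; field names are assumptions,
# not the dataset's real schema.
example = {
    "question": "Can I claim this benefit?",
    "answers": [
        # Each candidate answer lists the conditions under which it applies.
        {"answer": "yes", "conditions": ["you live in the UK",
                                         "you are over 18"]},
        {"answer": "no", "conditions": ["you already receive it"]},
    ],
}

def applicable_answers(ex, satisfied):
    """Keep answers whose conditions are all satisfied by the user."""
    return [a["answer"] for a in ex["answers"]
            if all(c in satisfied for c in a["conditions"])]

print(applicable_answers(example, {"you live in the UK", "you are over 18"}))
# ['yes']
```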
P3M-10k contains 10,421 high-resolution, face-blurred real-world portrait images, along with their manually labeled alpha mattes. The dataset is intended to support research on portrait image matting and related topics.
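For context, image matting estimates the per-pixel opacity (alpha matte) in the standard compositing equation I = aF + (1 - a)B. A minimal sketch of using a matte to composite a foreground onto a new background, with placeholder arrays:

```python
import numpy as np

# Standard compositing equation: I = alpha * F + (1 - alpha) * B.
# Arrays below are tiny placeholders standing in for real images.
fg = np.ones((2, 2, 3)) * 0.8              # foreground colors
bg = np.zeros((2, 2, 3))                   # background colors
alpha = np.array([[1.0, 0.5],
                  [0.0, 1.0]])[..., None]  # per-pixel opacity (matte)

composite = alpha * fg + (1.0 - alpha) * bg
print(composite[0, 1])  # half-opaque pixel: 0.4 in every channel
```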
This dataset is part of the KELM corpus.
The Broad Twitter Corpus (BTC) is not only significantly larger than previous Twitter corpora, but is also sampled across different regions, time periods, and types of Twitter users. The gold-standard named entity annotations were produced by a combination of NLP experts and crowd workers, harnessing crowd recall while maintaining high quality. The authors also measure the entity drift observed in the dataset (i.e., how entity representations vary over time) and compare it to newswire.
An update to 3DIdent that introduces six additional object classes (Hare, Dragon, Cow, Armadillo, Horse, and Head) and imposes a causal graph over the latent variables. For further details, see Appendix B of the associated paper (https://arxiv.org/abs/2106.04619).
A dataset of color images corrupted by natural noise due to low-light conditions, together with spatially and intensity-aligned low-noise images of the same scenes.
The largest real-world night-time semantic segmentation dataset with pixel-level labels.
CICERO contains 53,000 inferences across five commonsense dimensions (cause, subsequent event, prerequisite, motivation, and emotional reaction) collected from 5,600 dialogues. It poses two challenging tasks for state-of-the-art NLP models: inference generation and multiple-choice answer selection.
The MagicData-RAMC corpus contains 180 hours of conversational speech recorded from native speakers of Mandarin Chinese over mobile phones at a 16 kHz sampling rate. The dialogs are classified into 15 diverse domains, ranging from science and technology to everyday life, and tagged with topic labels. Accurate transcriptions and precise speaker voice activity timestamps are manually labeled for each sample, and detailed speaker information is also provided.
ONCE-3DLanes is a real-world autonomous driving dataset with lane layout annotations in 3D space. A dedicated annotation pipeline automatically generates high-quality 3D lane locations from 2D lane annotations by exploiting the explicit relationship between point clouds and image pixels, covering 211,000 road scenes.
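The point-cloud-to-pixel relationship such a pipeline relies on is the pinhole projection of a 3D camera-frame point onto the image plane. A minimal sketch, assuming an illustrative intrinsic matrix (not ONCE's actual calibration):

```python
import numpy as np

# Illustrative camera intrinsics (focal length 1000 px, principal point
# at (960, 540)); NOT the ONCE dataset's real calibration.
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])

def project(point_cam):
    """Project a 3D point in camera coordinates to a pixel (u, v)."""
    u, v, w = K @ np.asarray(point_cam, dtype=float)
    return u / w, v / w

# A lane point 20 m ahead, 2 m right, 1.5 m below the camera center:
print(project((2.0, 1.5, 20.0)))  # (1060.0, 615.0)
```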
Bongard-HOI tests to what extent a few-shot visual learner can quickly induce the true human-object interaction (HOI) concept from a handful of images and reason with it. The learner is also expected to transfer the learned few-shot skills compositionally to novel HOI concepts.
MUSES is a large-scale dataset for temporal event (action) localization. It focuses on the temporal localization of multi-shot events, i.e., events captured across multiple camera shots, which often appear in edited videos such as TV shows and movies.
This multi-view pan-tilt-zoom-camera (PTZ) dataset features competitive alpine skiers performing giant slalom runs. It provides labels for the skiers' 3D poses in each frame, their projected 2D poses in all 20k images, and accurate per-frame calibration of the PTZ cameras. The dataset was collected by Spörri and colleagues as part of his Habilitation at the Department of Sport Science and Kinesiology of the University of Salzburg [Spörri16], and was previously used as a reference in several methodological studies [Gilgien13, Gilgien14, Gilgien15, Fasel16, Fasel18, Rhodin18]. The dataset is available upon request to interested researchers for further methodology-oriented research purposes.
https://sites.google.com/view/recon-robot/dataset
This dataset contains 33,400 annotated comments for hate speech detection on social networking sites. Labels: CLEAN (non-hate), OFFENSIVE, and HATE.
Co-speech gestures are everywhere. People make gestures when they chat with others, give a public speech, talk on the phone, and even think aloud. Despite this ubiquity, few datasets are available, mainly because it is expensive to recruit actors and track precise body motions. The datasets that do exist (e.g., MSP AVATAR [17] and the Personality Dyads Corpus [18]) are limited to less than 3 hours and lack diversity in speech content and speakers. Their gestures can also be unnatural, owing to cumbersome body-tracking suits and acting in a lab environment.
Lila is a unified mathematical reasoning benchmark consisting of 23 diverse tasks along four dimensions: (i) mathematical abilities, e.g., arithmetic, calculus; (ii) language format, e.g., question answering, fill-in-the-blank; (iii) language diversity, e.g., no language, simple language; (iv) external knowledge, e.g., commonsense, physics. The benchmark is constructed by extending 20 existing datasets with task instructions and solutions in the form of Python programs, thereby obtaining explainable solutions in addition to the correct answers.
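To illustrate what a program-form solution looks like, here is a made-up example in that spirit; the question and program are illustrative, not an actual benchmark item.

```python
# Illustrative Lila-style item: the answer is derived by an executable
# Python program rather than stated directly, making it explainable.
question = ("A shelf holds 4 boxes with 12 apples each. "
            "9 apples are eaten. How many apples remain?")

def solution():
    boxes = 4
    apples_per_box = 12
    eaten = 9
    return boxes * apples_per_box - eaten

print(solution())  # 39
```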
ECTSum is a dataset of earnings call transcripts (ECTs) from public companies as documents, paired with short, expert-written, telegram-style bullet-point summaries derived from corresponding Reuters articles. ECTs are long, unstructured documents with no prescribed length limit or format.