19,997 machine learning datasets
19,997 dataset results
PixelHelp includes 187 multi-step instructions of 4 task categories deined in https://support.google.com/pixelphone and annotated by human. This dataset includes 88 general tasks, such as configuring accounts, 38 Gmail tasks, 31 Chrome tasks, and 30 Photos related tasks. This dataset is an updated opensource version of the original PixelHelp dataset, which was used for testing the end-to-end grounding quality of the model in paper "Mapping Natural Language Instructions to Mobile UI Action Sequences". The similar accuracy is acquired on this version of the dataset.
The GenericsKB contains 3.4M+ generic sentences about the world, i.e., sentences expressing general truths such as "Dogs bark," and "Trees remove carbon dioxide from the atmosphere." Generics are potentially useful as a knowledge source for AI systems requiring general world knowledge. The GenericsKB is the first large-scale resource containing naturally occurring generic sentences (as opposed to extracted or crowdsourced triples), and is rich in high-quality, general, semantically complete statements. Generics were primarily extracted from three large text sources, namely the Waterloo Corpus, selected parts of Simple Wikipedia, and the ARC Corpus. A filtered, high-quality subset is also available in GenericsKB-Best, containing 1,020,868 sentences.
WikiReading is a large-scale natural language understanding task and publicly-available dataset with 18 million instances. The task is to predict textual values from the structured knowledge base Wikidata by reading the text of the corresponding Wikipedia articles. The task contains a rich variety of challenging classification and extraction sub-tasks, making it well-suited for end-to-end models such as deep neural networks (DNNs).
The CRVD dataset consists of 55 groups of noisy-clean videos with ISO values ranging from 1600 to 25600.
The Drive&Act dataset is a state of the art multi modal benchmark for driver behavior recognition. The dataset includes 3D skeletons in addition to frame-wise hierarchical labels of 9.6 Million frames captured by 6 different views and 3 modalities (RGB, IR and depth).
Contains 1024 pairs of high-quality images and covers diverse scenarios.
Contains more than 13K questions over paragraphs from English Wikipedia that provide only partial information to answer them, with the missing information occurring in one or more linked documents. The questions were written by crowd workers who did not have access to any of the linked documents, leading to questions that have little lexical overlap with the contexts where the answers appear.
MultiviewX is a synthetic Multiview pedestrian detection dataset. It is build using pedestrian models from PersonX, in Unity. The MultiviewX dataset covers a square of 16 meters by 25 meters. The ground plane is quantized into a 640x1000 grid. There are 6 cameras with overlapping field-of-view in the MultiviewX dataset, each of which outputs a 1080x1920 resolution image. On average, 4.41 cameras are covering the same location.
The SentiCap dataset contains several thousand images with captions with positive and negative sentiments. These sentimental captions are constructed by the authors by re-writing factual descriptions. In total there are 2000+ sentimental captions.
Cityscapes-VPS is a video extension of the Cityscapes validation split. It provides 2500-frame panoptic labels that temporally extend the 500 Cityscapes image-panoptic labels. There are total 3000-frame panoptic labels which correspond to 5, 10, 15, 20, 25, and 30th frames of each 500 videos, where all instance ids are associated over time. It not only supports video panoptic segmentation (VPS) task, but also provides super-set annotations for video semantic segmentation (VSS) and video instance segmentation (VIS) tasks.
HomebrewedDB is a dataset for 6D pose estimation mainly targeting training from 3D models (both textured and textureless), scalability, occlusions, and changes in light conditions and object appearance. The dataset features 33 objects (17 toy, 8 household and 8 industry-relevant objects) over 13 scenes of various difficulty. It also consists of a set of benchmarks to test various desired detector properties, particularly focusing on scalability with respect to the number of objects and resistance to changing light conditions, occlusions and clutter.
We have created three new Reading Comprehension datasets constructed using an adversarial model-in-the-loop.
PeMSD7 is traffic data in District 7 of California consisting of the traffic speed of 228 sensors while the period is from May to June in 2012 (only weekdays) with a time interval of 5 minutes. This dataset is popular for benchmark the traffic forecasting models.
Dataset is constructed from single intent dataset SNIPS.
The dataset was collected using the Intel RealSense D435i camera, which was configured to produce synchronized accelerometer and gyroscope measurements at 400 Hz, along with synchronized VGA-size (640 x 480) RGB and depth streams at 30 Hz. The depth frames are acquired using active stereo and is aligned to the RGB frame using the sensor factory calibration. All the measurements are timestamped.
VAW is a large scale visual attributes dataset with explicitly labelled positive and negative attributes.
UrbanScene3D is a large scale urban scene dataset associated with a handy simulator based on Unreal Engine 4 and AirSim, which consists of both man-made and real-world reconstruction scenes in different scales, referred to as UrbanScene3D. The manually made scene models have compact structures, which are carefully constructed/designed by professional modelers according to the images and maps of target areas. In contrast, UrbanScene3D also offers dense, detailed scene models reconstructed by aerial images through multi-view stereo (MVS) techniques. These scenes have realistic textures and meticulous structures. The release also includes the originally captured aerial images that have been used to reconstruct the 3D scene models, as well as a set of 4K video sequences that would facilitate designing algorithms, such SLAM and MVS.
Screen2Words is a large-scale screen summarization dataset annotated by human workers. The dataset contains more than 112k language summarization across 22k unique UI screens. This dataset can be used for Mobile User Interface Summarization, which is a task where a model generates succinct language descriptions of mobile screens for conveying important contents and functionalities of the screen.
Harm-C is a dataset for detecting harmful memes related to Covid-19.
MultiDoc2Dial is a new task and dataset on modeling goal-oriented dialogues grounded in multiple documents. Most previous works treat document-grounded dialogue modeling as a machine reading comprehension task based on a single given document or passage. We aim to address more realistic scenarios where a goal-oriented information-seeking conversation involves multiple topics, and hence is grounded on different documents.