19,997 machine learning datasets
Inception Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23. Contains 580 CIFAR-10 networks with an accuracy range of [89.08%, 94.03%].
Two-Path Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23. Contains 6.9k CIFAR-10 networks with an accuracy range of [85.53%, 92.34%].
CREPE is a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums. It consists of 8,400 Reddit questions annotated with (1) whether the question contains any false presuppositions and (2) if so, the presupposition and a written correction.
The Geoclidean-Elements dataset is derived from definitions in the first book of Euclid’s Elements, which focuses on plane geometry. Geoclidean-Elements includes 17 target concepts and 34 tasks.
Naturalistic Variation Object Dataset (NVD) is a large simulated dataset of 272k images of everyday objects with naturalistic variations such as object pose, scale, viewpoint, lighting and occlusions.
General-purpose Visual Understanding Evaluation (G-VUE) is a comprehensive benchmark covering the full spectrum of visual cognitive abilities with four functional domains -- Perceive, Ground, Reason, and Act. The four domains are embodied in 11 carefully curated tasks, from 3D reconstruction to visual reasoning and manipulation.
OIR is a financial-domain dataset for the outbound intent recognition task. It aims to identify the intent of a customer's response in outbound call scenarios.
ExHVV is a novel dataset that offers natural language explanations of connotative roles for three types of entities -- heroes, villains, and victims, encompassing 4,680 entities present in 3K memes.
Perseus is a dataset for Cross-Lingual Summarization (CLS) that collects about 94K Chinese scientific documents paired with English summaries. The average document length in Perseus is more than two thousand tokens.
The DialogUSR dataset covers 23 domains and was collected with a multi-step crowd-sourcing procedure. It comprises 36.7 Chinese characters by assembling 3.6 single-intent queries (including initial and follow-up queries) and is designed for the dialogue utterance splitting and reformulation task.
The OCR-IDL dataset comprises the OCR annotations for a subset of 26M pages of the large-scale IDL document library. These annotations have a monetary value of over $20,000 and are made publicly available with the aim of advancing the Document Intelligence research field. Our motivation is two-fold. First, by making these annotations public, we aim to level the playing field between research groups and companies that have large private datasets to pre-train on. Second, we use a commercial OCR engine to obtain high-quality annotations, reducing the noise that OCR introduces into pretraining and downstream tasks.
RGBD1K is a benchmark for RGB-D Object Tracking which contains 1050 sequences with about 2.5M frames in total.
This data set contains 775 video sequences, captured in the wildlife park Lindenthal (Cologne, Germany) as part of the AMMOD project, using an Intel RealSense D435 stereo camera. In addition to color and infrared images, the D435 is able to infer the distance (or “depth”) to objects in the scene using stereo vision. Observed animals include various birds (at daytime) and mammals such as deer, goats, sheep, donkeys, and foxes (primarily at nighttime). A subset of 412 images is annotated with a total of 1038 individual animal annotations, including instance masks, bounding boxes, class labels, and corresponding track IDs to identify the same individual over the entire video.
3D FRONT HUMAN is a dataset that extends the large-scale synthetic scene dataset 3D-FRONT. Specifically, the 3D scenes are populated with humans: non-contact humans (a sequence of walking motion and standing humans) as well as contact humans (sitting, touching, and lying humans). 3D FRONT HUMAN contains four room types: 1) 5,689 bedrooms, 2) 2,987 living rooms, 3) 2,549 dining rooms, and 4) 679 libraries. We use 21 object categories for the bedrooms, 24 for the living and dining rooms, and 25 for the libraries.
Machine learning is transforming the video editing industry. Recent advances in computer vision have leveled up video editing tasks such as intelligent reframing, rotoscoping, color grading, and applying digital makeup. However, most solutions have focused on video manipulation and VFX. This work introduces the Anatomy of Video Editing, a dataset and benchmark to foster research in AI-assisted video editing. Our benchmark suite focuses on video editing tasks beyond visual effects, such as automatic footage organization and assisted video assembling. To enable research on these fronts, we annotate more than 1.5M tags, covering concepts relevant to cinematography, for 196,176 shots sampled from movie scenes. We establish competitive baseline methods and detailed analyses for each of the tasks. We hope our work sparks innovative research in underexplored areas of AI-assisted video editing.
MAPS-KB is a million-scale probabilistic simile knowledge base, covering 4.3 million triplets over 0.4 million terms from 70 GB corpora. It is designed for the tasks of simile detection and component extraction.
Usually, the crop-type information available for a given territory is annual: we know only the main crop grown over a year, not the crops that followed one another during the year, nor when a particular crop was sown and harvested. The main objective of this dataset is to provide a basis for experimenting with solutions that answer these questions reliably, or for proposing models capable of producing dynamic segmentation maps that show when a crop begins to grow and when it is harvested, and consequently for determining whether more than one crop has been grown in a territory within a year. The dataset has 20 coverage classes as ground-truth values provided by Regione Lombardia. The mapping of the class labels used (see file lombardia-classes/classes25pc.txt) groups some classes together and provides the time intervals within which each category grows. The last two c
BG Vulnerable Pedestrian (BGVP) is a dataset intended to help train well-rounded models and to encourage research that improves vulnerable pedestrian detection. The dataset contains 2,000 images with 5,932 bounding box instances from four categories: Children Without Disability, Elderly Without Disability, With Disability, and Non-Vulnerable.
PulseImpute is a benchmark for Pulsative Physiological Signal Imputation which includes realistic mHealth missingness models, an extensive set of baselines, and clinically relevant downstream tasks. It contains 440,953 100 Hz 5-minute ECG waveforms from 32,930 patients.
CA4P-483 is a dataset designed to facilitate sequence labeling tasks and the identification of regulation compliance between privacy policies and software. It contains 483 Chinese Android application privacy policies, over 11K sentences, and 52K fine-grained annotations.