This dataset contains around 10,000 videos generated by various methods using the prompt list. The videos have been evaluated with the EvalCrafter framework, which assesses generative video models on visual, content, and motion quality using 17 objective metrics and subjective user opinions.
The exiD dataset is a collection of naturalistic road user trajectories at highway entries and exits in Germany, captured with drones to overcome limitations of conventional traffic data collection methods, such as occlusions. This approach allows the trajectory and type of each road user to be extracted with high positional accuracy using computer vision algorithms. By minimizing collection errors, the dataset offers a reliable resource for research and development in automated driving technologies.
NorQuAD is the first Norwegian question answering dataset specifically designed for machine reading comprehension. It comprises 4,752 manually created question-answer pairs. The data collection procedure is detailed in the paper, along with relevant statistics. Additionally, several multilingual and Norwegian monolingual language models were benchmarked against human performance on this dataset.
This data was extracted from the 1994 Census Bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)). The prediction task is to determine whether a person makes over $50K a year.
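For illustration, here is a minimal pandas sketch of the stated extraction filter and target. The column names (`AAGE`, `AGI`, `AFNLWGT`, `HRSWK`, `income`) are taken from the conditions above or assumed; the distributed UCI "Adult" file uses different headers, so adapt accordingly.

```python
import pandas as pd

# Assumed raw file with census-style headers; the public UCI file
# differs (age, fnlwgt, hours-per-week, ...), so rename as needed.
df = pd.read_csv("census_raw.csv")

# Apply the extraction conditions quoted above.
clean = df[(df["AAGE"] > 16) & (df["AGI"] > 100)
           & (df["AFNLWGT"] > 1) & (df["HRSWK"] > 0)].copy()

# Binary target: does the person earn more than $50K a year?
# "income" is an assumed column holding the raw income bracket.
clean["label"] = (clean["income"] == ">50K").astype(int)
```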
LOGO is a multi-person long-form video dataset with frame-wise annotations of both action procedures and formations, based on artistic swimming scenarios. It enables action quality assessment approaches that model group information among actors, while its longer video durations challenge a method's ability to aggregate long-term temporal information.
A dataset of integrals and differential equations produced using the generators from the paper.
Earth vision research typically focuses on extracting geospatial object locations and categories but neglects relations between objects and comprehensive reasoning. Motivated by city planning needs, we develop a multi-modal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6,000 images, corresponding semantic masks, and 208,593 QA pairs embedding urban and rural governance requirements.
Drone Surveillance of Faces is a large-scale drone dataset intended to facilitate research on face recognition using drones.
Unified Time Series Dataset (UTSD) includes 7 domains with up to 1 billion time points, organized with hierarchical capacities to facilitate research on large models for time series. It is assembled from a blend of publicly accessible online data repositories and empirical data derived from real-world machine operations. We analyze each dataset in the collection through the lenses of stationarity and forecastability, which allows us to characterize the level of complexity inherent to each dataset.
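As a rough illustration of one common way to quantify stationarity (not necessarily the exact procedure used for UTSD), the sketch below applies an augmented Dickey-Fuller test from statsmodels to two toy series:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def adf_p_value(series: np.ndarray) -> float:
    """Augmented Dickey-Fuller test; small p-values suggest stationarity."""
    stat, p_value, *_ = adfuller(series)
    return p_value

# Toy example: a random walk (non-stationary) vs. white noise (stationary).
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=1000))
noise = rng.normal(size=1000)
print(adf_p_value(walk))   # typically large: cannot reject a unit root
print(adf_p_value(noise))  # typically near 0: stationary
```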
IndustReal is an ego-centric, multi-modal dataset in which 27 participants perform assembly and maintenance procedures on a construction-toy car. The dataset is annotated for action recognition, assembly state detection, and procedure step recognition. IndustReal includes 38 execution errors across 84 videos, 14 of which are exclusive to the validation and test sets, making it suitable for testing the robustness of algorithms against unseen errors in procedural tasks. IndustReal also offers open-source 3D models for all parts, promoting both reproducibility and scalable synthetic-data approaches. All assembly parts used in the dataset are 3D printed, which ensures reproducibility and future availability of the model and allows for growth via community effort.
PhyBench is a comprehensive Text-to-Image (T2I) evaluation dataset designed to assess the physical commonsense of T2I models. Introduced by OpenGVLab, it includes 700 prompts across four primary categories: mechanics, optics, thermodynamics, and material properties, covering 31 distinct physical scenarios.
DreamBench++ is a human-aligned benchmark for personalized image generation, automated with advanced multimodal GPT models. Its goal is to assist humans in everyday work and life by creatively generating personalized content.
The WikiNews Arabic Diacritization dataset is a test set of 70 WikiNews articles (mostly from 2013 and 2014) covering seven themes: politics, economics, health, science and technology, sports, arts, and culture. The articles are evenly distributed among the themes (10 per theme) and contain 18,300 words in around 400 sentences (each line is considered a sentence).
The task is to predict the chance of a user listening to a song repeatedly after the first observable listening event within a time window. In the training set, the target is marked 1 if recurring listening events occur within a month of the user's very first observable listening event, and 0 otherwise. KKBox provides a training set consisting of information on the first observable listening event for each unique user-song pair within a specific time duration, along with metadata for each unique user and song pair. The train and test data are drawn from users' listening history in a given time period and are split based on time. Note that only the labeled train set is used for benchmarking.
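To make the labeling rule concrete, here is a minimal pandas sketch, assuming a hypothetical event log with `user_id`, `song_id`, and `timestamp` columns (the actual KKBox schema differs):

```python
import pandas as pd

# Assumed schema: one row per listening event for a user-song pair.
events = pd.read_csv("listen_events.csv", parse_dates=["timestamp"])

def label_pair(group: pd.DataFrame) -> int:
    """1 if any repeat listen occurs within ~1 month of the first event."""
    first = group["timestamp"].min()
    repeats = group[(group["timestamp"] > first)
                    & (group["timestamp"] <= first + pd.Timedelta(days=30))]
    return int(len(repeats) > 0)

labels = (events.groupby(["user_id", "song_id"])
                .apply(label_pair)
                .rename("target")
                .reset_index())
```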
An underwater dataset collected from several field trials within the EU FP7 project "Cognitive autonomous diving buddy (CADDY)", in which an Autonomous Underwater Vehicle (AUV) interacted with divers and monitored their activities. Purpose: studying and boosting object classification, segmentation, and human pose estimation tasks in which divers use the CADDIAN gesture-based language. Data were recorded under different environmental conditions that cause image distortions unique to underwater scenarios, i.e., low contrast, color distortion, and haze. Dataset characteristics (gesture-related): 9,191 annotated stereo pairs gathered for 16 classes (gesture types), i.e., 18,382 total samples, plus 7,190 true-negative stereo pairs (14,380 samples) containing background scenery and divers without gesturing.
Since H36M is captured in a controlled environment, it rarely depicts challenging real-world scenarios such as body occlusions, which are the main source of ambiguity in single-view 3D shape estimation. Hence, we construct an adapted version of H36M with synthetically generated occlusions by randomly hiding a subset of the 2D keypoints and re-computing an image crop around the remaining visible joints.
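A minimal numpy sketch of this occlusion procedure follows; the drop probability and crop padding are illustrative choices, not values from the paper:

```python
import numpy as np

def occlude_and_crop(kp2d: np.ndarray, drop_prob: float = 0.3, pad: int = 20):
    """kp2d: (J, 2) array of 2D keypoints in pixel coordinates.

    Randomly hides a subset of keypoints and returns the visible ones
    together with a crop box around them. drop_prob and pad are assumed
    hyperparameters for illustration only.
    """
    rng = np.random.default_rng()
    visible = rng.random(len(kp2d)) > drop_prob
    if not visible.any():                      # keep at least one joint
        visible[rng.integers(len(kp2d))] = True
    vis_kp = kp2d[visible]
    x0, y0 = vis_kp.min(axis=0) - pad          # crop around visible joints
    x1, y1 = vis_kp.max(axis=0) + pad
    return vis_kp, (int(x0), int(y0), int(x1), int(y1))
```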
SemOpenAlex is an extensive RDF knowledge graph that contains over 26 billion triples about scientific publications and their associated entities, such as authors, institutions, journals, and concepts.
* SemOpenAlex is licensed under CC0, providing free and open access to the data.
* We offer the data through multiple channels, including RDF dump files, a SPARQL endpoint, and as a data source in the Linked Open Data cloud, complete with resolvable URIs and links to other data sources (ISNI, DOI, ORCID, ROR, Scopus, DOAJ, Wikidata).
* Moreover, we provide embeddings for knowledge graph entities using high-performance computing.
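As a sketch of how the SPARQL endpoint might be queried from Python, the snippet below uses SPARQLWrapper; the endpoint URL and the `dcterms:title` property are assumptions based on the description above, not a confirmed excerpt of the SemOpenAlex schema:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed public endpoint; the predicate below is illustrative.
sparql = SPARQLWrapper("https://semopenalex.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?work ?title WHERE {
        ?work <http://purl.org/dc/terms/title> ?title .
    } LIMIT 5
""")

# Print a few (work URI, title) pairs from the knowledge graph.
for row in sparql.queryAndConvert()["results"]["bindings"]:
    print(row["work"]["value"], row["title"]["value"])
```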
We contribute RoboRefIt, a new challenging visual grounding dataset for robotic perception and reasoning in indoor environments. RoboRefIt contains 10,872 real-world RGB and depth images from cluttered daily-life scenes, paired with 50,758 referring expressions in the form of robot language instructions. Moreover, nearly half of the images involve ambiguous object recognition. We hope RoboRefIt provides a distinctive training bed for visual grounding tasks in robotic interactive grasping.
RAWFC was constructed from scratch: claims were collected from Snopes, and relevant raw reports were retrieved using claim keywords. Building the dataset from raw reports, with gold labels taken from Snopes, alleviates the dependency on fact-checked reports. Each instance in the train/val/test set is presented as a single file.
The MidiCaps dataset [1] is a large-scale dataset of 168,385 MIDI music files with descriptive text captions and a set of extracted musical features.