19,997 machine learning datasets
StreaksYoloDataset is a set of raw astronomical images captured with smart telescopes and annotated with the positions of the streaks actually present in the images. Images were captured between March 2022 and February 2023 from the Luxembourg Greater Region using the built-in alignment and stacking features of a Stellina smart telescope, which is based on an Extra Low Dispersion doublet with an aperture of 80 mm and a focal length of 400 mm (focal ratio of f/5) and is equipped with a Sony IMX178 CMOS sensor with a resolution of 6.4 million pixels.
The NeuroVoz dataset is a resource for computational linguistics and biomedical research, designed to improve the diagnosis and understanding of Parkinson's Disease (PD) through speech analysis. It is the first dataset of its kind to be made publicly available in Castilian Spanish, addressing a gap in linguistic and dialectal diversity within PD research.
BASEPROD provides comprehensive rover sensor data collected over a 1.7 km traverse, accompanied by high-resolution 2D and 3D drone maps of the terrain. The dataset also includes laser-induced breakdown spectroscopy (LIBS) measurements from key sampling sites along the rover's path, as well as weather station data to contextualize environmental conditions.
DenseUAV is a dataset of drone and satellite perspectives collected from 14 universities in low-altitude urban scenes. Its main features are real scene sampling, a sampling perspective perpendicular to the ground, and dense sampling. It contains a total of 3033 sampling points, comprising 9099 drone perspective images and 18198 satellite perspective images.
WildDESED is an extension of the original DESED dataset, created to reflect various domestic scenarios by incorporating complex and unpredictable background noises. These enhancements make WildDESED a powerful resource for developing and evaluating noise-robust SED systems.
A point cloud dataset for place recognition provided by PointNetVLAD. Please refer to the URL.
https://syncandshare.lrz.de/getlink/fi9EZb33KiSAJ5rLHAkhg7/ffhnet-data.zip
Consolidates the World Cup 2014 (WC14) and Time-Series World Cup (TSWC) datasets and refines their homography annotations.
SQL-Eval is an open-source PostgreSQL evaluation dataset released by Defog, constructed based on Spider. The original link can be found at https://github.com/defog-ai/sql-eval. Our evaluation methodology is more stringent, as it compares the execution accuracy of the predicted SQL queries against the sole ground truth SQL query.
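The execution-accuracy comparison described for SQL-Eval can be illustrated with a minimal sketch. It is not the benchmark's actual harness: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the function names, table, and queries are hypothetical. It assumes that a predicted query counts as correct only if it returns the same multiset of rows as the single ground truth query.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same multiset of rows."""
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # Assumption: a predicted query that fails to run counts as wrong.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Row order is ignored; duplicate rows are still counted.
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


def build_demo_db():
    """Create a tiny hypothetical table to run the example against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
    conn.executemany(
        "INSERT INTO players VALUES (?, ?, ?)",
        [("a", "x", 10), ("b", "x", 20), ("c", "y", 30)],
    )
    return conn
```

Comparing against one ground truth query is strict by design: a prediction that is semantically acceptable but returns extra columns or a different aggregation is scored as wrong.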
We introduce Recognition-based Object Probing Evaluation (ROPE), an automated evaluation protocol that considers the distribution of object classes within a single image during testing and uses visual referring prompts to eliminate ambiguity. ROPE uses several instruction settings. In a single turn of prompting without format enforcement, we probe the model to recognize the 5 objects referred to by the visual prompts (a) one at a time in the single-object setting and (b) concurrently in the multi-object setting. We further force the model to follow the format template and decode only the object tokens for each of the five objects (c) without output manipulation in student forcing and (d) with all previously generated object tokens replaced by the ground truth classes in teacher forcing.
Our dataset consists of over 1000 fractured frescoes. RePAIR stands as a realistic computational challenge for 2D and 3D puzzle-solving methods, and serves as a benchmark that enables the study of fractured object reassembly and presents new challenges for geometric shape understanding. Please visit our website for more dataset information, access to source code scripts, and an interactive gallery of dataset samples.
A large-scale OCSR dataset proposed in the paper "MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild". MolParser-7M contains nearly 8 million paired image-SMILES samples. Note that each image caption is in the extended-SMILES format proposed in the paper.
The Robot House Multi-View dataset (RHM) contains four views: Front, Back, Ceiling, and Robot Views. There are 14 classes with 6701 video clips for each view, making a total of 26804 video clips for the four views. Clip lengths range from 1 to 5 seconds. Videos with the same number and the same class are synchronized across the different views.
DISC-Law-SFT comprises two subsets, DISC-Law-SFT-Pair and DISC-Law-SFT-Triplet. The former aims to introduce legal reasoning abilities to the LLM, while the latter helps enhance the model's capability to utilize external legal knowledge.
40 personalized concepts
The Wallhack1.8k dataset comprises 1,806 CSI amplitude spectrograms (and raw WiFi packet time series) corresponding to three activity classes: "no presence," "walking," and "walking + arm-waving." WiFi packets were transmitted at a frequency of 100 Hz, and each spectrogram captures a temporal context of approximately 4 seconds (400 WiFi packets).
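The relationship between packet rate and temporal context stated for Wallhack1.8k (100 Hz over roughly 4 seconds gives 400 packets) can be sketched as follows. This is an illustration only: the function names are hypothetical, and cutting the raw series into non-overlapping windows is an assumption, since the description does not say how the windows were extracted.

```python
PACKET_RATE_HZ = 100   # transmission rate stated in the description
WINDOW_SECONDS = 4     # approximate temporal context per spectrogram


def packets_per_window(rate_hz=PACKET_RATE_HZ, seconds=WINDOW_SECONDS):
    """Number of WiFi packets covered by one spectrogram window."""
    return int(rate_hz * seconds)


def split_into_windows(amplitudes, window_len):
    """Cut a packet time series into non-overlapping windows.

    Assumption: windows do not overlap and an incomplete tail is dropped.
    """
    n_windows = len(amplitudes) // window_len
    return [
        amplitudes[i * window_len:(i + 1) * window_len]
        for i in range(n_windows)
    ]
```

For example, a hypothetical series of 1000 packets would yield two full windows of 400 packets each, with the remaining 200 packets discarded under this assumption.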
ADORE is a benchmark dataset for machine learning in ecotoxicology, covering acute aquatic toxicity in three relevant taxonomic groups (fish, crustaceans, and algae). The core dataset describes ecotoxicological experiments and is expanded with phylogenetic and species-specific data on the species as well as chemical properties and molecular representations. Apart from challenging other researchers to achieve the best model performance across the whole dataset, we propose specific relevant challenges on subsets of the data and include datasets and splits corresponding to each of these challenges, as well as an in-depth characterization and discussion of train-test splitting approaches.
NBA: This is extended from a Kaggle dataset containing around 400 NBA basketball players. The players' performance statistics for the 2016-2017 season are provided, along with other information, e.g., nationality, age, and salary. To obtain the graph that links the NBA players together, we collect the relationships of the NBA basketball players on Twitter with its official crawling API. We binarize nationality into two categories, i.e., U.S. players and overseas players, which is used as the sensitive attribute. The classification task is to predict whether a player's salary is above the median.
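The label construction described for the NBA dataset can be sketched as below. This is a minimal illustration, not the dataset's actual preprocessing: the field names (`name`, `country`, `salary`), the `"U.S."` country string, and the choice to encode U.S. players as 1 are all assumptions.

```python
from statistics import median


def binarize_nationality(country, us_label="U.S."):
    """Sensitive attribute: 1 for U.S. players, 0 for overseas players."""
    return 1 if country == us_label else 0


def build_labels(players):
    """Attach the sensitive attribute and the salary label to each player.

    The label is 1 when the salary is strictly above the median salary
    of all players in the list, otherwise 0.
    """
    salary_median = median(p["salary"] for p in players)
    return [
        {
            "name": p["name"],
            "sensitive": binarize_nationality(p["country"]),
            "label": int(p["salary"] > salary_median),
        }
        for p in players
    ]
```

Using a strict comparison means a player earning exactly the median salary receives label 0; the description does not specify how ties are handled.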