19,997 machine learning datasets
Introduction: This dataset supports Ye et al. (2024), Nature Communications (https://www.nature.com/articles/s41467-024-48792-2).
Description: The "iRodent" dataset contains rodent species observations obtained through the iNaturalist API, focusing on the suborder Myomorpha (Taxon ID: 16). It includes common rodent species such as the Muskrat, Brown Rat, House Mouse, Black Rat, Hispid Cotton Rat, Meadow Vole, Bank Vole, Deer Mouse, White-footed Mouse, and Striped Field Mouse. The dataset provides manually labeled keypoints for pose estimation, plus segmentation masks for a subset of the images generated with a Mask R-CNN model.
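The observations were gathered through the iNaturalist API. Below is a minimal Python sketch of such a query. It builds a request against the public iNaturalist v1 `/observations` endpoint for taxon 16 and collects photo URLs from the JSON response. The parameter choices (`photos`, `per_page`, `page`) and the helper names are assumptions for illustration, not the iRodent authors' collection script. The response field names (`results`, `photos`, `url`) reflect our understanding of the v1 API.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Public iNaturalist v1 observations endpoint.
API_BASE = "https://api.inaturalist.org/v1/observations"


def build_observations_url(taxon_id: int = 16, page: int = 1, per_page: int = 100) -> str:
    """Build a query URL for observations of one taxon (16 = Myomorpha, per the description)."""
    params = {
        "taxon_id": taxon_id,  # taxon to query, including its descendants
        "photos": "true",      # restrict to observations that have photos
        "per_page": per_page,  # page size; the API enforces an upper limit
        "page": page,          # 1-based page index
    }
    return f"{API_BASE}?{urlencode(params)}"


def fetch_photo_urls(taxon_id: int = 16, page: int = 1) -> List[str]:
    """Fetch one page of observations and return the photo URLs it contains (needs network access)."""
    with urlopen(build_observations_url(taxon_id, page), timeout=30) as resp:
        data = json.load(resp)
    urls = []
    for obs in data.get("results", []):
        for photo in obs.get("photos", []):
            urls.append(photo["url"])  # assumed field name in the v1 response
    return urls
```

Keeping URL construction separate from the network call means the query can be inspected or logged before any request is sent.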
The Well is a large-scale collection of machine learning datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. It draws on domain scientists and numerical software developers to provide 15 TB of data across 16 datasets. These cover diverse domains such as biological systems, fluid dynamics, and acoustic scattering, as well as magnetohydrodynamic simulations of extragalactic fluids and supernova explosions. The datasets can be used individually or as part of a broader benchmark suite for accelerating research in machine learning and computational sciences.
The UW Indoor Scenes (UW-IS) Occluded dataset was captured with commodity hardware (an Intel RealSense D435) to reflect real-world robotics scenarios. It covers two completely different indoor environments: a lounge, where the objects are placed on a tabletop, and a mock warehouse, where the objects are placed on a shelf. For each of these environments, we have RGB-D images from 36 videos, each showing five to seven objects, taken from distances of up to approximately 2 m. The videos cover two lighting conditions and three levels of object separation for each of three object categories (kitchen objects, food items, and tools/miscellaneous). At the first level of separation, there is no object occlusion. At the second level, some occlusion occurs, and at the third level, the objects are placed extremely close together. Overall, the dataset considers 20 object classes
Text-Vision Cross-Modal Place Recognition Dataset
We collect a dataset of Rich Human Feedback on 18K images (RichHF-18K), which contains (i) point annotations on the image that highlight regions of implausibility or artifacts and regions of text-image misalignment; (ii) labeled words in the prompts specifying concepts that are missing or misrepresented in the generated image; and (iii) four types of fine-grained scores for image plausibility, text-image alignment, aesthetics, and overall rating.
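The three annotation types above can be pictured as one record per image. The dataclass below is a hypothetical in-memory representation for illustration only. The field names, the (x, y) point convention, the per-word 0/1 labels, and the score keys are all assumptions, not the released RichHF-18K file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) image coordinates; assumed convention

# Hypothetical keys for the four fine-grained scores described in (iii).
SCORE_KEYS = ("plausibility", "alignment", "aesthetics", "overall")


@dataclass
class RichHFExample:
    prompt: str
    implausibility_points: List[Point] = field(default_factory=list)  # (i) implausible/artifact regions
    misalignment_points: List[Point] = field(default_factory=list)    # (i) text-image misalignment regions
    misaligned_word_labels: List[int] = field(default_factory=list)   # (ii) 1 = word missing/misrepresented
    scores: Dict[str, float] = field(default_factory=dict)            # (iii) fine-grained scores

    def misaligned_words(self) -> List[str]:
        """Return the prompt words labeled as missing or misrepresented (one label per word assumed)."""
        words = self.prompt.split()
        if len(words) != len(self.misaligned_word_labels):
            raise ValueError("expected one label per whitespace-separated prompt word")
        return [w for w, flag in zip(words, self.misaligned_word_labels) if flag]

    def validate_scores(self) -> None:
        """Raise if any of the four score types is absent."""
        missing = [k for k in SCORE_KEYS if k not in self.scores]
        if missing:
            raise ValueError(f"missing scores: {missing}")


if __name__ == "__main__":
    # Made-up example record, not taken from the dataset.
    ex = RichHFExample(
        prompt="a red cat on a blue sofa",
        implausibility_points=[(120.0, 48.5)],
        misaligned_word_labels=[0, 1, 0, 0, 0, 1, 0],
        scores={"plausibility": 0.5, "alignment": 0.25, "aesthetics": 0.75, "overall": 0.5},
    )
    ex.validate_scores()
    print(ex.misaligned_words())  # prints ['red', 'blue']
```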
This is the dataset accompanying the paper: "PreBit - A multimodal model with Twitter FinBERT embeddings for extreme price movement prediction of Bitcoin"
https://github.com/rail-berkeley/soar?tab=readme-ov-file#using-soar-data
SUMS is a multi-camera, multi-signal biosensing dataset collected at high altitude by Qinghai University. It includes 80 synchronized videos (non-contact facial and contact finger) from 10 subjects during exercise and oxygen-recovery scenarios. The dataset captures PPG, respiration rate (RR), and SpO2, and is designed to validate video-based vital-sign estimation algorithms and to compare facial rPPG with finger cPPG. Our results demonstrate that fusing videos from the two positions (face and finger) reduces the mean absolute error (MAE) of SpO2 predictions by 7.6% and 10.6% compared with using only face or only finger data, respectively. Additionally, training on multiple indicators, such as PPG and blood oxygen, simultaneously reduces SpO2 estimation MAE by 17.8%. We recruited ten participants living on the Qinghai Plateau to collect hypoxia data in a real high-altitude environment. Data collection used two Logitech C922 cameras to capture videos of participants’ faces and
The Remote Learning Affect and Physiologic (RLAP) dataset targets affect and engagement in remote learning and contains learners' highly synchronized blood volume pulse (BVP) signals. It is suitable for training neural rPPG algorithms.
VerilogEval Dataset: The VerilogEval dataset is a benchmark designed to assess the ability of large language models (LLMs) to generate syntactically correct and functionally accurate Verilog code. Introduced in the paper "VerilogEval: Evaluating Large Language Models for Verilog Code Generation", it is widely used in research on hardware code generation.
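Code-generation benchmarks typically score functional accuracy with the unbiased pass@k estimator introduced alongside HumanEval (Chen et al., 2021). Here n samples are drawn per problem, c of them pass the tests, and pass@k estimates the probability that at least one of k samples is correct. Assuming VerilogEval follows this convention, the Python sketch below shows the computation. The function names are illustrative and not part of any VerilogEval API.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them functionally correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    # 1 - P(all k chosen samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(pass_at_k(20, 5, 1))  # prints 0.25
```

Using the combinatorial form rather than simply sampling k completions gives a lower-variance estimate from the same n generations.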
https://huggingface.co/papers/2502.20730
Problem Statement
The dataset contains multi-omics data, including mRNA, miRNA, and DNA methylation. It comprises 8,464 samples with 2,794 omics features, covering 31 cancer types as well as normal tissue.
This dataset contains the "Limitations" text from all ACL 2023 papers.
LymphoMNIST is a dataset designed for fine-grained classification of lymphocyte images. It contains approximately 80,000 high-resolution 64×64 images, categorized into three primary classes: B cells, T4 cells, and T8 cells.
This is the latest version of our datasets. It is built on GTA-V for expressive human pose and shape estimation and features multi-person scenes with SMPL-X annotations. In addition to color image sequences, 3D bounding boxes and cropped point clouds (generated from synthetic depth images) are provided. Please contact Zhongang Cai (caiz0023@e.ntu.edu.sg) for feedback.
Despite impressive advancements in video understanding, most efforts remain limited to coarse-grained or visual-only video tasks. However, real-world videos encompass omni-modal information (vision, audio, and speech) with a series of events forming a cohesive storyline. The lack of multi-modal video data with fine-grained event annotations and the high cost of manual labeling are major obstacles to comprehensive omni-modality video perception. To address this gap, we propose an automatic pipeline consisting of high-quality multi-modal video filtering, semantically coherent omni-modal event boundary detection, and cross-modal correlation-aware event captioning. In this way, we present LongVALE, the first-ever Vision-Audio-Language Event understanding benchmark comprising 105K omni-modal events with precise temporal boundaries and detailed relation-aware captions within 8.4K high-quality long videos. Further, we build a baseline that leverages LongVALE to enable video large language mod
DropletVideo is a project exploring high-order spatio-temporal consistency in image-to-video generation. It is trained on DropletVideo-10M. The model supports multi-resolution inputs and dynamic FPS control for motion intensity, and it demonstrates potential for 3D consistency. For further details, you can check our project page as well as the technical report.