3,275 machine learning datasets
MolParser-7M is a large-scale OCSR (optical chemical structure recognition) dataset proposed in the paper "MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild". It contains nearly 8 million paired image-SMILES samples. Note that the image captions use the extended-SMILES format proposed in the paper.
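As a minimal sketch of how such paired image-SMILES data might be iterated, assume a hypothetical CSV index with `image_path` and `smiles` columns (the actual MolParser-7M file layout may differ, and its extended-SMILES strings may not parse with standard cheminformatics toolkits):

```python
import csv
from pathlib import Path

from PIL import Image


def iter_molparser_pairs(index_csv: str, root: str):
    """Yield (image, smiles) pairs from a hypothetical CSV index
    with columns `image_path` and `smiles`."""
    root_dir = Path(root)
    with open(index_csv, newline="") as f:
        for row in csv.DictReader(f):
            image = Image.open(root_dir / row["image_path"]).convert("RGB")
            yield image, row["smiles"]


# Example usage (paths are placeholders):
# for img, smi in iter_molparser_pairs("molparser_index.csv", "images/"):
#     print(img.size, smi)
```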
The Robot House Multi-View (RHM) dataset contains four views: Front, Back, Ceiling, and Robot. There are 14 classes with 6,701 video clips per view, for a total of 26,804 video clips across the four views. Clip lengths range from 1 to 5 seconds. Clips with the same index and class are synchronized across the different views.
40 personalized concepts
GEOBench-VLM is a comprehensive benchmark specifically designed to evaluate VLMs on geospatial tasks, including scene understanding, object counting, localization, fine-grained categorization, and temporal analysis. The benchmark features over 10,000 manually verified instructions and covers a diverse set of variations in visual conditions, object type, and scale.
This dataset was created in the paper "Learning to Count Objects in Images" by Victor Lempitsky and Andrew Zisserman as a benchmark for cell counting. It comprises 200 images of 256x256 resolution showing artificial blue fluorescent cells, with ground truth given as the spatial coordinates of each cell's centroid. It is useful for regression-based counting or density map estimation.
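As a concrete illustration of the density-map formulation, here is a minimal sketch that turns centroid annotations into a density map by placing a unit-mass Gaussian at each point; the map then sums to the cell count. The annotation format is assumed here to be an (N, 2) array of (row, col) coordinates:

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def density_map(points: np.ndarray, shape=(256, 256), sigma: float = 4.0) -> np.ndarray:
    """Build a density map from (N, 2) centroid coordinates (row, col).

    Each cell contributes a unit-mass Gaussian, so the map sums
    to the number of annotated cells.
    """
    canvas = np.zeros(shape, dtype=np.float64)
    for r, c in np.round(points).astype(int):
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            canvas[r, c] += 1.0
    return gaussian_filter(canvas, sigma=sigma)


# The map's sum recovers the count:
# dmap = density_map(centroids)
# print(dmap.sum())  # approximately the number of annotated cells
```

A regression model trained to predict this map can then be evaluated by comparing its integral against the ground-truth count.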
The Underwater Trash Detection Dataset is a custom-annotated dataset designed to address the challenges of underwater trash detection caused by varying environmental features. Publicly available datasets alone are insufficient for training deep learning models due to domain-specific variations in underwater conditions. This dataset offers a cumulative, self-annotated collection of underwater images for detecting and classifying trash, providing a strong foundation for deep learning research and benchmark testing.
V2VBench is a comprehensive benchmark designed to evaluate video editing methods. It consists of:
- 50 standardized videos across 5 categories,
- 3 editing prompts per video, encompassing 4 editing tasks (released as Hugging Face Datasets), and
- 8 evaluation metrics to assess the quality of edited videos.
Event cameras are sensors that are inspired by biological systems and specialize in capturing changes in brightness. These emerging cameras offer numerous advantages over conventional frame-based cameras, including high dynamic range, high frame rates, and extremely low power consumption. As a result, event cameras are increasingly being used in various fields, such as object detection and tracking, autonomous driving, 3D reconstruction, visual odometry, and SLAM.
Replication Data for: Integrating Earth Observation Data into Causal Inference: Challenges and Opportunities
U-DIADS-Bib is a proprietary dataset developed through a collaboration of computer scientists and humanities scholars at the University of Udine. It is composed of 200 images, 50 for each of the 4 manuscripts that characterize it. These handwritten books were selected in collaboration with the humanist partners considering both the complexity of their layout and the presence of significant, semantically distinguishable elements. The images of the four manuscripts were collected from the Gallica digital library. All manuscripts are Latin and Syriac Bibles produced between the 6th and 12th centuries A.D.
This dataset supports Ye et al. (2024), Nature Communications (https://www.nature.com/articles/s41467-024-48792-2).
The iRodent dataset contains rodent species observations obtained through the iNaturalist API, with a focus on the suborder Myomorpha (taxon ID 16). It features prominent rodent species such as Muskrat, Brown Rat, House Mouse, Black Rat, Hispid Cotton Rat, Meadow Vole, Bank Vole, Deer Mouse, White-footed Mouse, and Striped Field Mouse. The dataset provides manually labeled keypoints for pose estimation, as well as segmentation masks (generated with a Mask R-CNN model) for a subset of images.
The UW Indoor Scenes (UW-IS) Occluded dataset is curated using commodity hardware (an Intel RealSense D435) to reflect real-world robotics scenarios. It covers two completely different indoor environments. The first environment is a lounge where the objects are placed on a tabletop. The second environment is a mock warehouse setup where the objects are placed on a shelf. For each environment, the dataset provides RGB-D images from 36 videos, each containing five to seven objects, taken from distances up to approximately 2 m. The videos cover two different lighting conditions and three different levels of object separation for three object categories (i.e., kitchen objects, food items, and tools/miscellaneous). The first level of object separation has no object occlusion, the second level has some occlusion, and at the third level the objects are placed extremely close together. Overall, the dataset considers 20 object classes.
Text-Vision Cross-Modal Place Recognition Dataset
We collect a dataset of Rich Human Feedback on 18K images (RichHF-18K), which contains (i) point annotations on the image that highlight regions of implausibility/artifacts, and text-image misalignment; (ii) labeled words on the prompts specifying the missing or misrepresented concepts in the generated image; and (iii) four types of fine-grained scores for image plausibility, text-image alignment, aesthetics, and overall rating.
LymphoMNIST is a comprehensive dataset designed for the fine-grained classification of lymphocyte images. It encompasses approximately 80,000 images at 64x64 resolution, meticulously categorized into three primary classes: B cells, T4 cells, and T8 cells.
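A minimal PyTorch loading sketch, assuming the images are organized into per-class folders (`B`, `T4`, `T8`); the actual distribution format of LymphoMNIST may differ:

```python
import torch
from torchvision import datasets, transforms

# Hypothetical folder layout: lymphomnist/{B,T4,T8}/*.png
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("lymphomnist", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

for images, labels in loader:
    print(images.shape, labels[:8])  # e.g. torch.Size([64, 3, 64, 64])
    break
```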
ThermoHands is the first benchmark dataset specifically designed for egocentric 3D hand pose estimation from thermal images. It addresses the challenges of hand pose estimation in low-light conditions and when the hand is occluded by gloves or other wearables—scenarios where traditional RGB or NIR-based systems struggle.
The dataset available for download on this webpage represents a 5x5x5µm section taken from the CA1 hippocampus region of the brain, corresponding to a 1065x2048x1536 volume. The resolution of each voxel is approximately 5x5x5nm. The data is provided as multipage TIF files that can be loaded in Fiji. We annotated mitochondria in two sub-volumes. Each sub-volume consists of the first 165 slices of the 1065x2048x1536 image stack. The volume used for training our algorithm in the publications mentioned at the bottom of this page is the top part, while the bottom part was used for testing.
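A minimal sketch of loading the described train/test split in Python, assuming the annotated sub-volumes ship as multipage TIF stacks readable with the `tifffile` package (the file names below are placeholders):

```python
import tifffile

# Placeholder file names; each annotated sub-volume is the first
# 165 slices of the full 1065x2048x1536 stack.
train_vol = tifffile.imread("training_sub_volume.tif")   # top part, used for training
train_gt = tifffile.imread("training_groundtruth.tif")   # mitochondria labels
test_vol = tifffile.imread("testing_sub_volume.tif")     # bottom part, used for testing

# Multipage TIFs load as (slices, height, width) arrays.
print(train_vol.shape)  # expected (165, H, W) for a 165-slice stack
```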
Understanding and analyzing animal behavior is increasingly essential for protecting endangered animal species. However, advanced computer vision techniques are rarely applied in this area, largely because of the lack of large and diverse datasets for training deep models.